Kernel panics in tcp_twclose
Palle Girgensohn
girgen at FreeBSD.org
Tue Sep 22 22:01:50 UTC 2015
> 22 sep 2015 kl. 23:59 skrev Julien Charbon <jch at freebsd.org>:
>
>
> Hi Palle,
>
> On 22/09/15 22:58, Palle Girgensohn wrote:
>>> 22 sep 2015 kl. 20:16 skrev Julien Charbon <jch at freebsd.org>:
>>> On 22/09/15 18:49, Palle Girgensohn wrote:
>>>>> 22 sep 2015 kl. 18:46 skrev Palle Girgensohn
>>>>> <girgen at FreeBSD.org>:
>>>>>> 21 sep 2015 kl. 15:53 skrev Palle Girgensohn
>>>>>> <girgen at FreeBSD.org>:
>>>>>>> 21 sep 2015 kl. 10:21 skrev Julien Charbon
>>>>>>> <jch at FreeBSD.org>: On 18/09/15 18:06, Konstantin Belousov
>>>>>>> wrote:
>>>>>>>> On Fri, Sep 18, 2015 at 03:56:25PM +0200, Julien Charbon
>>>>>>>> wrote:
>>>>>>>>> [...]
>>>>>>> - Second, if issue is still in stable/10, compile 10.2
>>>>>>> kernel with these options:
>>>>>>>
>>>>>>> options DDB options DEADLKRES options
>>>>>>> INVARIANTS options INVARIANT_SUPPORT options WITNESS
>>>>>>> options WITNESS_SKIPSPIN
>>>>>>>
>>>>>>> To see where the original fault is coming from.
>>>>>> [...]
>>>>>>
>>>>>> I'll try stable/10 now. Would you suggest a "clean"
>>>>>> stable/10, or could 287621 and 287780 help?
>>>>>>
>>>>>> I'll add the debugging suggested options right away.
>>>>>>
>>>>>> Palle
>>>>>
>>>>> I have a new core dump from ^/stable/10 with:
>>>>>
>>>>> options DDB options DEADLKRES options INVARIANTS
>>>>> options INVARIANT_SUPPORT options WITNESS options
>>>>> WITNESS_SKIPSPIN
>>>>
>>>> # kgdb kernel /var/crash/vmcore.2 GNU gdb 6.1.1 [FreeBSD]
>>>> Copyright 2004 Free Software Foundation, Inc. GDB is free
>>>> software, covered by the GNU General Public License, and you are
>>>> welcome to change it and/or distribute copies of it under certain
>>>> conditions. Type "show copying" to see the conditions. There is
>>>> absolutely no warranty for GDB. Type "show warranty" for
>>>> details. This GDB was configured as "amd64-marcel-freebsd"...
>>>>
>>>> Unread portion of the kernel message buffer: panic: tcp_detach:
>>>> INP_TIMEWAIT && INP_DROPPED && tp != NULL cpuid = 16 KDB: stack
>>>> backtrace: db_trace_self_wrapper() at
>>>> db_trace_self_wrapper+0x2b/frame 0xfffffe183d9e97e0
>>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe183d9e9890
>>>> vpanic() at vpanic+0x126/frame 0xfffffe183d9e98d0 kassert_panic()
>>>> at kassert_panic+0x139/frame 0xfffffe183d9e9940 tcp_usr_detach()
>>>> at tcp_usr_detach+0xf9/frame 0xfffffe183d9e9970 sofree() at
>>>> sofree+0x1f1/frame 0xfffffe183d9e99a0 soclose() at
>>>> soclose+0x3a0/frame 0xfffffe183d9e99f0 _fdrop() at
>>>> _fdrop+0x29/frame 0xfffffe183d9e9a10 closef() at
>>>> closef+0x1e2/frame 0xfffffe183d9e9aa0 closefp() at
>>>> closefp+0x9d/frame 0xfffffe183d9e9ae0 amd64_syscall() at
>>>> amd64_syscall+0x25a/frame 0xfffffe183d9e9bf0 Xfast_syscall() at
>>>> Xfast_syscall+0xfb/frame 0xfffffe183d9e9bf0 --- syscall (6,
>>>> FreeBSD ELF64, sys_close), rip = 0x801c8d94a, rsp =
>>>> 0x7ffff91c8668, rbp = 0x7ffff91c8680 --- KDB: enter: panic
>>>> Uptime: 18h57m59s Dumping 23085 out of 98263
>>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>>>
>>>> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
>>>> Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols
>>>> from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/zfs.ko.symbols Reading symbols from
>>>> /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/opensolaris.ko.symbols Reading symbols from
>>>> /boot/kernel/ng_bridge.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/ng_bridge.ko.symbols Reading symbols from
>>>> /boot/kernel/netgraph.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/netgraph.ko.symbols Reading symbols from
>>>> /boot/kernel/ng_eiface.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/ng_eiface.ko.symbols Reading symbols from
>>>> /boot/kernel/ng_ether.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/ng_ether.ko.symbols Reading symbols from
>>>> /boot/kernel/accf_data.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/accf_data.ko.symbols Reading symbols from
>>>> /boot/kernel/accf_http.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/accf_http.ko.symbols Reading symbols from
>>>> /boot/kernel/ums.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/ums.ko.symbols Reading symbols from
>>>> /boot/kernel/ng_socket.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/ng_socket.ko.symbols Reading symbols from
>>>> /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for
>>>> /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=1) at
>>>> pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt #0
>>>> doadump (textdump=1) at pcpu.h:219 #1 0xffffffff8094b337 in
>>>> kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
>>>> #2 0xffffffff8094b845 in vpanic (fmt=<value optimized out>,
>>>> ap=<value optimized out>) at
>>>> /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff8094b6d9 in
>>>> kassert_panic (fmt=<value optimized out>) at
>>>> /usr/src/sys/kern/kern_shutdown.c:646 #4 0xffffffff80b1ee59 in
>>>> tcp_usr_detach (so=<value optimized out>) at
>>>> /usr/src/sys/netinet/tcp_usrreq.c:202 #5 0xffffffff809cd291 in
>>>> sofree (so=0xfffff801dd302000) at
>>>> /usr/src/sys/kern/uipc_socket.c:747 #6 0xffffffff809cdb00 in
>>>> soclose (so=<value optimized out>) at
>>>> /usr/src/sys/kern/uipc_socket.c:849 #7 0xffffffff808fe659 in
>>>> _fdrop (fp=0xfffff802a593db40, td=0x0) at file.h:343 #8
>>>> 0xffffffff80901092 in closef (fp=0xfffff802a593db40,
>>>> td=0xfffff80eebc894a0) at /usr/src/sys/kern/kern_descrip.c:2338
>>>> #9 0xffffffff808feb5d in closefp (fdp=0xfffff80b20cce000,
>>>> fd=<value optimized out>, fp=0xfffff802a593db40,
>>>> td=0xfffff80eebc894a0, holdleaders=<value optimized out>) at
>>>> /usr/src/sys/kern/kern_descrip.c:1194 #10 0xffffffff80d7bc3a in
>>>> amd64_syscall (td=0xfffff80eebc894a0, traced=0) at
>>>> subr_syscall.c:134 #11 0xffffffff80d5f1db in Xfast_syscall () at
>>>> /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000801c8d94a
>>>> in ?? () Previous frame inner to this frame (corrupt stack?)
>>>> Current language: auto; currently minimal
>>>
>>> Thanks for the information. As I suspected the initial error was
>>> elsewhere than tcp_twclose(), never got this assertion before:
>>>
>>> tcp_detach: INP_TIMEWAIT && INP_DROPPED && tp != NULL
>>>
>>> from here:
>>>
>>> static void tcp_detach(struct socket *so, struct inpcb *inp) {
>>> struct tcpcb *tp;
>>>
>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); INP_WLOCK_ASSERT(inp);
>>>
>>> KASSERT(so->so_pcb == inp, ("tcp_detach: so_pcb != inp"));
>>> KASSERT(inp->inp_socket == so, ("tcp_detach: inp_socket != so"));
>>>
>>> tp = intotcpcb(inp);
>>>
>>> if (inp->inp_flags & INP_TIMEWAIT) { if (inp->inp_flags &
>>> INP_DROPPED) { KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT && "
>>> "INP_DROPPED && tp != NULL"));
>>>
>>> Let me check if I could find a path that could lead to this
>>> unexpected case. Unexpected because: INP_DROPPED and
>>> inp->inp_ppcb is set to NULL are set at same time here:
>>>
>>> void tcp_twclose(struct tcptw *tw, int reuse) { struct socket *so;
>>> struct inpcb *inp;
>>>
>>> inp = tw->tw_inpcb; KASSERT((inp->inp_flags & INP_TIMEWAIT),
>>> ("tcp_twclose: !timewait")); KASSERT(intotw(inp) == tw,
>>> ("tcp_twclose: inp_ppcb != tw"));
>>> INP_INFO_WLOCK_ASSERT(&V_tcbinfo); /* in_pcbfree() */
>>> INP_WLOCK_ASSERT(inp);
>>>
>>> tcp_tw_2msl_stop(tw, reuse); inp->inp_ppcb = NULL;
>>> in_pcbdrop(inp); ...
>>>
>>> Interesting and by the way could you try:
>>>
>>> # kgdb kernel /var/crash/vmcore.2 (kgdb) info threads
>>>
>>> To see if other thread are also in TCP stack at the same time, and
>>> if one of this thread is referencing the same inp.
>>>
>>
>> BTW, this backtrace looks quite different from the previous ones? Is
>> it a different problem, or just a different way to reveal the same
>> problem?
>
> Having a different backstrace is expected, the first backstrace was
> quite deep in the stack, and kernel panic-ed quite late after the
> original issue. With the kernel debug options, the kernel stopped at
> the first suspicious fact.
>
> And I would say that I still does not understand how it is possible to
> reach that state, it is like kernel exclusive lock stopped to work and
> allow several thread to work on the same locked inp at same time.
> (Stack overflow, VIMAGE memory corruption, UMA issue? ... weird).
>
> I will ask you the kernel and the core dump but off list to avoid too
> much spam in -net.
>
> --
> Julien
>
Great, I'll send it you promptly.
Palle
More information about the freebsd-net
mailing list