FreeBSD 7.3, reboot after panic: double fault
Bjoern A. Zeeb
bzeeb-lists at lists.zabbadoz.net
Wed Apr 21 07:55:09 UTC 2010
On Tue, 20 Apr 2010, pluknet wrote:
> On 20 April 2010 15:48, John Baldwin <jhb at freebsd.org> wrote:
>> On Tuesday 20 April 2010 2:53:16 am c0re wrote:
>>> Hello All!
>>> I've upgraded FreeBSD from 7.0 to 7.3 and all was good until I tried to
>>> configure a gre interface and use ipfw fwd.
>>> I actually do not know what the point of failure in my
>>> configuration was.
>>>
>>> [ some details snipped ]
>>>
>>> It worked about one week and then I made some configuration changes:
>>> added gre interface and 2 aliases:
>>>
>>> # cat /etc/rc.conf |grep
>>> ifconfig_xl0="inet 192.168.0.10 netmask 255.255.255.0"
>>> ifconfig_xl0_alias0="192.168.0.11 netmask 255.255.255.255"
>>> ifconfig_xl0_alias1="192.168.0.12 netmask 255.255.255.255"
>>> cloned_interfaces="gre0"
>>> ifconfig_gre0="inet 192.168.250.6 192.168.250.5 tunnel 192.168.0.12
>>> 192.168.200.15 netmask 255.255.255.252 link1 up"
>>>
>>> and
>>>
>>> # cat /etc/rc.local
>>> #!/bin/sh
>>> ipfw add fwd 192.168.250.5 icmp from 192.168.0.11 to any out via xl0
>>> ipfw add fwd 192.168.250.5 tcp from 192.168.0.11 443 to any out via xl0
>>> ipfw add allow ip from any to any
>>>
>>> # ifconfig gre0
>>> gre0: flags=b050<POINTOPOINT,RUNNING,LINK0,LINK1,MULTICAST> metric 0 mtu
>>> 1476
>>> tunnel inet 192.168.0.12 --> 192.168.200.15
>>> inet 192.168.250.6 --> 192.168.250.5 netmask 0xfffffffc
>>>
>>> I shut down the gre interface to prevent requests via gre to the buggy IP.
>>>
>>> The main idea of this configuration was: fwd all https connections to
>>> 192.168.0.1 via the gre interface.
>>> I also configured apache to listen on 192.168.0.11 too.
>>>
>>> Then I ran some tests: ping 192.168.0.11 was fine and went via gre. Telnet to
>>> 192.168.0.11 443 was fine too. Then I tried to make a browser https
>>> connection to 192.168.0.11. Apache showed me a certificate warning and I
>>> accepted it; then nothing happened in the browser, it kept trying to open
>>> the page. But the server got a kernel panic at that moment.
>>>
>>> At first I thought that it was a power failure, so I tried 2 more times
>>> and got the same behaviour.
>>>
>>> So https works without a kernel panic via the 192.168.0.10 address, but the
>>> kernel panics when I try https via the 192.168.0.11 address that is
>>> source-forwarded via gre.
>>
>> Looks like the TCP output path got stuck in an infinite recursion loop until
>> it exhausted the kernel stack:
>>
>>> # cd /usr/obj/usr/src/sys/MYKERNEL
>>> # kgdb kernel.debug /var/crash/vmcore.2
>>> GNU gdb 6.1.1 [FreeBSD]
>>> Copyright 2004 Free Software Foundation, Inc.
>>> GDB is free software, covered by the GNU General Public License, and you are
>>> welcome to change it and/or distribute copies of it under certain
>>> conditions.
>>> Type "show copying" to see the conditions.
>>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>>> This GDB was configured as "i386-marcel-freebsd"...
>>>
>>> Unread portion of the kernel message buffer:
>>>
>>> Fatal double fault:
>>> eip = 0xc08e3ba3
>>> esp = 0xccf6dfc4
>>> ebp = 0xccf6e274
>>> cpuid = 0; apic id = 00
>>> panic: double fault
>>> cpuid = 0
>>> Uptime: 7m14s
>>> Physical memory: 235 MB
>>> Dumping 35 MB: 20 4
>>>
>>> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
>>> /boot/kernel/acpi.ko.symbols...done.
>>> done.
>>> Loaded symbols for /boot/kernel/acpi.ko
>>> Reading symbols from /boot/kernel/if_gre.ko...Reading symbols from
>>> /boot/kernel/if_gre.ko.symbols...done.
>>> done.
>>> Loaded symbols for /boot/kernel/if_gre.ko
>>> Reading symbols from /boot/kernel/linux.ko...Reading symbols from
>>> /boot/kernel/linux.ko.symbols...done.
>>> done.
>>> Loaded symbols for /boot/kernel/linux.ko
>>> #0 doadump () at pcpu.h:196
>>> 196 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>>> (kgdb) bt
>>> #0 doadump () at pcpu.h:196
>>> #1 0xc07f2857 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
>>> #2 0xc07f2b29 in panic (fmt=Variable "fmt" is not available.
>>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>>> #3 0xc0a7ea2b in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:983
>>> #4 0xc08e3ba3 in ipfw_chk (args=0xccf6e28c) at
>>> /usr/src/sys/netinet/ip_fw2.c:2465
>>> #5 0xc08e6ce1 in ipfw_check_out (arg=0x0, m0=0xccf6e390, ifp=0xc25c5c00,
>>> dir=2, inp=0xc28ba708) at /usr/src/sys/netinet/ip_fw_pfil.c:248
>>> #6 0xc08a1968 in pfil_run_hooks (ph=0xc0c55240, mp=0xccf6e420,
>>> ifp=0xc25c5c00, dir=2, inp=0xc28ba708) at /usr/src/sys/net/pfil.c:78
>>> #7 0xc08eb6f2 in ip_output (m=0xc2710b00, opt=0x0, ro=0xccf6e3f4, flags=0,
>>> imo=0x0, inp=0xc28ba708) at /usr/src/sys/netinet/ip_output.c:443
>>> #8 0xc08f4016 in tcp_output (tp=0xc25b2570) at
>>> /usr/src/sys/netinet/tcp_output.c:1134
[twiddle]
>>> #47 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>>> #48 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>>> /usr/src/sys/netinet/tcp_output.c:1195
>>> #49 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>>> ---Type <return> to continue, or q <return> to quit---
>>> #50 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>>> /usr/src/sys/netinet/tcp_output.c:1195
>>> #51 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>>> #52 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>>> /usr/src/sys/netinet/tcp_output.c:1195
>>> #53 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>>> #54 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>>> /usr/src/sys/netinet/tcp_output.c:1195
>>> #55 0xc08fdcf8 in tcp_usr_send (so=0xc2ac1820, flags=0, m=0xc270ed00,
>>> nam=0x0, control=0x0, td=0xc28e2d80) at tcp_offload.h:269
>>> #56 0xc0850405 in sosend_generic (so=0xc2ac1820, addr=0x0, uio=0xc28766c0,
>>> top=0xc270ed00, control=0x0, flags=0, td=0xc28e2d80) at
>>> /usr/src/sys/kern/uipc_socket.c:1243
>>> #57 0xc084bf7f in sosend (so=0xc2ac1820, addr=0x0, uio=0xc28766c0, top=0x0,
>>> control=0x0, flags=0, td=0xc28e2d80) at /usr/src/sys/kern/uipc_socket.c:1285
>>> #58 0xc0833c5b in soo_write (fp=0xc28e84c0, uio=0xc28766c0,
>>> active_cred=0xc28e5900, flags=0, td=0xc28e2d80) at
>>> /usr/src/sys/kern/sys_socket.c:103
>>> #59 0xc082d2e7 in dofilewrite (td=0xc28e2d80, fd=24, fp=0xc28e84c0,
>>> auio=0xc28766c0, offset=-1, flags=0) at file.h:257
>>> #60 0xc082d5c8 in kern_writev (td=0xc28e2d80, fd=24, auio=0xc28766c0) at
>>> /usr/src/sys/kern/sys_generic.c:402
>>> #61 0xc082d816 in writev (td=0xc28e2d80, uap=0xccf6fcfc) at
>>> /usr/src/sys/kern/sys_generic.c:388
>>> #62 0xc0a7f2d5 in syscall (frame=0xccf6fd38) at
>>> /usr/src/sys/i386/i386/trap.c:1101
>>> #63 0xc0a636a0 in Xint0x80_syscall () at
>>> /usr/src/sys/i386/i386/exception.s:262
>>> #64 0x00000033 in ?? ()
>>> Previous frame inner to this frame (corrupt stack?)
>>> (kgdb)
>>> (kgdb) quit
>>
>> tcp_output() calls tcp_mtudisc() if ip_output() returns EMSGSIZE:
>>
>> case EMSGSIZE:
>> /*
>> * For some reason the interface we used initially
>> * to send segments changed to another or lowered
>> * its MTU.
>> *
>> * tcp_mtudisc() will find out the new MTU and as
>> * its last action, initiate retransmission, so it
>> * is important to not do so here.
>> *
>> * If TSO was active we either got an interface
>> * without TSO capabilits or TSO was turned off.
>> * Disable it for this connection as too and
>> * immediatly retry with MSS sized segments generated
>> * by this function.
>> */
>> if (tso)
>> tp->t_flags &= ~TF_TSO;
>> tcp_mtudisc(tp->t_inpcb, 0);
>> return (0);
>>
>> But tcp_mtudisc() calls tcp_output():
>>
>> tcpstat.tcps_mturesent++;
>> tp->t_rtttime = 0;
>> tp->snd_nxt = tp->snd_una;
>> tcp_free_sackholes(tp);
>> tp->snd_recover = tp->snd_max;
>> if (tp->t_flags & TF_SACK_PERMIT)
>> EXIT_FASTRECOVERY(tp);
>> tcp_output_send(tp);
>> return (inp);
>>
>> I'm not sure why it's not able to figure out the MTU, perhaps folks on net@
>> can help. However, it would seem that for the tcp_output() case,
>> tcp_mtudisc() should probably not call tcp_output_send(), but instead
>> tcp_output() should just loop back up to the top after calling tcp_mtudisc()
>> and retry.
>>
>
> I may be wrong, but this looks similar to another report for 8.0-STABLE
> (could it be a regression across major versions somewhere around tcp_mtudisc()?):
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-April/056063.html
The backtrace indeed looks very similar. The reporter in that case
had done a cvs update that landed between two commits. Updating
again seemed to have fixed it for him.
None of those changes are in 7.3-RELEASE though, and the endless loop
indeed should not happen.
Are the kernel and the core file still available for further analysis?
/bz
--
Bjoern A. Zeeb It will not break if you know what you are doing.