FreeBSD 7.3, reboot after panic: double fault

pluknet pluknet at gmail.com
Tue Apr 20 14:17:37 UTC 2010


On 20 April 2010 15:48, John Baldwin <jhb at freebsd.org> wrote:
> On Tuesday 20 April 2010 2:53:16 am c0re wrote:
>> Hello All!
>> I've upgraded FreeBSD from 7.0 to 7.3 and all was good until I tried to
>> configure a gre interface and use ipfw fwd.
>> I actually do not know what the point of failure in my
>> configuration was.
>>
>> [ some details snipped ]
>>
>> It worked about one week and then I made some configuration changes:
>> added gre interface and 2 aliases:
>>
>> # cat /etc/rc.conf |grep
>> ifconfig_xl0="inet 192.168.0.10  netmask 255.255.255.0"
>> ifconfig_xl0_alias0="192.168.0.11 netmask 255.255.255.255"
>> ifconfig_xl0_alias1="192.168.0.12 netmask 255.255.255.255"
>> cloned_interfaces="gre0"
>> ifconfig_gre0="inet 192.168.250.6 192.168.250.5 tunnel 192.168.0.12
>> 192.168.200.15 netmask 255.255.255.252 link1 up"
>>
>> and
>>
>> # cat /etc/rc.local
>> #!/bin/sh
>> ipfw add fwd 192.168.250.5 icmp from 192.168.0.11 to any out via xl0
>> ipfw add fwd 192.168.250.5 tcp from 192.168.0.11 443 to any out via xl0
>> ipfw add allow ip from any to any
>>
>> # ifconfig gre0
>> gre0: flags=b050<POINTOPOINT,RUNNING,LINK0,LINK1,MULTICAST> metric 0 mtu
>> 1476
>>         tunnel inet 192.168.0.12 --> 192.168.200.15
>>         inet 192.168.250.6 --> 192.168.250.5 netmask 0xfffffffc
>>
>> I shut down the gre interface to prevent requests via gre to the buggy IP.
>>
>> The main idea of this configuration was: fwd all https connections to
>> 192.168.0.1 via the gre interface.
>> I also configured apache to listen on 192.168.0.11.
>>
>> Then I ran some tests: ping 192.168.0.11 was fine, it goes via gre. Telnet to
>> 192.168.0.11 443 was fine too. Then I tried to make a browser https
>> connection to 192.168.0.11. Apache showed me a certificate warning and I
>> accepted it, then nothing happened in the browser, it kept trying to open the
>> page. But the server got a kernel panic at that moment.
>>
>> At first I thought that it was a power failure, so I tried 2 more times
>> and got the same behaviour.
>>
>> So https works without a kernel panic via the 192.168.0.10 address, but the
>> kernel panics when I try https via the 192.168.0.11 address, which is
>> source-forwarded via gre.
>
> Looks like the TCP output path got stuck in an infinite recursion loop until
> it exhausted the kernel stack:
>
>> # cd /usr/obj/usr/src/sys/MYKERNEL
>> # kgdb kernel.debug /var/crash/vmcore.2
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "i386-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>>
>> Fatal double fault:
>> eip = 0xc08e3ba3
>> esp = 0xccf6dfc4
>> ebp = 0xccf6e274
>> cpuid = 0; apic id = 00
>> panic: double fault
>> cpuid = 0
>> Uptime: 7m14s
>> Physical memory: 235 MB
>> Dumping 35 MB: 20 4
>>
>> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
>> /boot/kernel/acpi.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/acpi.ko
>> Reading symbols from /boot/kernel/if_gre.ko...Reading symbols from
>> /boot/kernel/if_gre.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/if_gre.ko
>> Reading symbols from /boot/kernel/linux.ko...Reading symbols from
>> /boot/kernel/linux.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/linux.ko
>> #0  doadump () at pcpu.h:196
>> 196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>> (kgdb) bt
>> #0  doadump () at pcpu.h:196
>> #1  0xc07f2857 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
>> #2  0xc07f2b29 in panic (fmt=Variable "fmt" is not available.
>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>> #3  0xc0a7ea2b in dblfault_handler () at /usr/src/sys/i386/i386/trap.c:983
>> #4  0xc08e3ba3 in ipfw_chk (args=0xccf6e28c) at
>> /usr/src/sys/netinet/ip_fw2.c:2465
>> #5  0xc08e6ce1 in ipfw_check_out (arg=0x0, m0=0xccf6e390, ifp=0xc25c5c00,
>> dir=2, inp=0xc28ba708) at /usr/src/sys/netinet/ip_fw_pfil.c:248
>> #6  0xc08a1968 in pfil_run_hooks (ph=0xc0c55240, mp=0xccf6e420,
>> ifp=0xc25c5c00, dir=2, inp=0xc28ba708) at /usr/src/sys/net/pfil.c:78
>> #7  0xc08eb6f2 in ip_output (m=0xc2710b00, opt=0x0, ro=0xccf6e3f4, flags=0,
>> imo=0x0, inp=0xc28ba708) at /usr/src/sys/netinet/ip_output.c:443
>> #8  0xc08f4016 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1134
>> #9  0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #10 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #11 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #12 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #13 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #14 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #15 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #16 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #17 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #18 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #19 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #20 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #21 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #22 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #23 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #24 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #25 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #26 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #27 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #28 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #29 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #30 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #31 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #32 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #33 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #34 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #35 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #36 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #37 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #38 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #39 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #40 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #41 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #42 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #43 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #44 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #45 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #46 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #47 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #48 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #49 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> ---Type <return> to continue, or q <return> to quit---
>> #50 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #51 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #52 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #53 0xc08f6d98 in tcp_mtudisc (inp=0xc28ba708, errno=0) at tcp_offload.h:269
>> #54 0xc08f4105 in tcp_output (tp=0xc25b2570) at
>> /usr/src/sys/netinet/tcp_output.c:1195
>> #55 0xc08fdcf8 in tcp_usr_send (so=0xc2ac1820, flags=0, m=0xc270ed00,
>> nam=0x0, control=0x0, td=0xc28e2d80) at tcp_offload.h:269
>> #56 0xc0850405 in sosend_generic (so=0xc2ac1820, addr=0x0, uio=0xc28766c0,
>> top=0xc270ed00, control=0x0, flags=0, td=0xc28e2d80) at
>> /usr/src/sys/kern/uipc_socket.c:1243
>> #57 0xc084bf7f in sosend (so=0xc2ac1820, addr=0x0, uio=0xc28766c0, top=0x0,
>> control=0x0, flags=0, td=0xc28e2d80) at /usr/src/sys/kern/uipc_socket.c:1285
>> #58 0xc0833c5b in soo_write (fp=0xc28e84c0, uio=0xc28766c0,
>> active_cred=0xc28e5900, flags=0, td=0xc28e2d80) at
>> /usr/src/sys/kern/sys_socket.c:103
>> #59 0xc082d2e7 in dofilewrite (td=0xc28e2d80, fd=24, fp=0xc28e84c0,
>> auio=0xc28766c0, offset=-1, flags=0) at file.h:257
>> #60 0xc082d5c8 in kern_writev (td=0xc28e2d80, fd=24, auio=0xc28766c0) at
>> /usr/src/sys/kern/sys_generic.c:402
>> #61 0xc082d816 in writev (td=0xc28e2d80, uap=0xccf6fcfc) at
>> /usr/src/sys/kern/sys_generic.c:388
>> #62 0xc0a7f2d5 in syscall (frame=0xccf6fd38) at
>> /usr/src/sys/i386/i386/trap.c:1101
>> #63 0xc0a636a0 in Xint0x80_syscall () at
>> /usr/src/sys/i386/i386/exception.s:262
>> #64 0x00000033 in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> (kgdb)
>> (kgdb) quit
>
> tcp_output() calls tcp_mtudisc() if ip_output() returns EMSGSIZE:
>
>                case EMSGSIZE:
>                        /*
>                         * For some reason the interface we used initially
>                         * to send segments changed to another or lowered
>                         * its MTU.
>                         *
>                         * tcp_mtudisc() will find out the new MTU and as
>                         * its last action, initiate retransmission, so it
>                         * is important to not do so here.
>                         *
>                         * If TSO was active we either got an interface
>                         * without TSO capabilities or TSO was turned off.
>                         * Disable it for this connection as well and
>                         * immediately retry with MSS sized segments generated
>                         * by this function.
>                         */
>                        if (tso)
>                                tp->t_flags &= ~TF_TSO;
>                        tcp_mtudisc(tp->t_inpcb, 0);
>                        return (0);
>
> But tcp_mtudisc() calls tcp_output():
>
>        tcpstat.tcps_mturesent++;
>        tp->t_rtttime = 0;
>        tp->snd_nxt = tp->snd_una;
>        tcp_free_sackholes(tp);
>        tp->snd_recover = tp->snd_max;
>        if (tp->t_flags & TF_SACK_PERMIT)
>                EXIT_FASTRECOVERY(tp);
>        tcp_output_send(tp);
>        return (inp);
>
> I'm not sure why it's not able to figure out the MTU, perhaps folks on net@
> can help.  However, it would seem that for the tcp_output() case,
> tcp_mtudisc() should probably not call tcp_output_send(), but instead
> tcp_output() should just loop back up to the top after calling tcp_mtudisc()
> and retry.
>

I may be wrong, but this looks similar to another report for 8.0-STABLE
(could it be a regression spanning major versions, somewhere around tcp_mtudisc()?):

http://lists.freebsd.org/pipermail/freebsd-stable/2010-April/056063.html
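
For illustration only, here is a user-space toy of the restructuring suggested
in the quoted message: the MTU helper only adjusts state, and the output
routine retries in a bounded loop instead of being re-entered recursively.
This is a sketch under assumptions, not kernel code and not a proposed patch.
Every name in it (toy_conn, toy_ip_output, toy_mtudisc, toy_tcp_output) and
the retry bound are hypothetical and do not exist in the FreeBSD tree.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a connection; NOT the kernel's struct tcpcb. */
struct toy_conn {
    int mss;            /* segment size currently in use */
    int path_limit;     /* largest segment the path really accepts */
    int reported_limit; /* what the (toy) MTU lookup reports */
    int attempts;       /* how many sends were tried */
};

enum { TOY_OK = 0, TOY_EMSGSIZE = 1, TOY_GIVEUP = 2 };

/* Stand-in for ip_output(): refuse segments larger than the path allows. */
static int toy_ip_output(struct toy_conn *c)
{
    c->attempts++;
    return (c->mss > c->path_limit) ? TOY_EMSGSIZE : TOY_OK;
}

/*
 * Stand-in for tcp_mtudisc(): only updates the segment size and does NOT
 * send anything itself. Returns false when it made no progress.
 */
static bool toy_mtudisc(struct toy_conn *c)
{
    if (c->reported_limit >= c->mss)
        return false;   /* lookup did not yield a smaller size */
    c->mss = c->reported_limit;
    return true;
}

/*
 * Stand-in for tcp_output(): loop back and retry after the MTU update,
 * with an explicit bound so the stack depth stays constant.
 */
static int toy_tcp_output(struct toy_conn *c, int max_retries)
{
    for (int i = 0; i <= max_retries; i++) {
        if (toy_ip_output(c) == TOY_OK)
            return TOY_OK;
        if (!toy_mtudisc(c))
            return TOY_GIVEUP;  /* no progress: stop instead of recursing */
    }
    return TOY_GIVEUP;
}
```

In this toy, a lookup that keeps reporting a size no smaller than the current
one ends the attempt with TOY_GIVEUP after a single send. With mutual
recursion the same situation would keep nesting calls until the stack runs out.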

-- 
wbr,
pluknet

