repeating crashes with 8.1

Sat Oct 23 01:59:27 UTC 2010

At 09:11 PM 10/22/2010, Mike Tancsa wrote:
>At 08:01 PM 10/22/2010, Chris Morrow wrote:
>>Note, Warren and I attempted to test this this evening on a 10.04 Ubuntu
>>box, no crashy-crashy...
>

I was able to trigger the issue on box (c).  I was ping6ing box (a) 
when I did a hard down of (d)'s connected interface. Box (c) then 
dropped to the debugger:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff80740a50
stack pointer           = 0x28:0xffffff800005a890
frame pointer           = 0x28:0xffffff800005a930
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock)
[thread pid 12 tid 100007 ]
Stopped at      in6_cksum+0x410:        movzwl  (%rsi),%r10d
db> bt
Tracing pid 12 tid 100007 td 0xffffff00025083e0
in6_cksum() at in6_cksum+0x410
icmp6_reflect() at icmp6_reflect+0x312
icmp6_error() at icmp6_error+0x1ec
nd6_llinfo_timer() at nd6_llinfo_timer+0x208
softclock() at softclock+0x2a6
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800005ad30, rbp = 0 ---
db>
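
For anyone reading the trace who has not looked at what in6_cksum is 
for: the faulting instruction is a 16-bit load (movzwl), which fits a 
routine that sums the packet as 16-bit words. Below is a rough Python 
sketch of the ICMPv6 checksum as the RFCs describe it (pseudo-header 
plus message, one's complement sum). It is only an illustration, not 
the kernel code; the function name and the sample echo request are my 
own assumptions, and the addresses are just the ones for (c) and (d) 
from the test setup.

```python
import ipaddress
import struct


def icmp6_checksum(src: str, dst: str, icmp6_msg: bytes) -> int:
    """Internet checksum over the IPv6 pseudo-header plus an ICMPv6 message.

    Hypothetical helper for illustration; not taken from the FreeBSD sources.
    """
    pseudo_header = (
        ipaddress.IPv6Address(src).packed        # 16-byte source address
        + ipaddress.IPv6Address(dst).packed      # 16-byte destination address
        + struct.pack("!I", len(icmp6_msg))      # 32-bit upper-layer length
        + b"\x00\x00\x00" + bytes([58])          # 3 zero bytes, next header 58 (ICMPv6)
    )
    data = pseudo_header + icmp6_msg
    if len(data) % 2:
        data += b"\x00"                          # pad to a whole number of 16-bit words

    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word                            # sum the data as 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF                       # one's complement of the sum


# Sample echo request (type 128, code 0) with the checksum field zeroed.
SRC = "2001:db8::3:c"
DST = "2001:db8::3:d"
echo = struct.pack("!BBHHH", 128, 0, 0, 1, 1) + b"ping"

csum = icmp6_checksum(SRC, DST, echo)
filled = echo[:2] + struct.pack("!H", csum) + echo[4:]
```

A message carrying a correct checksum sums to zero when run through the 
same function again, which is a quick way to sanity check the sketch.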

>I was able to do it, but not the box I expected
>
>4 boxes
>
>(a) Attacking host 2001:db8:1:1/64
>(b) victim, not on a connected interface with a). Outside interface 
>- em0 - 2001:db8::2:1/64, inside interface - em1 - 2001:db8::3:1/64
>(c) a host behind (b) 2001:db8::3:c/64
>(d) a host behind (b), 2001:db8::3:d/64
>
>
>hosts (c) and (d) have default gateways to b).  (c) however, has a 
>next hop for (a) via (d).  So rather than go out its normal default 
>gateway, it takes an extra hop via (d).
>
>Start a ping6 from (a) to (c).  Then down (d)'s interface so that 
>the ping6 fails.  Let the ping keep running for an hour or 
>two.  Eventually (b) gets error messages like
>
>Oct 22 18:38:32 zoo kernel: em1: discard frame w/o packet header
>
>and crashes.
>
>Unfortunately, I thought it would be (c) that crapped out, not (b) 
>and I didn't have crash dumps enabled on the host.  Just in the 
>process of setting up a better environment.
>
>         ---Mike
>
>>-chris
>>
>>On 10/22/10 16:27, Joel Jaeggli wrote:
>> > Ok I'll try testing that on some box I can reach with both hands.
>> >
>> > fyi nagasaki is:
>> >
>> > [root@nagasaki ~]# uname -a
>> > FreeBSD nagasaki.bogus.com 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #13:
>> > Sun May 30 22:19:23 UTC 2010
>> > root@nagasaki.bogus.com:/usr/obj/usr/src/sys/GENERIC  i386
>> > [root@nagasaki ~]#
>> >
>> >
>> > On 10/22/10 1:17 PM, Randy Bush wrote:
>> >>>>>>> Do you know how this panic is triggered ? Are you able to
>> >>>>>>> create it on demand ?
>> >>>>>>
>> >>>>>> no i do not.  bring server up and it'll happen in half an hour.
>> >>>>>> and the server was happy for two months.  so i am thinking hardware.
>> >>>>>
>> >>>>> Perhaps. The reason I ask is that I had a box go down last night with
>> >>>>> the same set of errors.  The box has a number of ipv6 routes, but its
>> >>>>> next hop was down and the problems started soon after. So I wonder if
>> >>>>> it has something to do with that.  Do you have ipv6 on this box and
>> >>>>> are all the next hop addresses correct / reachable ?
>> >>>>>
>> >>>>> Oct 22 02:06:02 i4 kernel: em1: discard frame w/o packet header
>> >>>>> Oct 22 02:06:10 i4 kernel: em2: discard frame w/o packet header
>> >>>>> Oct 22 02:06:21 i4 kernel: em1: discard frame w/o packet header
>> >>>>
>> >>>> it was co-incident with a border router being taken down for new router
>> >>>> install.  that router was the v6 exit the servers was using.  i have now
>> >>>> pointed default6 to a different exit.  the server seems happy.
>> >>>
>> >>>
>> >>> Are your servers still up ?  I guess the question now is how to
>> >>> trigger this problem on demand.  Perhaps lots of inbound ipv6 traffic
>> >>> with a bad next hop out ?  How recent are your sources ?  The kernel
>> >>> said Oct 21st. Were the sources from then too ?
>> >>
>> >> yes, kernel and world from 21 oct
>> >>
>> >> chris had an idea on retrigger, install a static for a small dest that
>> >> points to a hole.  send a packet to the small dest.
>> >>
>> >> randy
>> >>
>

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike at sentex.net
Providing Internet since 1994                    www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike