system locks up when pf is enabled? (was: system locks up with vr
driver on alix board)
Ask Bjørn Hansen
ask at develooper.com
Wed Aug 31 03:53:19 UTC 2011
On Aug 16, 2011, at 20:15, Julian Elischer wrote:
> from your description it doesn't sound like a vr problem.
> I suggest you hook up teh serial console (I am guessing you already have)
> and set the config options to allow break-to-debugger or alt-break-to-debugger on it
> when it happens next, drop into the debugger..
>
> in fact, drop in, and do a ps to see what processes are runnable,
> 'tr [pid] (or thread id)' to get a stack trace of anything that looks
> interesting, and then cont and do it again a few times to get a feel
> for where the processor is hanging out (a straight 'tr' will give you
> the interrupt of the com port which is not intresting..)
Alright, I thought I had locked this down to ZERO_COPY_SOCKETS being enabled in the kernel (as unlikely as it seems that it'd break things without being explicitly used -- I wanted to believe!).
A couple days ago I figured out how to consistently make it happen (or not). For about a week we'd been running with no trouble; until I realized that a hostname added to our pf rules made them not load on startup. When running without our pf rules loaded, everything is fine. WIth the pf rules loaded the system will hang within 2-4 hours.
Our ruleset was about 240 rules. We cut it down to ~140 rules today to see if that made a difference; it didn't. We also turned off as much IPv6 traffic as we could in case that was what was disturbing things (adding IPv6 traffic is relatively new).
Flipping the backup system to be the master for the active IP addresses doesn't (as far as I can tell) make the old master recover.
I tried looking for interesting information with the kernel debugger; but honestly nothing springs out as interesting -- though that may be due to my lack of skills more than anything else. I am including a snapshot below. I ran ps and tr on 100024, 100025 (vr0 and vr1 interrupt threads) and 100022 (thread taskq) a few more times but it seemed to me to give basically the same thread trace information.
For what it's worth, if I let 'cont' run for long enough for the system to send out carp packets and get it's IPs back, 'ps' seemed more likely to have 100024 or 100025 (vr0/vr1) in 'Run' state. If I interrupted the system again relatively quickly after continuing, it was more likely to have 'thread tasks' be the busy process.
Any tips for what I can do to extract something more useful would be greatly appreciated.
As a last note -- earlier today I noticed that "Searches" in 'pfctl -si' seemed to go up significantly just before the console locks up (which is 15-30 minutes before the box stops routing packets). My theory was that we were getting some sort of tiny DoS attack (doesn't take much to take out a computer the size of a CD case); but that doesn't really make sense as the system 1) never recovers and 2) the backup is fine as soon as the master has been turned off / changed priorities.
- ask
db> ps
pid ppid pgrp uid state wmesg wchan cmd
3300 2371 2371 0 R sshd
3299 3298 3298 0 S+ sbwait 0xc2ddb58c ssh
3298 3100 3298 0 S+ piperd 0xc2624620 scp
3100 3098 3100 0 Ss+ pause 0xc2e95850 csh
3098 2371 3098 0 Rs sshd
2904 2371 2904 0 Ss select 0xc379a924 sshd
2476 2474 2476 0 Ss+ ttyin 0xc2e44070 csh
2474 2371 2474 0 Rs sshd
2440 1 2440 0 Rs+ getty
2433 1 2433 0 Rs cron
2426 1 2426 62 Rs ftp-proxy
2397 1 2397 0 Ss select 0xc2d71964 inetd
2371 1 2371 0 Rs sshd
2349 1 2349 0 Rs bird
2302 1 2302 0 Rs radvd
2298 2292 2285 65534 S piperd 0xc2624188 multilog
2297 2291 2285 0 R openvpn
2292 2287 2285 0 S select 0xc2d713a4 supervise
2291 2287 2285 0 S select 0xc2d71a64 supervise
2288 1 2285 0 S piperd 0xc2624c40 readproctitle
2287 1 2285 0 R svscan
2244 1 2244 0 Rs ntpd
2129 1 2129 0 Rs syslogd
50 0 0 0 SL mdwait 0xc2717000 [md1]
40 0 0 0 SL mdwait 0xc24ef000 [md0]
22 0 0 0 RL [flowcleaner]
21 0 0 0 SL sdflush 0xc0b24d80 [softdepflush]
20 0 0 0 SL vlruwt 0xc25f52a8 [vnlru]
19 0 0 0 SL syncer 0xc0b14774 [syncer]
18 0 0 0 RL [bufdaemon]
17 0 0 0 RL [pagezero]
16 0 0 0 SL psleep 0xc0b2567c [vmdaemon]
15 0 0 0 SL psleep 0xc0b25644 [pagedaemon]
9 0 0 0 SL ccb_scan 0xc0b006d4 [xpt_thrd]
8 0 0 0 RL [pfpurge]
7 0 0 0 SL waiting_ 0xc0b181d8 [sctp_iterator]
14 0 0 0 SL (threaded) usb
100036 D - 0xc250ed34 [usbus1]
100035 D - 0xc250ed04 [usbus1]
100034 D - 0xc250ecd4 [usbus1]
100033 D - 0xc250eca4 [usbus1]
100032 D - 0xc2504b5c [usbus0]
100031 D - 0xc2504b2c [usbus0]
100030 D - 0xc2504afc [usbus0]
100029 D - 0xc2504acc [usbus0]
13 0 0 0 SL - 0xc0b0df64 [yarrow]
6 0 0 0 SL crypto_r 0xc0b2432c [crypto returns]
5 0 0 0 SL crypto_w 0xc0b24308 [crypto]
4 0 0 0 SL - 0xc0b0bc44 [g_down]
3 0 0 0 SL - 0xc0b0bc40 [g_up]
2 0 0 0 SL - 0xc0b0bc38 [g_event]
12 0 0 0 RL (threaded) intr
100037 I [swi0: uart uart]
100028 I [irq12: ohci0 ehci0]
100027 I [irq14: ata0]
100026 I [irq15: vr2 ata1]
100025 Run CPU 0 [irq11: vr1]
100024 RunQ [irq10: vr0]
100021 I [swi5: +]
100016 I [swi2: cambio]
100015 I [swi6: task queue]
100014 I [swi6: Giant taskq]
100006 I [swi1: netisr 0]
100005 I [swi4: clock]
100004 I [swi3: vm]
11 0 0 0 RL [idle]
1 0 1 0 SLs wait 0xc2433d48 [init]
10 0 0 0 SL audit_wo 0xc0b246a0 [audit]
0 0 0 0 RLs (threaded) kernel
100040 D - 0xc2527880 [dummynet]
100023 D - 0xc24d6580 [glxsb0 taskq]
100022 RunQ [thread taskq]
100020 D - 0xc24afd40 [acpi_task_2]
100019 D - 0xc24afd40 [acpi_task_1]
100018 D - 0xc24afd40 [acpi_task_0]
100017 D - 0xc24afd80 [kqueue taskq]
100010 D - 0xc24512c0 [firmware taskq]
100000 D sched 0xc0b0bd20 [swapper]
db> tr 100025
Tracing pid 12 tid 100025 td 0xc24d42d0
kdb_enter(c0a48fbe,c0a5d00a,1,c2522280,0,...) at 0xc0733b3a = kdb_enter+0x3a
uart_intr(c2522200,c24d42d0,c0afdc90,c2430980,4,...) at 0xc0632206 = uart_intr+0x126
intr_event_handle(c2430980,cc918ac4,cc918aa4,cc918b40,4,...) at 0xc06dff2c = intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,cc918ac4,cc918b40,c2ef8a00,cc918b68,...) at 0xc09d4929 = intr_execute_handlers+0x49
atpic_handle_intr(4,cc918ac4) at 0xc09ee047 = atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 = Xatpic_intr4+0x22
--- interrupt, eip = 0xc09e0f58, esp = 0xcc918b04, ebp = 0xcc918b68 ---
bzero(c2ef8a00,0,12,0,0,...) at 0xc09e0f58 = bzero
ip_input(c2ef8a00,246,c24a77c0,cc918bd0,c0622c21,...) at 0xc083ff8d = ip_input+0x6cd
netisr_dispatch_src(1,0,c2ef8a00,cc918c08,c07bb331,...) at 0xc07c9109 = netisr_dispatch_src+0x89
netisr_dispatch(1,c2ef8a00,c24d8c00,c24d8c00,c2f16008,...) at 0xc07c93a0 = netisr_dispatch+0x20
ether_demux(c24d8c00,c2ef8a00,3,0,3,...) at 0xc07bb331 = ether_demux+0x161
ether_input(c24d8c00,c2ef8a00,c09d6ad0,c0b11780,c24d42d0,...) at 0xc07bb8af = ether_input+0x33f
vr_intr(c2499000,c24d42d0,0,109,dc13a3e2,...) at 0xc0690034 = vr_intr+0x4d4
intr_event_execute_handlers(c24337f8,c2430600,c0a6b180,52d,c2430670,...) at 0xc06def8b = intr_event_execute_handlers+0x13b
ithread_loop(c24de360,cc918d28,0,0,0,...) at 0xc06e05ea = ithread_loop+0x6a
fork_exit(c06e0580,c24de360,cc918d28) at 0xc06dc830 = fork_exit+0x90
fork_trampoline() at 0xc09cf964 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xcc918d60, ebp = 0 ---
db> tr 100024
Tracing pid 12 tid 100024 td 0xc24d45a0
sched_switch(c24d45a0,0,109,dc1435c0,660,...) at 0xc07270bf = sched_switch+0x1df
mi_switch(109,0,c0a6b180,52d,c24306f0,...) at 0xc071086a = mi_switch+0x11a
ithread_loop(c24de8c0,cc894d28,0,0,0,...) at 0xc06e06df = ithread_loop+0x15f
fork_exit(c06e0580,c24de8c0,cc894d28) at 0xc06dc830 = fork_exit+0x90
fork_trampoline() at 0xc09cf964 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xcc894d60, ebp = 0 ---
db> cont
KDB: enter: Line break on console
[thread pid 0 tid 100022 ]
Stopped at 0xc0733b3a = kdb_enter+0x3a: movl $0,0xc0b124b4 = kdb_why
db> ps
pid ppid pgrp uid state wmesg wchan cmd
3300 2371 2371 0 R sshd
3299 3298 3298 0 S+ sbwait 0xc2ddb58c ssh
3298 3100 3298 0 S+ piperd 0xc2624620 scp
3100 3098 3100 0 Ss+ pause 0xc2e95850 csh
3098 2371 3098 0 Rs sshd
2904 2371 2904 0 Ss select 0xc379a924 sshd
2476 2474 2476 0 Ss+ ttyin 0xc2e44070 csh
2474 2371 2474 0 Rs sshd
2440 1 2440 0 Rs+ getty
2433 1 2433 0 Rs cron
2426 1 2426 62 Rs ftp-proxy
2397 1 2397 0 Ss select 0xc2d71964 inetd
2371 1 2371 0 Rs sshd
2349 1 2349 0 Rs bird
2302 1 2302 0 Rs radvd
2298 2292 2285 65534 S piperd 0xc2624188 multilog
2297 2291 2285 0 R openvpn
2292 2287 2285 0 S select 0xc2d713a4 supervise
2291 2287 2285 0 S select 0xc2d71a64 supervise
2288 1 2285 0 S piperd 0xc2624c40 readproctitle
2287 1 2285 0 R svscan
2244 1 2244 0 Rs ntpd
2129 1 2129 0 Rs syslogd
50 0 0 0 SL mdwait 0xc2717000 [md1]
40 0 0 0 SL mdwait 0xc24ef000 [md0]
22 0 0 0 RL [flowcleaner]
21 0 0 0 SL sdflush 0xc0b24d80 [softdepflush]
20 0 0 0 SL vlruwt 0xc25f52a8 [vnlru]
19 0 0 0 SL syncer 0xc0b14774 [syncer]
18 0 0 0 RL [bufdaemon]
17 0 0 0 RL [pagezero]
16 0 0 0 SL psleep 0xc0b2567c [vmdaemon]
15 0 0 0 SL psleep 0xc0b25644 [pagedaemon]
9 0 0 0 SL ccb_scan 0xc0b006d4 [xpt_thrd]
8 0 0 0 RL [pfpurge]
7 0 0 0 SL waiting_ 0xc0b181d8 [sctp_iterator]
14 0 0 0 SL (threaded) usb
100036 D - 0xc250ed34 [usbus1]
100035 D - 0xc250ed04 [usbus1]
100034 D - 0xc250ecd4 [usbus1]
100033 D - 0xc250eca4 [usbus1]
100032 D - 0xc2504b5c [usbus0]
100031 D - 0xc2504b2c [usbus0]
100030 D - 0xc2504afc [usbus0]
100029 D - 0xc2504acc [usbus0]
13 0 0 0 SL - 0xc0b0df64 [yarrow]
6 0 0 0 SL crypto_r 0xc0b2432c [crypto returns]
5 0 0 0 SL crypto_w 0xc0b24308 [crypto]
4 0 0 0 SL - 0xc0b0bc44 [g_down]
3 0 0 0 SL - 0xc0b0bc40 [g_up]
2 0 0 0 SL - 0xc0b0bc38 [g_event]
12 0 0 0 WL (threaded) intr
100037 I [swi0: uart uart]
100028 I [irq12: ohci0 ehci0]
100027 I [irq14: ata0]
100026 I [irq15: vr2 ata1]
100025 I [irq11: vr1]
100024 I [irq10: vr0]
100021 I [swi5: +]
100016 I [swi2: cambio]
100015 I [swi6: task queue]
100014 I [swi6: Giant taskq]
100006 I [swi1: netisr 0]
100005 I [swi4: clock]
100004 I [swi3: vm]
11 0 0 0 RL [idle]
1 0 1 0 SLs wait 0xc2433d48 [init]
10 0 0 0 SL audit_wo 0xc0b246a0 [audit]
0 0 0 0 RLs (threaded) kernel
100040 D - 0xc2527880 [dummynet]
100023 D - 0xc24d6580 [glxsb0 taskq]
100022 Run CPU 0 [thread taskq]
100020 D - 0xc24afd40 [acpi_task_2]
100019 D - 0xc24afd40 [acpi_task_1]
100018 D - 0xc24afd40 [acpi_task_0]
100017 D - 0xc24afd80 [kqueue taskq]
100010 D - 0xc24512c0 [firmware taskq]
100000 D sched 0xc0b0bd20 [swapper]
db> tr 100024
Tracing pid 12 tid 100024 td 0xc24d45a0
sched_switch(c24d45a0,0,109,b67db121,673,...) at 0xc07270bf = sched_switch+0x1df
mi_switch(109,0,c0a6b180,52d,c24306f0,...) at 0xc071086a = mi_switch+0x11a
ithread_loop(c24de8c0,cc894d28,0,0,0,...) at 0xc06e06df = ithread_loop+0x15f
fork_exit(c06e0580,c24de8c0,cc894d28) at 0xc06dc830 = fork_exit+0x90
fork_trampoline() at 0xc09cf964 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xcc894d60, ebp = 0 ---
db> tr 100022
Tracing pid 0 tid 100022 td 0xc24d4b40
kdb_enter(c0a48fbe,c0a5d00a,0,c2522280,0,...) at 0xc0733b3a = kdb_enter+0x3a
uart_intr(c2522200,c24d4b40,c0afdc90,c2430980,4,...) at 0xc0632206 = uart_intr+0x126
intr_event_handle(c2430980,c23f5c20,0,c25db2bc,4,...) at 0xc06dff2c = intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,c23f5c20,c25db2bc,0,c23f5c60,...) at 0xc09d4929 = intr_execute_handlers+0x49
atpic_handle_intr(4,c23f5c20) at 0xc09ee047 = atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 = Xatpic_intr4+0x22
--- interrupt, eip = 0xc073cf01, esp = 0xc23f5c60, ebp = 0xc23f5c60 ---
sleepq_release(c25db2bc,0,0,0,c24afc80,...) at 0xc073cf01 = sleepq_release+0x31
wakeup(c25db2bc,0,0,0,0,...) at 0xc07109f1 = wakeup+0x41
taskqueue_run_locked(c24afc80,c24afc9c,0,c0a66f59,0,...) at 0xc073e8e2 = taskqueue_run_locked+0x142
taskqueue_thread_loop(c0b134ec,c23f5d28,0,0,0,...) at 0xc073ef27 = taskqueue_thread_loop+0x87
fork_exit(c073eea0,c0b134ec,c23f5d28) at 0xc06dc830 = fork_exit+0x90
fork_trampoline() at 0xc09cf964 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc23f5d60, ebp = 0 ---
... cont dance again ...
db> tr 100022
Tracing pid 0 tid 100022 td 0xc24d4b40
kdb_enter(c0a48fbe,c0a5d00a,0,c2522280,0,...) at 0xc0733b3a = kdb_enter+0x3a
uart_intr(c2522200,c24d4b40,c0afdc90,c2430980,4,...) at 0xc0632206 = uart_intr+0x126
intr_event_handle(c2430980,c23f5c48,c23f5c14,c24afc80,4,...) at 0xc06dff2c = intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,c23f5c48,c24afc80,c25db2bc,c23f5cbc,...) at 0xc09d4929 = intr_execute_handlers+0x49
atpic_handle_intr(4,c23f5c48) at 0xc09ee047 = atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 = Xatpic_intr4+0x22
--- interrupt, eip = 0xc073e7e3, esp = 0xc23f5c88, ebp = 0xc23f5cbc ---
taskqueue_run_locked(c24afc80,c24afc9c,0,c0a66f59,0,...) at 0xc073e7e3 = taskqueue_run_locked+0x43
taskqueue_thread_loop(c0b134ec,c23f5d28,0,0,0,...) at 0xc073ef27 = taskqueue_thread_loop+0x87
fork_exit(c073eea0,c0b134ec,c23f5d28) at 0xc06dc830 = fork_exit+0x90
fork_trampoline() at 0xc09cf964 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc23f5d60, ebp = 0 ---
db> tr 100024
Tracing pid 12 tid 100024 td 0xc24d45a0
sched_switch(c24d45a0,0,109,4b19c082,684,...) at 0xc07270bf = sched_switch+0x1df
mi_switch(109,0,c0a6b180,52d,c24306f0,...) at 0xc071086a = mi_switch+0x11a
ithread_loop(c24de8c0,cc894d28,0,0,0,...) at 0xc06e06df = ithread_loop+0x15f
fork_exit(c06e0580,c24de8c0,cc894d28) at 0xc06dc830 = fork_exit+0x90
fork_trampoline() at 0xc09cf964 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xcc894d60, ebp = 0 ---
db> tr 100025
Tracing pid 12 tid 100025 td 0xc24d42d0
kdb_enter(c0a48fbe,c0a5d00a,58,c2522280,0,...) at 0xc0733b3a = kdb_enter+0x3a
uart_intr(c2522200,c24d42d0,c0afdc90,c2430980,4,...) at 0xc0632206 = uart_intr+0x126
intr_event_handle(c2430980,cc918ac8,0,0,4,...) at 0xc06dff2c = intr_event_handle+0x5c
intr_execute_handlers(c0afdc90,cc918ac8,0,c2efa300,cc918b68,...) at 0xc09d4929 = intr_execute_handlers+0x49
atpic_handle_intr(4,cc918ac8) at 0xc09ee047 = atpic_handle_intr+0x67
Xatpic_intr4() at 0xc09cfa52 = Xatpic_intr4+0x22
--- interrupt, eip = 0xc083e822, esp = 0xcc918b08, ebp = 0xcc918b68 ---
ip_forward(c2efa300,0,12,0,0,...) at 0xc083e822 = ip_forward+0x3b2
ip_input(c2efa300,246,c24a1580,cc918bd0,c0622c21,...) at 0xc083ff8d = ip_input+0x6cd
netisr_dispatch_src(1,0,c2efa300,cc918c08,c07bb331,...) at 0xc07c9109 = netisr_dispatch_src+0x89
netisr_dispatch(1,c2efa300,c24d8c00,c24d8c00,c2f1a808,...) at 0xc07c93a0 = netisr_dispatch+0x20
ether_demux(c24d8c00,c2efa300,3,0,3,...) at 0xc07bb331 = ether_demux+0x161
ether_input(c24d8c00,c2efa300,c09d6ad0,c0b11780,c24d42d0,...) at 0xc07bb8af = ether_input+0x33f
vr_intr(c2499000,c24d42d0,0,109,48d02b08,...) at 0xc0690034 = vr_intr+0x4d4
intr_event_execute_handlers(c24337f8,c2430600,c0a6b180,52d,c2430670,...) at 0xc06def8b = intr_event_execute_handlers+0x13b
ithread_loop(c24de360,cc918d28,0,0,0,...) at 0xc06e05ea = ithread_loop+0x6a
fork_exit(c06e0580,c24de360,cc918d28) at 0xc06dc830 = fork_exit+0x90
fork_trampoline() at 0xc09cf964 = fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xcc918d60, ebp = 0 ---
More information about the freebsd-net
mailing list