Hard lock on 7.1-RC1
Dylan Cochran
a134qaed at gmail.com
Mon Dec 22 22:18:23 PST 2008
On Sun, Dec 21, 2008 at 3:05 PM, Dylan Cochran <a134qaed at gmail.com> wrote:
> I'm hitting a strange lockup on 7.1-RC1, where some socket operations
> seem to stall, as well as basic file operations. The only reproducable
> way I have of triggering it is by doing multiple inserts into
> phpmyadmin on lighttpd+fastcgi php5 + mysql51-server, though this
> isn't the only thing which triggers it, just the only one which is
> semi reliable. I've also reproduced this on another machine, set up
> specifically to rule out any machine specific problems (as they have
> different drive controllers, one uses gjournal, etc).
>
> I inititially built a kernel with SW_WATCHDOG, and attempted to use
> watchdogd and DDB to get an output from show locks, but the watchdogd
> hasn't panicked the machine, so at least devfs is still unlocked; I'm
> not able to get physical access to the machine until monday.
>
> The bug was introduced as far as I can tell, between 7.1-BETA2 and 7.1-RC1.
>
> Any suggestions on what I can test for tommorow?
>
I updated the kernel source to RELENG_7_1 as of a few hours ago, and
built with DEBUG_VFS_LOCKS as well.
Luckily the backtrace included the operating it was at before the
watchdog, which seems to be kern_sendfile(). I'm no expert at kernel
debugging, so any assistance on tracking this down further would be
greatly appreciated.
And, as promised, here is the output of script after the watchdog induced panic:
Script started on Tue Dec 23 01:05:56 2008
# cu -l cua01
interrupt total
irq4: sio0 623
irq6: fdc0 1
irq17: fwohci0 3
irq18: rl0 uhci2++ 60718
irq23: rl1 ehci0 206
cpu0: timer 514596
Total 576147
KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp =
0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout
[thread pid 1288 tid 100060 ]
Stopped at kdb_enter_why+0x3a: movl $0,kdb_why
db> show lock
db> p show all proc
pid ppid pgrp uid state wmesg wchan cmd
1600 902 902 0 R watchdogd
1470 1469 1470 0 S+ ttyin 0xc418fc10 csh
1469 1 1469 0 Ss+ wait 0xc46032b8 login
1468 1 1468 0 Ss+ ttyin 0xc41ac810 getty
1427 1 1427 0 Ss nanslp 0xc0c7dc44 cron
1420 1 1420 0 Ss select 0xc0c88eb8 sshd
1419 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1418 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1417 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1416 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1415 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1414 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1413 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1412 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1411 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1410 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1409 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1408 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1407 1289 1289 80 SJ accept 0xc445ab9a php-cgi
--More--
1406 1289 1289 80 SJ accept 0xc445ab9a php-cgi
--More--
1405 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1404 1289 1289 80 SJ accept 0xc445ab9a php-cgi
1403 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1402 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1401 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1400 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1399 1300 1300 80 RJ php-cgi
1398 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1397 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1396 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1395 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1394 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1393 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1392 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1391 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1390 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1389 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1388 1300 1300 80 SJ accept 0xc445a6ba php-cgi
1322 1 1322 25 SsJ pause 0xc4562b40 sendmail
1316 1 1316 0 RsJ sendmail
--More--
1306 1 1306 0 SsJ select 0xc0c88eb8 sshd
1300 1288 1300 80 SsJ wait 0xc43beae0 initial thread
1289 1288 1289 80 SsJ wait 0xc4564000 initial thread
1288 1 1287 80 RJ CPU 0 lighttpd
1286 1173 1170 88 R+J (threaded) mysqld
100114 S sigwait 0xe67a2be0 mysqld
100090 S ucond 0xc4359580 mysqld
100092 RunQ mysqld
100091 RunQ mysqld
100089 S ucond 0xc43598c0 mysqld
100088 S ucond 0xc4358c80 mysqld
100087 S ucond 0xc4094780 mysqld
100086 S ucond 0xc4359140 mysqld
100049 S select 0xc0c88eb8 initial thread
1173 1 1170 88 S+J wait 0xc43be2b8 sh
920 1 920 0 R+ (threaded) venti
100072 S ucond 0xc4358300 venti
100071 S accept 0xc445b03a venti
100070 S ucond 0xc4358200 venti
100069 S ucond 0xc4095980 venti
--More--
100068 S ucond 0xc41b1e00 venti
100067 S ucond 0xc4358100 venti
100066 S ucond 0xc4358280 venti
100065 S ucond 0xc41a45c0 venti
100064 S accept 0xc445a1da venti
100063 RunQ venti
100053 RunQ venti
902 1 902 0 Ss wait 0xc4343570 watchdogd
891 1 891 102 Rs gmond
770 1 770 0 Ss select 0xc0c88eb8 devd
338 1 338 65 Ss select 0xc0c88eb8 dhclient
322 1 49 0 S+ select 0xc0c88eb8 dhclient
48 0 0 0 SL sdflush 0xc0c95228 [softdepflush]
47 0 0 0 RL [syncer]
46 0 0 0 SL vlruwt 0xc4342000 [vnlru]
45 0 0 0 SL psleep 0xc0c89324 [bufdaemon]
44 0 0 0 SL pgzero 0xc0c95e00 [pagezero]
43 0 0 0 SL psleep 0xc0c95a18 [vmdaemon]
42 0 0 0 SL psleep 0xc0c959e0 [pagedaemon]
41 0 0 0 SL waiting_ 0xc0c8b0b4 [sctp_iterator]
--More--
40 0 0 0 WL [irq7: ppbus0 ppc0]
39 0 0 0 WL [irq1: atkbd0]
38 0 0 0 WL [irq15: ata1]
37 0 0 0 WL [irq14: ata0]
36 0 0 0 WL [swi0: sio]
35 0 0 0 SL - 0xc418d83c [fdc0]
34 0 0 0 SL - 0xc4097000 [fw0_probe]
33 0 0 0 SL - 0xc408c200 [fw0_taskq]
32 0 0 0 SL usbevt 0xc4013210 [usb4]
31 0 0 0 WL [irq23: rl1 ehci0]
30 0 0 0 SL usbevt 0xc407f210 [usb3]
29 0 0 0 SL usbevt 0xc406c210 [usb2]
28 0 0 0 WL [irq18: rl0 uhci2++]
27 0 0 0 SL usbevt 0xc405c210 [usb1]
26 0 0 0 WL [irq19: uhci1]
25 0 0 0 SL usbtsk 0xc0c7b234 [usbtask-dr]
24 0 0 0 SL usbtsk 0xc0c7b220 [usbtask-hc]
23 0 0 0 SL usbevt 0xc4024210 [usb0]
22 0 0 0 WL [irq16: uhci0 uhci3]
21 0 0 0 WL [irq9: acpi0]
--More--
20 0 0 0 WL [swi6: task queue]
19 0 0 0 WL [swi6: Giant taskq]
18 0 0 0 SL - 0xc3fe3700 [thread taskq]
17 0 0 0 WL [swi5: +]
9 0 0 0 SL - 0xc3fe3880 [acpi_task_2]
8 0 0 0 SL - 0xc3fe3880 [acpi_task_1]
7 0 0 0 SL - 0xc3fe3880 [acpi_task_0]
16 0 0 0 WL [swi2: cambio]
6 0 0 0 SL ccb_scan 0xc0c4b054 [xpt_thrd]
5 0 0 0 SL - 0xc3fe3a80 [kqueue taskq]
15 0 0 0 SL - 0xc0c7da74 [yarrow]
4 0 0 0 SL - 0xc0c7b62c [g_down]
3 0 0 0 SL - 0xc0c7b628 [g_up]
2 0 0 0 SL - 0xc0c7b620 [g_event]
14 0 0 0 WL [swi3: vm]
13 0 0 0 WL [swi4: clock sio]
12 0 0 0 WL [swi1: net]
11 0 0 0 RL [idle: cpu0]
1 0 1 0 SLs wait 0xc3f16ae0 [init]
10 0 0 0 SL audit_wo 0xc0c94c64 [audit]
--More--
0 0 0 0 SLs sched 0xc0c7b6e0 [swapper]
db> Killed
More information about the freebsd-stable
mailing list