How best to debug locking/scheduler problems
Mel Flynn
mel.flynn+fbsd.hackers at mailing.thruhere.net
Wed Jun 17 19:52:57 UTC 2009
On Wednesday 17 June 2009 04:15:26 John Baldwin wrote:
> On Tuesday 16 June 2009 7:01:45 pm Mel Flynn wrote:
> > On Tuesday 16 June 2009 11:02:42 John Baldwin wrote:
> > > On Tuesday 16 June 2009 1:52:23 pm Mel Flynn wrote:
> > > > Hi John,
> > > >
> > > > On Tuesday 16 June 2009 04:19:57 John Baldwin wrote:
> > > > > On Monday 15 June 2009 5:53:05 pm Mel Flynn wrote:
> > > > > > PID TID COMM TDNAME KSTACK
> > > > > > 4283 100215 kdeinit4 - mi_switch
> > > > > > turnstile_wait _mtx_lock_sleep uipc_peeraddr kern_getpeername
> > > > > > getpeername syscall Xint0x80_syscall
> > > > > > % ps -ww 4283
> > > > > > PID TT STAT TIME COMMAND
> > > > > > 4283 ?? T 0:00.38 kdeinit4: kdeinit4: kio_http http
> > > > > > local:/tmp/ksocket-mel/klauncherxJ1635.slave-socket
> > > > > > local:/tmp/ksocket- mel/plasmayC1653.slave-socket (kdeinit4)
> > > > > >
> > > > > > %ls -l /tmp/ksocket-mel/
> > > > > >
> > > > > > total 2
> > > > > > -rw-rw-r-- 1 mel wheel 62 Jun 14 22:55 KSMserver__0
> > > > > > srw------- 1 mel wheel 0 Jun 14 22:55 kdeinit4__0
> > > > > > srwxrwxr-x 1 mel wheel 0 Jun 14 22:55
> > > > > > klauncherxJ1635.slave-socket
> > > > >
> > > > > You can use kgdb and the scripts at www.freebsd.org/~jhb/gdb.
> > > > > Simply run 'kgdb' as root and do 'lcd /folder/with/scripts' and
> > > > > 'source gdb6'. You can then do 'lockchain 4283' to find who holds
> > > > > the lock this thread is blocked on and what state they are in.
> > > >
> > > > Looks like a deadlock:
> > > >
> > > > (kgdb) lockchain 4283
> > > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0
> > > > "unp_mtx" thread 100122 (pid 1635, klauncher) blocked on lock
> > > > 0xc6806348 "unp_mtx" thread 100215 (pid 4283, kdeinit4) blocked on
> > > > lock 0xc64374a0 "unp_mtx" thread 100122 (pid 1635, klauncher) blocked
> > > > on lock 0xc6806348 "unp_mtx" thread 100215 (pid 4283, kdeinit4)
> > > > blocked on lock 0xc64374a0 "unp_mtx" thread 100122 (pid 1635,
> > > > klauncher) blocked on lock 0xc6806348 "unp_mtx" thread 100215 (pid
> > > > 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" thread 100122
> > > > (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" thread
> > > > 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx"
> > > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348
> > > > "unp_mtx" thread 100215 (pid 4283, kdeinit4) blocked on lock
> > > > 0xc64374a0 "unp_mtx" thread 100122 (pid 1635, klauncher) blocked on
> > > > lock 0xc6806348 "unp_mtx" thread 100215 (pid 4283, kdeinit4) blocked
> > > > on lock 0xc64374a0 "unp_mtx" thread 100122 (pid 1635, klauncher)
> > > > blocked on lock 0xc6806348 "unp_mtx" thread 100215 (pid 4283,
> > > > kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" thread 100122 (pid
> > > > 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" thread 100215
> > > > (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" thread
> > > > 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx"
> > > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0
> > > > "unp_mtx" thread 100122 (pid 1635, klauncher) blocked on lock
> > > > 0xc6806348 "unp_mtx" DEADLOCK
> > > >
> > > > Looking through the scripts now to see how I can get more info on the
> > > > call chain and hoping I don't panic the machine ;). It is quite
> > > > random to reproduce.
> > >
> > > In kgdb you can simply do 'tid 100122' followed by 'bt' and 'tid
> > > 100215' followed by 'bt'.
> >
> > Cool, thanks for helping John. Of course it pretty much shows me what
> > procstat -k shows and can't get any info on the userland part, but I can
> > fully inspect the locks and threads.
> >
> > Both threads are in TDS_INHIBITED state, and blocked on:
> > (kgdb) frame 0
> > #0 sched_switch (td=0xc5971240, newtd=0xc4d39900, flags=259)
> > at /usr/src/sys/kern/sched_ule.c:1864
> > 1864 cpuid = PCPU_GET(cpuid);
>
> That doesn't really tell us anything except that it isn't running. We know
> it is actually blocked on a lock, and we need the full stack trace to see
> where the two threads were trying to acquire the locks, hence 'bt'. '
> procstat -k' output would be fine, too.
They're blocking on each other:
(kgdb) tid 100122
(kgdb) print td->td_contested.lh_first
$16 = (struct turnstile *) 0xc4f8fb00
(kgdb) print td->td_lock
$17 = (struct mtx * volatile) 0xc538cb00
(kgdb) tid 100215
(kgdb) print td->td_contested.lh_first
$18 = (struct turnstile *) 0xc538cb00
(kgdb) print td->td_lock
$19 = (struct mtx * volatile) 0xc4f8fb00
the respective bt's:
(kgdb) tid 100122
#0 sched_switch (td=0xc56e8900, newtd=0xc4d39b40, flags=259)
at /usr/src/sys/kern/sched_ule.c:1864
#1 0xc064bcfa in mi_switch (flags=259, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:444
#2 0xc067d30b in turnstile_wait (ts=0xc538cb00, owner=0xc5971240,
queue=Variable "queue" is not available.
)
at /usr/src/sys/kern/subr_turnstile.c:745
#3 0xc06346ac in _mtx_lock_sleep (m=0xc6806348, tid=3312355584, opts=0,
file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:447
#4 0xc06a68a5 in uipc_peeraddr (so=0xc64309a8, nam=0xe79a2c70)
at /usr/src/sys/kern/uipc_usrreq.c:682
#5 0xc06a1e71 in kern_getpeername (td=0xc56e8900, fd=12, sa=0xe79a2c70,
alen=0xe79a2c6c)
at /usr/src/sys/kern/uipc_syscalls.c:1566
#6 0xc06a1f72 in getpeername (td=0xc56e8900, uap=0xe79a2cf8)
at /usr/src/sys/kern/uipc_syscalls.c:1527
#7 0xc0872415 in syscall (frame=0xe79a2d38) at
/usr/src/sys/i386/i386/trap.c:1073
#8 0xc0856970 in Xint0x80_syscall () at
/usr/src/sys/i386/i386/exception.s:261
#9 0x00000033 in ?? ()
(kgdb) tid 100215
(kgdb) bt
#0 sched_switch (td=0xc5971240, newtd=0xc4d39900, flags=259)
at /usr/src/sys/kern/sched_ule.c:1864
#1 0xc064bcfa in mi_switch (flags=259, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:444
#2 0xc067d30b in turnstile_wait (ts=0xc4f8fb00, owner=0xc56e8900,
queue=Variable "queue" is not available.
)
at /usr/src/sys/kern/subr_turnstile.c:745
#3 0xc06346ac in _mtx_lock_sleep (m=0xc64374a0, tid=3315012160, opts=0,
file=0x0, line=0)
at /usr/src/sys/kern/kern_mutex.c:447
#4 0xc06a68a5 in uipc_peeraddr (so=0xc6976338, nam=0xe9ae9c70)
at /usr/src/sys/kern/uipc_usrreq.c:682
#5 0xc06a1e71 in kern_getpeername (td=0xc5971240, fd=7, sa=0xe9ae9c70,
alen=0xe9ae9c6c)
at /usr/src/sys/kern/uipc_syscalls.c:1566
#6 0xc06a1f72 in getpeername (td=0xc5971240, uap=0xe9ae9cf8)
at /usr/src/sys/kern/uipc_syscalls.c:1527
#7 0xc0872415 in syscall (frame=0xe9ae9d38) at
/usr/src/sys/i386/i386/trap.c:1073
#8 0xc0856970 in Xint0x80_syscall () at
/usr/src/sys/i386/i386/exception.s:261
#9 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
--
Mel
More information about the freebsd-hackers
mailing list