[patch] deadlock debugging
Kostik Belousov
kostikbel at gmail.com
Fri Jun 23 13:58:45 UTC 2006
On Fri, Jun 23, 2006 at 04:25:58PM +0300, Giorgos Keramidas wrote:
> On 2006-06-07 11:43, Kostik Belousov <kostikbel at gmail.com> wrote:
> > Reports of deadlocks are a recurrent topic on the current-
> > and stable- lists. Many of us have to repeat the instructions
> > on how to provide a useful initial bug report for them.
> >
> > Please comment on the proposed addition to the kernel debugging
> > chapter of the developer handbook.
>
> Hi Kostik,
>
> > Obviously, I am not a native English speaker. Your corrections
> > to both the factual material and the grammar/style are very
> > welcome!
> >
> > P.S. I'm not on the list, so please do not remove the CC: to me when replying.
>
> Ok :)
>
> This seems like a useful addition to the developer's handbook,
> but I have some minor comments. See inline text below:
>
> > Index: en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml
> > ===================================================================
> > RCS file: /usr/local/arch/ncvs/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml,v
> > retrieving revision 1.64
> > diff -u -r1.64 chapter.sgml
> > --- en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml 5 Jan 2006 20:03:34 -0000 1.64
> > +++ en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml 7 Jun 2006 08:39:20 -0000
> > @@ -821,6 +821,41 @@
> > on any configured console driver, including a serial
> > console.</para>
> > </sect1>
> > +
> > + <sect1 id="kerneldebug-deadlocks">
> > + <title>Debugging the Deadlocks</title>
>
> `Debugging Kernel Deadlocks' is probably a better title here, since
> this section covers deadlocks inside the kernel and `the Deadlocks'
> doesn't make that as obvious as I'd like it to be.
>
> > + <para>You may experience so called deadlocks, the situation where
> > + system stops doing useful work. To provide the useful bug report
> > + in this situation, you shall use ddb as described above. Please,
> > + include the output of <command>ps</command> and
> > + <command>trace</command> for suspected processes in the
> > + report.</para>
>
> This paragraph has a few minor syntax buglets. English is not my native
> language, but I would probably rewrite this as:
>
> | <para>Modern &os; releases support Symmetric Multiprocessing
> | (SMP). To support highly parallel processing, the &os; kernel
> | uses many internal locking and synchronization primitives,
> | which allow multiple kernel threads to run concurrently on
> | systems that support this mode of operation. Bugs in the use
> | of these locking mechanisms can lead to a situation where two
> | or more kernel threads compete for the same resources and
> | block indefinitely, waiting for each other. When this
> | happens, the system may become unstable and either crash or
> | appear to <quote>hang</quote>. Such a hang is called a
> | <quote>deadlock</quote>.</para>
> |
> | <para>Debugging a deadlock can be difficult, but &os; provides
> | some tools that may assist you in tracking down the problem
> | or in collecting information about the deadlock when it
> | occurs.</para>
> |
> | <para>One of these tools is the kernel debugger,
> | <application>DDB</application>, which you can use as described
> | in the previous sections to collect useful information about
> | such a bug. The <application>DDB</application> commands that
> | are most useful when debugging a deadlock are:</para>
> |
> | <itemizedlist>
> | <listitem><para><command>ps</command></para></listitem>
> | <listitem><para><command>trace</command></para></listitem>
> | </itemizedlist>
> |
> | <para>Use the <command>ps</command> command to list all
> | processes, and then use <command>trace</command> on the
> | processes suspected of having caused the deadlock.</para>
> |
> | <para>Other commands that can provide useful information for
> | tracking down the cause of a deadlock are:</para>
> |
> | <itemizedlist>
> | <listitem><para><command>show allpcpu</command></para></listitem>
> | <listitem><para><command>show alllocks</command></para></listitem>
> | <listitem><para><command>show lockedvnods</command></para></listitem>
> | </itemizedlist>
> |
> | <para>Useful information about what each process was doing at
> | the time the deadlock occurred can be listed with:</para>
> |
> | <itemizedlist>
> | <listitem><para><command>where <replaceable>PID</replaceable></command></para></listitem>
> | </itemizedlist>
> |
> | <para>The output of the <command>where</command> command tends
> | to be very useful for the processes listed in the output of
> | the <command>show</command> commands.</para>
> |
> | <para>To obtain meaningful backtraces for threaded processes,
> | first use <command>thread <replaceable>thread-id</replaceable></command>
> | to switch to the correct thread, and then get a backtrace
> | with <command>where</command>.</para>
>
> Does this version look ok to you? I can handle merging this
> change with your initial diff/patch.
>
> > + <para>If possible, consider doing further investigation. Receipt
> > + below is especially usefull if you suspect deadlock occurs in the
> > + VFS layer. Add the options
> > + <programlisting>makeoptions DEBUG=-g
> > + options INVARIANTS
> > + options INVARIANT_SUPPORT
> > + options WITNESS
> > + options DEBUG_LOCKS
> > + options DEBUG_VFS_LOCKS
> > + options DIAGNOSTIC</programlisting>
> > +
> > + to the kernel config. When deadlock occurs, in addition to the
> > + output of the <command>ps</command> command, provide information
> > + from the <command>show allpcpu</command>, <command>show
> > + alllocks</command> and <command>show
> > + lockedvnods</command>. More, please provide output of the
> > + <command>where pid</command> for each process id mentioned in
> > + the output of the <command>show</command> commands.
> > + </para>
> > +
> > + <para>For threaded processes, to obtain meaningful backtraces, use
> > + <command>thread thread-id</command> to switch to the thread
> > + stack, and do backtrace with <command>where</command>.</para>
> > + </sect1>
> > </chapter>
>
> This part is also nice, but IMHO it would be even nicer if we could
> expand it a bit more. How about something like this?
>
> | <!-- On reproducing a deadlock and `doing further investigation' -->
> |
> | <para>Deadlocks are nasty bugs because they are hard to
> | reproduce reliably: their occurrence depends on specific
> | timing, synchronization, system load and many other factors.
> | Since reproducing a bug is sometimes a crucial part of
> | gathering all the necessary information, you may have to spend
> | some time investigating the deadlock. Naturally, this is not
> | always possible on production systems, but if you can
> | reproduce the deadlock on a test system that can afford to
> | stay off-line for extended periods of time, consider
> | staying inside <application>DDB</application> while you
> | investigate the deadlock further.</para>
> |
> | <para>A serial console can be extremely helpful in collecting
> | <application>DDB</application> output.</para>
> |
> | <para>If it is impossible to set up a serial console
> | (e.g. because you cannot find or afford a second system to
> | configure as a testbed), emulators like
> | <filename role="port">emulators/qemu</filename>,
> | <filename role="port">emulators/vmware2</filename> or
> | <filename role="port">emulators/bochs</filename> may prove a
> | very efficient way of debugging kernel issues such as
> | deadlocks.</para>
>
> Part #2 ...
>
> | <!-- On kernel options that are useful for debugging locking problems. -->
> |
> | <para>Apart from the usual kernel options that are useful for
> | debugging kernel problems, there are some options that are
> | particularly useful for, and targeted at, debugging locking
> | problems. These options are:</para>
> |
> | <programlisting> options INVARIANTS
> | options INVARIANT_SUPPORT
> | options WITNESS
> | options DEBUG_LOCKS
> | options DEBUG_VFS_LOCKS
> | options DIAGNOSTIC</programlisting>
>
> Any help in expanding these parts (especially the second one) is more
> than welcome :-)
I like your changes; they provide useful context and give the
problems proper exposure.
My intent for the addition was to have a place to point people to when asked
"how do I debug deadlocks?" Could your additions and my how-to guide
coexist side by side? For instance, by summarizing the information
developers want to obtain from the problem machine at the end of the section?
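
A rough sketch of what such a closing summary could look like is below.
It only collects the DDB commands already named in this thread. The
<screen> layout, the db> prompt lines and the pid/thread-id placeholders
are assumptions for illustration, not tested markup, and the show
commands assume a kernel built with the debugging options from the patch.

```sgml
<!-- Hypothetical summary paragraph for the end of the section;
     commands are those from the patch, layout is only a sketch. -->
<para>In short, when reporting a deadlock, please include the output
  of the following <application>DDB</application> commands:</para>

<screen>db> ps
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> where <replaceable>pid</replaceable>
db> thread <replaceable>thread-id</replaceable>
db> where</screen>
```

The where pid line would be repeated for each process id mentioned in
the output of the show commands, and the thread/where pair for each
thread of a threaded process.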