[patch] deadlock debugging

Giorgos Keramidas keramida at freebsd.org
Fri Jun 23 13:26:25 UTC 2006


On 2006-06-07 11:43, Kostik Belousov <kostikbel at gmail.com> wrote:
> Reports of the deadlocks are reccurrent topic on the current-
> and stable- lists. Many of us have to repeat the instructions
> on how to provide the useful initial bug report from them.
>
> Please, comment proposed addition to the kernel debugging
> chapter of the developer handbook.

Hi Kostik,

> Obviously, I am not an english native speaker. Your corrections
> for both factual material and grammar/style are very much
> welcome !
>
> P.S. I'm not on the list, do not remove CC: to me on replying.

Ok :)

This seems like a useful addition to the developer's handbook,
but I have some minor comments.  See inline text below:

> Index: en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml
> ===================================================================
> RCS file: /usr/local/arch/ncvs/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml,v
> retrieving revision 1.64
> diff -u -r1.64 chapter.sgml
> --- en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml	5 Jan 2006 20:03:34 -0000	1.64
> +++ en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml	7 Jun 2006 08:39:20 -0000
> @@ -821,6 +821,41 @@
>        on any configured console driver, including a serial
>        console.</para>
>    </sect1>
> +
> +  <sect1 id="kerneldebug-deadlocks">
> +    <title>Debugging the Deadlocks</title>

`Debugging Kernel Deadlocks' is probably a better title here, since
this section is only about deadlocks in the kernel and `the Deadlocks'
doesn't really make this as obvious as I'd probably want it to be.

> +    <para>You may experience so called deadlocks, the situation where
> +      system stops doing useful work. To provide the useful bug report
> +      in this situation, you shall use ddb as described above. Please,
> +      include the output of <command>ps</command> and
> +      <command>trace</command> for suspected processes in the
> +      report.</para>

This paragraph has a few minor syntax buglets.  English is not my native
language, but I would probably rewrite this as:

|       <para>Modern &os; releases have been extended with support for
|         Symmetric Multiprocessing (SMP).  To support highly parallel
|         processing, the &os; kernel uses a lot of internal locking and
|         synchronization primitives, to allow multiple kernel threads
|         to run concurrently on systems that can support such a mode of
|         operation.  Bugs in the use of these internal locking
|         mechanisms can lead to a situation where two or more kernel
|         threads compete for the same resources and block
|         indefinitely waiting for each other.  When this happens, the
|         system may become unstable and either crash or appear to
|         <quote>hang</quote>.  This situation is called a
|         <quote>deadlock</quote>.</para>
|
|       <para>Debugging a deadlock can be tricky, but &os; provides
|         some tools that may assist you in tracking down the problem
|         or collecting information about the deadlock when it
|         occurs.</para>
|
|       <para>One of these tools is the kernel debugger,
|         <application>DDB</application>, which you can use as described
|         in the previous sections to collect useful information for
|         such a bug.  <application>DDB</application> commands that are
|         particularly useful when debugging a deadlock are:</para>
|
|       <itemizedlist>
|         <listitem><para><command>ps</command></para></listitem>
|         <listitem><para><command>trace</command></para></listitem>
|       </itemizedlist>
|
|       <para>Use the <command>ps</command> command to list all the
|         processes and then use <command>trace</command> on the
|         processes you suspect of being involved in the deadlock.</para>
|
|       <para>Other commands that can provide useful information for
|         tracking down the cause of a deadlock are:</para>
|
|       <itemizedlist>
|         <listitem><para><command>show allpcpu</command></para></listitem>
|         <listitem><para><command>show alllocks</command></para></listitem>
|         <listitem><para><command>show lockedvnods</command></para></listitem>
|       </itemizedlist>
|
|       <para>Useful information about what each process was doing at
|         the time the deadlock occurred can be listed with:</para>
|
|       <itemizedlist>
|         <listitem><para><command>where <replaceable>PID</replaceable></command></para></listitem>
|       </itemizedlist>
|
|       <para>The output of the <command>where</command> command tends
|         to be very useful for the processes listed in the output of
|         the <command>show</command> commands.</para>
|
|       <para>To obtain meaningful backtraces for threaded processes,
|         use <command>thread <replaceable>thread-id</replaceable></command>
|         first, to switch to the correct thread, and then get a
|         backtrace with <command>where</command>.</para>
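
To make the order of the commands clearer, this is roughly the session
I have in mind.  It is only a sketch: the output is omitted, and <pid>
and <thread-id> are placeholders for values you pick out of the ps and
show output.  As far as I know, `show alllocks' is only available in
kernels built with WITNESS.

```text
db> ps
db> show allpcpu
db> show alllocks
db> show lockedvnods
db> where <pid>
db> thread <thread-id>
db> where
```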

Does this version look ok to you?  I can handle the merging of this
change with your initial diff/patch.

> +    <para>If possible, consider doing further investigation. Receipt
> +      below is especially usefull if you suspect deadlock occurs in the
> +      VFS layer. Add the options
> +      <programlisting>makeoptions		DEBUG=-g
> +	options		INVARIANTS
> +	options		INVARIANT_SUPPORT
> +	options		WITNESS
> +	options		DEBUG_LOCKS
> +	options		DEBUG_VFS_LOCKS
> +	options		DIAGNOSTIC</programlisting>
> +
> +      to the kernel config. When deadlock occurs, in addition to the
> +      output of the <command>ps</command> command, provide information
> +      from the <command>show allpcpu</command>, <command>show
> +      alllocks</command> and <command>show
> +      lockedvnods</command>. More, please provide output of the
> +      <command>where pid</command> for each process id mentioned in
> +      the output of the <command>show</command> commands.
> +    </para>
> +
> +    <para>For threaded processes, to obtain meaningful backtraces, use
> +      <command>thread thread-id</command> to switch to the thread
> +      stack, and do backtrace with <command>where</command>.</para>
> +  </sect1>
>  </chapter>

This part is also nice, but IMHO it would be even nicer if we could
expand it a bit more.  How about something like this?

|       <!-- On reproducing a deadlock and `doing further investigation' -->
|
|       <para>Deadlocks are pretty nasty bugs, since their occurrence
|         depends on specific timing, synchronization, system load and
|         many more factors.  This makes it hard to reliably reproduce
|         a deadlock bug.  Since reproducing a bug is sometimes a
|         crucial part of gathering all the necessary information, you
|         may have to spend some time investigating the deadlock.
|         Naturally, this is not always possible for production
|         systems, but if you can reproduce the deadlock on a test
|         system which can afford to stay off-line for extended periods
|         of time, then consider staying inside
|         <application>DDB</application> while you are investigating
|         the deadlock further.</para>
|
|       <para>A serial console can be extremely helpful in collecting
|         <application>DDB</application> output, because the output
|         can then be logged on the machine at the other end of the
|         serial line.</para>
|
|       <para>If it is impossible to set up a serial console
|         (e.g. because you cannot find or afford a second system to
|         configure as a testbed), emulators like
|         <filename role="port">emulators/qemu</filename>,
|         <filename role="port">emulators/vmware2</filename> or
|         <filename role="port">emulators/bochs</filename> may prove a
|         very effective way of debugging kernel issues such as a
|         deadlock.</para>

Part #2 ...

|       <!-- On kernel options that are useful for debugging locking problems. -->
|
|       <para>Apart from the usual kernel options that are useful for
|         debugging kernel problems, there are some options that are
|         particularly useful and targeted at debugging locking
|         problems.  These options are:</para>
|
|       <programlisting>options         INVARIANTS
| options         INVARIANT_SUPPORT
| options         WITNESS
| options         DEBUG_LOCKS
| options         DEBUG_VFS_LOCKS
| options         DIAGNOSTIC</programlisting>
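
For reference, a kernel config fragment that pulls everything together
could look something like the one below.  Treat it as a sketch, not as
text for the handbook: `makeoptions DEBUG=-g' comes from your original
diff, while `options KDB' and `options DDB' are my own addition, on the
assumption that the reader has not already enabled the debugger as
described in the earlier sections.  The comments are mine too.

```text
makeoptions     DEBUG=-g            # build the kernel with debugging symbols
options         KDB                 # kernel debugger framework (my addition)
options         DDB                 # the DDB debugger itself (my addition)
options         INVARIANTS          # extra run-time consistency checks
options         INVARIANT_SUPPORT   # support code required by INVARIANTS
options         WITNESS             # lock order checking
options         DEBUG_LOCKS         # extra debugging for lockmgr locks
options         DEBUG_VFS_LOCKS     # extra debugging for vnode locks
options         DIAGNOSTIC          # additional diagnostic checks
```

Keep in mind that a kernel built with these options runs noticeably
slower, so this is more suitable for a test system.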

Any help in expanding these parts (especially the second one) is more
than welcome :-)



