[patch] deadlock debugging
Giorgos Keramidas
keramida at freebsd.org
Fri Jun 23 13:26:25 UTC 2006
On 2006-06-07 11:43, Kostik Belousov <kostikbel at gmail.com> wrote:
> Reports of the deadlocks are reccurrent topic on the current-
> and stable- lists. Many of us have to repeat the instructions
> on how to provide the useful initial bug report from them.
>
> Please, comment proposed addition to the kernel debugging
> chapter of the developer handbook.
Hi Kostik,
> Obviously, I am not an english native speaker. Your corrections
> for both factual material and grammar/style are very much
> welcome !
>
> P.S. I'm not on the list, do not remove CC: to me on replying.
Ok :)
This seems like a useful addition to the developer's handbook,
but I have some minor comments. See inline text below:
> Index: en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml
> ===================================================================
> RCS file: /usr/local/arch/ncvs/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml,v
> retrieving revision 1.64
> diff -u -r1.64 chapter.sgml
> --- en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml 5 Jan 2006 20:03:34 -0000 1.64
> +++ en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml 7 Jun 2006 08:39:20 -0000
> @@ -821,6 +821,41 @@
> on any configured console driver, including a serial
> console.</para>
> </sect1>
> +
> + <sect1 id="kerneldebug-deadlocks">
> + <title>Debugging the Deadlocks</title>
`Debugging Kernel Deadlocks' is probably a better title here, since
this chapter is about deadlocks in the kernel and `the Deadlocks'
doesn't make that as obvious as I'd like it to be.
> + <para>You may experience so called deadlocks, the situation where
> + system stops doing useful work. To provide the useful bug report
> + in this situation, you shall use ddb as described above. Please,
> + include the output of <command>ps</command> and
> + <command>trace</command> for suspected processes in the
> + report.</para>
This paragraph has a few minor grammar buglets. English is not my native
language either, but I would probably rewrite this as:
| <para>Modern &os; releases support Symmetric Multiprocessing
| (SMP). To let multiple kernel threads run concurrently on
| systems that support this mode of operation, the &os; kernel
| relies on many internal locking and synchronization
| primitives. Bugs in the use of these locking mechanisms can
| lead to a situation where two or more kernel threads compete
| for the same resources and block indefinitely, waiting for
| each other. When this happens, the system may become
| unstable and either crash or appear to
| <quote>hang</quote>. Such a hang is called a
| <quote>deadlock</quote>.</para>
|
| <para>Debugging a deadlock can be tricky, but &os; provides
| some tools that may help you track down the problem or
| collect information about the deadlock when it
| occurs.</para>
|
| <para>One of these tools is the kernel debugger,
| <application>DDB</application>, which you can use as described
| in the previous sections to collect useful information about
| such a bug. The <application>DDB</application> commands that
| are most useful when debugging a deadlock are:</para>
|
| <itemizedlist>
| <listitem><para><command>ps</command></para></listitem>
| <listitem><para><command>trace</command></para></listitem>
| </itemizedlist>
|
| <para>Use the <command>ps</command> command to list all the
| processes and then use <command>trace</command> on the
| processes that you suspect of having caused the deadlock.</para>
|
| <para>Other commands that can provide useful information for
| tracking down the cause of a deadlock are:</para>
|
| <itemizedlist>
| <listitem><para><command>show allpcpu</command></para></listitem>
| <listitem><para><command>show alllocks</command></para></listitem>
| <listitem><para><command>show lockedvnods</command></para></listitem>
| </itemizedlist>
|
| <para>Useful information about what each process was doing at
| the time the deadlock occurred can be listed with:</para>
|
| <itemizedlist>
| <listitem><para><command>where <replaceable>PID</replaceable></command></para></listitem>
| </itemizedlist>
|
| <para>The output of the <command>where</command> command tends
| to be very useful for the processes listed in the output of
| the <command>show</command> commands.</para>
|
| <para>To obtain meaningful backtraces for threaded processes,
| first use <command>thread
| <replaceable>thread-id</replaceable></command> to switch to
| the correct thread, and then get a backtrace
| with <command>where</command>.</para>
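It might also help to show the whole sequence in one place. What follows
is only a rough sketch: it assumes the system has already dropped into
DDB, the <screen> markup is my guess at how this would be typeset, the
PID and thread-id are placeholders, and all command output is left out:
| <para>Put together, a session could look roughly like this
| (command output omitted):</para>
|
| <screen><prompt>db&gt;</prompt> <userinput>ps</userinput>
| <prompt>db&gt;</prompt> <userinput>show allpcpu</userinput>
| <prompt>db&gt;</prompt> <userinput>show alllocks</userinput>
| <prompt>db&gt;</prompt> <userinput>show lockedvnods</userinput>
| <prompt>db&gt;</prompt> <userinput>where <replaceable>PID</replaceable></userinput>
| <prompt>db&gt;</prompt> <userinput>thread <replaceable>thread-id</replaceable></userinput>
| <prompt>db&gt;</prompt> <userinput>where</userinput></screen>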
Does this version look ok to you? I can handle merging this change
with your initial diff/patch.
> + <para>If possible, consider doing further investigation. Receipt
> + below is especially usefull if you suspect deadlock occurs in the
> + VFS layer. Add the options
> + <programlisting>makeoptions DEBUG=-g
> + options INVARIANTS
> + options INVARIANT_SUPPORT
> + options WITNESS
> + options DEBUG_LOCKS
> + options DEBUG_VFS_LOCKS
> + options DIAGNOSTIC</programlisting>
> +
> + to the kernel config. When deadlock occurs, in addition to the
> + output of the <command>ps</command> command, provide information
> + from the <command>show allpcpu</command>, <command>show
> + alllocks</command> and <command>show
> + lockedvnods</command>. More, please provide output of the
> + <command>where pid</command> for each process id mentioned in
> + the output of the <command>show</command> commands.
> + </para>
> +
> + <para>For threaded processes, to obtain meaningful backtraces, use
> + <command>thread thread-id</command> to switch to the thread
> + stack, and do backtrace with <command>where</command>.</para>
> + </sect1>
> </chapter>
This part is also nice, but IMHO it would be even nicer if we could
expand it a bit more. How about something like this?
| <!-- On reproducing a deadlock and `doing further investigation' -->
|
| <para>Deadlocks are pretty nasty bugs, because their occurrence
| depends on specific timing, synchronization, system load and
| many other factors, which makes them hard to reproduce
| reliably. Since reproducing a bug is sometimes a crucial
| part of gathering all the necessary information, you may have
| to spend some time investigating the deadlock. Naturally,
| this is not always possible on production systems, but if you
| can reproduce the deadlock on a test system that can afford to
| stay off-line for extended periods of time, consider
| staying inside <application>DDB</application> while you
| investigate the deadlock further.</para>
|
| <para>A serial console can be extremely helpful in collecting
| <application>DDB</application> output.</para>
|
| <para>If it is impossible to set up a serial console
| (e.g. because you cannot find or afford a second system to
| configure as a testbed), emulators like
| <filename role="port">emulators/qemu</filename>,
| <filename role="port">emulators/vmware2</filename> or
| <filename role="port">emulators/bochs</filename> may prove a
| very efficient way of debugging kernel issues such as a
| deadlock.</para>
Part #2 ...
| <!-- On kernel options that are useful for debugging locking problems. -->
|
| <para>Apart from the usual kernel options that are useful for
| debugging kernel problems, some options are particularly
| useful for debugging locking problems. These options
| are:</para>
|
| <programlisting> options INVARIANTS
| options INVARIANT_SUPPORT
| options WITNESS
| options DEBUG_LOCKS
| options DEBUG_VFS_LOCKS
| options DIAGNOSTIC</programlisting>
Any help in expanding these parts (especially the second one) is more
than welcome :-)