cvs commit: src/sys/ddb db_command.c db_output.c

Mon Oct 3 18:32:02 PDT 2005

On Mon, 3 Oct 2005, Robert Watson wrote:

> On Mon, 3 Oct 2005, Nate Lawson wrote:
>
>>> Is there any chance I can interest you in an idea phk, I, and a few others 
>>> have been kicking around for a bit relating to smart small dumps? 
>>> Specifically, we were discussing the idea of allowing a dumping mode in 
>>> which rather than dumping all of kernel memory, we dump specifically the 
>>> common and useful output from ddb, such as ps, show lockedvnods, show 
>>> alllocks, traceall, show allpcpu, and so on, basically in text format, to 
>>> the dump partition.  ...
>> 
>> That's fine as a hack-around, but I hope that doesn't distract effort from 
>> sparse kernel dumps.  If you throw out non-anonymous pages, buffer cache, 
>> etc., you end up with a very small image to begin with.  Add in gzip 
>> compression and it wouldn't be much larger than your uncompressed logs. 
>> Then you can run whatever info tools you want against the core since no 
>> actual data is lost.

Except the buffer cache and all VMIO pages should be dumped so that
fsck(8) can sync the buffer cache.  It is common for most pages to be
inactive ones with VMIO data in them, so you would often end up with
a very large image.  Dumping only dirty buffers would take less space,
but determining dirty buffers at panic time might trip over locks just
like sync on panic, and fsck might want to check the whole buffer cache
for integrity.
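
To make the size trade-off concrete, here is a minimal sketch in C.  The
page categories, policy names and functions are hypothetical and invented
for illustration; they are not the kernel's real types or any existing
dump code.  It only counts how many pages each policy would write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page categories; not the kernel's real page types. */
enum page_kind {
	PAGE_KERNEL_DATA,	/* kernel data that debugging tools need */
	PAGE_VMIO_CLEAN,	/* file-backed page matching what is on disk */
	PAGE_VMIO_DIRTY,	/* file-backed page with unwritten changes */
	PAGE_FREE		/* unused page */
};

/* Hypothetical dump policies matching the options discussed above. */
enum dump_policy {
	DUMP_FULL,		/* every page */
	DUMP_SPARSE,		/* kernel data only */
	DUMP_SPARSE_DIRTY,	/* kernel data plus dirty VMIO pages */
	DUMP_SPARSE_VMIO	/* kernel data plus all VMIO pages */
};

/* Return nonzero if a page of this kind is written under the policy. */
static int
page_is_dumped(enum page_kind kind, enum dump_policy policy)
{
	switch (policy) {
	case DUMP_FULL:
		return (1);
	case DUMP_SPARSE:
		return (kind == PAGE_KERNEL_DATA);
	case DUMP_SPARSE_DIRTY:
		return (kind == PAGE_KERNEL_DATA || kind == PAGE_VMIO_DIRTY);
	case DUMP_SPARSE_VMIO:
		return (kind == PAGE_KERNEL_DATA ||
		    kind == PAGE_VMIO_CLEAN || kind == PAGE_VMIO_DIRTY);
	}
	return (0);
}

/* Count the pages that the given policy would write to the dump. */
size_t
count_dumped_pages(const enum page_kind *pages, size_t n,
    enum dump_policy policy)
{
	size_t i, count = 0;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (page_is_dumped(pages[i], policy))
			count++;
	return (count);
}
```

When most pages are clean VMIO pages, DUMP_SPARSE_VMIO comes out close to
DUMP_FULL, which is the point made above.  The sketch ignores the harder
problem: classifying a page as dirty at panic time may itself require
locks.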

> Actually, an important feature difference here is that you get 
> something that is potentially much more useful/persistent in the long run. To 
> make use of a kernel core dump, you need a gdb version that understands the 
> target architecture, a compiled kernel that matches the core, source that 
> matches the kernel, etc.  In absence of these things, the core is a pile of 
> bytes that, with a high level of effort, you can try to extract data from -- 
> especially if it was built months or years ago.  The idea behind these mini 
> dumps is that because you're in the kernel run-time environment and the dump 
> generating code is compiled with the kernel that crashed, it can actually do 
> some of the interpretation and data extraction up front, providing many of 
> the details needed for a bug report without requiring use of gdb, kernel 
> compiles, source trees, and so on.  Sure, you'll need those things if you 
> want to get back to line numbers, so it's not a substitute for a full dump, 
> but it would be quite handy.  I debug very few problems using a full dump, in 
> part because they are so inconvenient to deal with -- it's hardly ever the 
> size, it's the complexity of trying to get architectures, source versions, 
> kernel configurations, etc, aligned.

I first thought that I preferred dumping full data so that anything can be
looked at later.  You rarely know what to look for before a panic occurs.
Then I remembered that I really hate (and mostly avoid using) loader(8)
since it gives similar complications for initialization.  Initialization
and finalization belong in the kernel where they can be much simpler and
more robust since they don't need to pass parameters across stages and
maintain all versions of the parameter passing forever.

I turned off kernel dumps a few years ago to avoid bugs from interface churn
and haven't missed them.  I prefer debugging live kernels anyway.

Bruce