Re: Debugging a (potentially?) ZFS-related panic, and discussion about large patchsets

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Tue, 11 Jan 2022 14:38:40 UTC
On Tue, Jan 11, 2022 at 09:16:46AM -0500, Mark Johnston wrote:
> On Tue, Jan 11, 2022 at 09:28:27AM +0200, Andriy Gapon wrote:
> > On 11/01/2022 01:43, Mateusz Guzik wrote:
> > > imo the kernel should be patched to obtain the trace on its own. As
> > > the target has interrupts disabled it will have to do it with NMI, but
> > > support for that got scrapped in
> > > 
> > > commit 1c29da02798d968eb874b86221333a56393a94c3
> > > Author: Mark Johnston <markj@FreeBSD.org>
> > > Date:   Fri Jan 31 15:43:33 2020 +0000
> > > 
> > >      Reimplement stack capture of running threads on i386 and amd64.
> > 
> > This is off-topic for the thread, but as far as I recall, even when the stack
> > capture (e.g., for procstat -k) was implemented using NMI there was a piece of 
> > code in the corresponding NMI handler that skipped the stack tracing if 
> > interrupts were disabled.  I don't recall / know why.
> > You can see that in the removed stack_nmi_handler() that used to be in 
> > sys/x86/x86/stack_machdep.c.
> 
> I think it may have been to avoid tracing threads in the middle of a
> context switch, but I can't remember exactly which inconsistencies were
> problematic.
A thread's stack can become unmapped at any moment once the thread is off
the CPU.  You do not know which point in the context switch code was
interrupted.
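
To make the check discussed above concrete, here is a minimal userspace
sketch of an NMI-time guard that refuses to walk the stack when the
interrupted context had interrupts disabled.  It is not the removed
stack_nmi_handler() code.  The struct and function names (sketch_*) are
hypothetical, and the frame walk is replaced by a placeholder.  The only
hardware fact it relies on is that the interrupt-enable flag (IF) is bit 9
(0x200) of EFLAGS/RFLAGS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* x86 EFLAGS/RFLAGS interrupt-enable flag (IF, bit 9). */
#define SKETCH_PSL_I	0x200u

/* Hypothetical, simplified stand-in for a kernel trap frame. */
struct sketch_trapframe {
	uint64_t rflags;	/* flags of the interrupted context */
};

/* Hypothetical result buffer; depth == 0 means "no trace captured". */
struct sketch_stack {
	int depth;
};

/*
 * Return true only if the interrupted context had interrupts enabled.
 * A context that had them disabled might be in the middle of a context
 * switch, where the stack being walked cannot be trusted.
 */
static bool
sketch_can_trace(const struct sketch_trapframe *tf)
{
	return ((tf->rflags & SKETCH_PSL_I) != 0);
}

/* Capture a trace, or report an empty one when tracing is unsafe. */
static void
sketch_nmi_capture(const struct sketch_trapframe *tf, struct sketch_stack *out)
{
	assert(tf != NULL && out != NULL);
	if (sketch_can_trace(tf))
		out->depth = 1;	/* placeholder for the real frame walk */
	else
		out->depth = 0;	/* interrupts were disabled: skip */
}
```

The guard is deliberately conservative: it also rejects every other
interrupts-disabled section, which is exactly the case the original
request in this thread wanted a trace for.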