Re: nvme related(?) panic on recent -CURRENT

From: Juraj Lutter <otis_at_FreeBSD.org>
Date: Tue, 04 Jul 2023 15:06:00 UTC

> On 4 Jul 2023, at 17:01, Chuck Tuffli <ctuffli@gmail.com> wrote:
> 
> On Thu, Jun 29, 2023 at 12:47 PM Juraj Lutter <otis@freebsd.org> wrote:
>> 
>> With recent -current, the following occurred:
>> 
>> db> bt
>> Tracing pid 0 tid 100063 td 0xfffffe00c5c35e40
>> kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c5e31c90
>> vpanic() at vpanic+0x181/frame 0xfffffe00c5e31ce0
>> panic() at panic+0x43/frame 0xfffffe00c5e31d40
>> nvme_ctrlr_identify() at nvme_ctrlr_identify+0x10e/frame 0xfffffe00c5e31d90
>> nvme_ctrlr_start() at nvme_ctrlr_start+0x91/frame 0xfffffe00c5e31e10
>> nvme_ctrlr_reset_task() at nvme_ctrlr_reset_task+0xec/frame 0xfffffe00c5e31e40
>> taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe00c5e31ec0
>> taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe00c5e31ef0
>> fork_exit() at fork_exit+0x7d/frame 0xfffffe00c5e31f30
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c5e31f30
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> 
>> machine is a bhyve guest.
> 
> If I'm lldb'ing correctly, nvme_ctrlr_identify+0x10e is the panic in
> nvme_completion_poll() if the NVMe command does not complete within
> the timeout period (10 seconds). In this case, it is the Identify
> Controller command. In the bhyve emulation, this command effectively
> memcpy's the data structure to the memory provided by the guest and
> completes the command. If this panic is reproducible, I can provide a
> patch to enhance the debug output to figure out if this is an
> emulation or driver issue.
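
For readers following along, the poll-until-timeout pattern described above can be sketched roughly as follows. This is only an illustrative userland sketch, not the actual nvme(4) driver code: `poll_status`, `poll_for_completion`, `fake_clock` and `fake_now` are hypothetical names, and the clock is simulated so the example is deterministic. The sketch returns an error on timeout, whereas the code path in the backtrace above panics at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion status that the device (or emulation) fills in. */
struct poll_status {
	volatile bool done;	/* set once the command has completed */
};

/* Hypothetical clock callback: returns the current time in milliseconds. */
typedef uint64_t (*clock_fn)(void *arg);

enum poll_result { POLL_COMPLETED, POLL_TIMED_OUT };

/*
 * Spin until status->done is set or timeout_ms has elapsed.
 * A caller that cannot continue without the result would treat
 * POLL_TIMED_OUT as fatal.
 */
static enum poll_result
poll_for_completion(struct poll_status *status, clock_fn now,
    void *clock_arg, uint64_t timeout_ms)
{
	assert(status != NULL && now != NULL);

	uint64_t deadline = now(clock_arg) + timeout_ms;

	while (!status->done) {
		if (now(clock_arg) >= deadline)
			return (POLL_TIMED_OUT);
	}
	return (POLL_COMPLETED);
}

/*
 * Simulated clock for the example: every call advances time by `step`
 * milliseconds. If `status` is non-NULL, the command is marked done once
 * the simulated time reaches `complete_at`.
 */
struct fake_clock {
	uint64_t ms;
	uint64_t step;
	struct poll_status *status;
	uint64_t complete_at;
};

static uint64_t
fake_now(void *arg)
{
	struct fake_clock *c = arg;

	c->ms += c->step;
	if (c->status != NULL && c->ms >= c->complete_at)
		c->status->done = true;
	return (c->ms);
}
```

With a 10000 ms timeout, a command that the simulated device completes at 3000 ms yields POLL_COMPLETED, while one that never completes yields POLL_TIMED_OUT. The second case corresponds to what the backtrace suggests happened here.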

It hasn’t happened since. What I can do is put heavy load on
that box (poudriere in a jail as well as poudriere in that VM).

That could help reproduce the panic. As mentioned elsewhere
in this thread, I have a `cu' session running.

otis


--
Juraj Lutter
otis@FreeBSD.org