nvme controller reset failures on recent -CURRENT
Date: Tue, 13 Feb 2024 00:28:10 UTC
I just upgraded my package build machine to:
  FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
from:
  FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
and I've had two nvme-triggered panics in the last day.

nvme is being used for swap and L2ARC. I'm not able to get a crash
dump, probably because the nvme device has gone away and I get an error
about not having a dump device. It looks like a low-memory panic
because free memory is low and zfs is calling malloc().

This shows up in the log leading up to the panic:

Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.

The device looks healthy to me:

SMART/Health Information Log
============================
Critical Warning State:         0x00
 Available spare:               0
 Temperature:                   0
 Device reliability:            0
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    312 K, 38.85 C, 101.93 F
Available spare:                100
Available spare threshold:      10
Percentage used:                3
Data units (512,000 byte) read: 5761183
Data units written:             29911502
Host read commands:             471921188
Host write commands:            605394753
Controller busy time (minutes): 32359
Power cycles:                   110
Power on hours:                 19297
Unsafe shutdowns:               14
Media errors:                   0
No. error info log entries:     0
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature 1 Transition Count: 5231
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   41213
Total Time For Temperature 2:   0