Re: Handling panics inside vt(4) callbacks
Date: Thu, 13 Apr 2023 15:18:51 UTC
On Wed, Apr 12, 2023 at 10:45:27PM +0200, Jean-Sébastien Pédron wrote:
> Hi!
>
> While working on the DRM drivers, I don't always get a kernel core dump
> in case of a panic.
>
> My hypothesis is that if the DRM driver code called by vt(4) panics,
> then the panic code might not go through successfully. The reason is
> because panic(9) prints the reason, a stacktrace and possibly some
> progress to the console, which calls vt(4) and the DRM driver code again.
>
> I played with the following patch:
> https://gist.github.com/dumbbell/88d77789bfeb38869268c84c40575f49
>
> The idea is that before calling "vt_flush()" in "vtterm_done()", I set a
> global flag to true to indicate that vt(4) is called as part of kdb or a
> panic. If another panic occurs inside vt_flush(), typically the
> underlying DRM driver code, "vtterm_done()" is called recursively and
> "vt_flush()" might trigger the same panic again. If the flag is set, the
> entire function is skipped instead.
>
> I test the patch by adding a panic(9) just before "vt_flush()" and I
> trigger the initial panic with debug.kdb.panic=1. I don't even load a
> DRM driver. My problem is that in this case, the laptop reboots
> immediately. However, if I replace panic(9) with a simple printf(9), it
> works as expected and I get a kernel dump.
>
> I could not find something in panic(9) code that would reboot the
> computer in case of a nested panic.

In the case of a nested panic, vpanic() will not set RB_DUMP when it
calls kern_reboot(), so it won't write a kernel dump. And, if
debug.debugger_on_recursive_panic is not set, the kernel will not try to
re-enter the debugger. So the kernel will simply reboot.

> Previous versions of the patch called doadump() and rebooted the
> computer explicitly if the flag was set, but it didn't work either and I
> thought I could simplify that patch and let panic(9) handle recursion.
> In other words, I just want to skip most of vt(4) code if vt(4) or DRM
> crash.

Perhaps we should set RB_DUMP in the case of a recursive panic so long
as dumping == 0, i.e., we did not panic again while trying to dump core.
In fact, kern_reboot() already checks this.

> Does someone spot something wrong in my hypothesis or methodology?
>
> --
> Jean-Sébastien Pédron
> The FreeBSD Project
>
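
To make the flag handling discussed in this reply concrete, here is a rough userland model of it. This is only a sketch and not the kernel code: the MODEL_* constants and the model_*() functions are hypothetical names invented for illustration, the flag values are placeholders, and every other boot flag and side effect of vpanic() and kern_reboot() is left out. It contrasts the behaviour described above (a nested panic does not request a dump) with the suggested change (request a dump on a nested panic too, unless we were already dumping).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder flag values for this model only; the real definitions
 * live in <sys/reboot.h>.
 */
enum {
	MODEL_RB_AUTOBOOT = 0x000,
	MODEL_RB_DUMP     = 0x100
};

/*
 * Behaviour as described in the reply: only the first panic asks
 * kern_reboot() for a dump. A nested panic leaves RB_DUMP clear.
 */
int
model_bootopt_current(bool already_panicked)
{
	int bootopt = MODEL_RB_AUTOBOOT;

	if (!already_panicked)
		bootopt |= MODEL_RB_DUMP;
	return (bootopt);
}

/*
 * Suggested behaviour (an assumption about how the idea could be
 * expressed): a nested panic also asks for a dump, as long as the
 * earlier panic was not already in the middle of dumping core.
 */
int
model_bootopt_proposed(bool already_panicked, bool dumping)
{
	int bootopt = MODEL_RB_AUTOBOOT;

	if (!already_panicked || !dumping)
		bootopt |= MODEL_RB_DUMP;
	return (bootopt);
}

/*
 * Reboot-side decision, reduced to the two conditions mentioned in the
 * reply: a dump was requested and no dump is already in progress.
 */
bool
model_would_dump(int howto, bool dumping)
{
	return ((howto & MODEL_RB_DUMP) != 0 && !dumping);
}

/* Small self-check covering the cases discussed in the thread. */
void
model_self_check(void)
{
	/* First panic: a dump is written. */
	assert(model_would_dump(model_bootopt_current(false), false));
	/* Nested panic today: no dump, the machine just reboots. */
	assert(!model_would_dump(model_bootopt_current(true), false));
	/* Nested panic with the suggested change: a dump is written. */
	assert(model_would_dump(model_bootopt_proposed(true, false), false));
	/* Panic while dumping: still no second dump attempt. */
	assert(!model_would_dump(model_bootopt_proposed(true, true), true));
}
```

In this model the test scenario from the original message (a panic(9) placed just before vt_flush(), triggered via debug.kdb.panic=1) corresponds to the second case in model_self_check(): the second panic arrives with already_panicked set, so no dump is requested and the machine reboots.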