[Bug 267028] kernel panics when booting with both (zfs,ko or vboxnetflt,ko or acpi_wmi.ko) and amdgpu.ko
Date: Sat, 21 Dec 2024 16:28:27 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028

--- Comment #260 from Mark Millard <marklmi26-fbsd@yahoo.com> ---
(In reply to satanist+freebsd from comment #259)

In:

        mod = malloc(sizeof(struct modlist), M_LINKER, M_NOWAIT | M_ZERO);
        if (mod == NULL)
                panic("no memory for module list");
        mod->container = container;

if something similar to mod == 0xfffff80000000007 resulted, it appears
to me that the dereference in mod->container or the like would have
gotten a general protection fault, given the later actual failure that
sometimes happens because of the 0xfffff80000000007 that sometimes
happens.

I'll note also that, for example, one of the historical crashes
involving 0xfffff80000000007 was in handling a different list:

/*
 * Remove the references to the thread from all of the objects we were
 * polling.
 */
static void
seltdclear(struct thread *td)
{
        struct seltd *stp;
        struct selfd *sfp;
        struct selfd *sfn;

        stp = td->td_sel;
        STAILQ_FOREACH_SAFE(sfp, &stp->st_selq, sf_link, sfn)
                selfdfree(stp, sfp);
        stp->st_flags = 0;
}

so the issue does not appear to be list specific, even if one list is
more common for failing than others for some reason.

I do not know if there is some relevant relationship with the likes of
code from:

drm-kmod/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

for alternate failure points. No simple reproduction test has ever been
discovered.

MALLOC_DEBUG is controlled in the kernel via sys/kern/kern_malloc.c
having the code:

#if defined(INVARIANTS) || defined(MALLOC_MAKE_FAILURES) || \
    defined(DEBUG_MEMGUARD) || defined(DEBUG_REDZONE)
#define MALLOC_DEBUG    1
#endif

It, in turn leads to definition and use of the kernel's malloc_dbg()
and free_dbg().

I certainly have no objection to such testing, say via using an
INVARIANTS based kernel build. But I'm not testing, having no context
to use to reproduce the problem with. I'm just looking at vmcore.*
file(s) via kgdb.
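For illustration only, here is a minimal userland sketch of how that preprocessor condition gates debug-only code. It is not the kernel's malloc_dbg() or free_dbg(). The sketch_* names are hypothetical, and the alignment check is just an assumed example of the kind of sanity test a debug path might perform. It was picked because a value such as 0xfffff80000000007 is not 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same preprocessor condition as the kern_malloc.c excerpt quoted above. */
#if defined(INVARIANTS) || defined(MALLOC_MAKE_FAILURES) || \
    defined(DEBUG_MEMGUARD) || defined(DEBUG_REDZONE)
#define MALLOC_DEBUG 1
#endif

/* Hypothetical helper: nonzero if addr is a multiple of align (a power of 2). */
static int
sketch_addr_is_aligned(uint64_t addr, uint64_t align)
{
	return ((addr & (align - 1)) == 0);
}

/* Hypothetical helper: reports whether the debug path was compiled in. */
static int
sketch_malloc_debug_enabled(void)
{
#ifdef MALLOC_DEBUG
	return (1);
#else
	return (0);
#endif
}

/*
 * Hypothetical userland wrapper (an assumption for illustration, not the
 * kernel's malloc_dbg()): the extra check exists only when MALLOC_DEBUG
 * is defined, so a non-debug build pays nothing for it.
 */
static void *
sketch_malloc(size_t size)
{
	void *p = malloc(size);

#ifdef MALLOC_DEBUG
	assert(p == NULL ||
	    sketch_addr_is_aligned((uint64_t)(uintptr_t)p, sizeof(void *)));
#endif
	return (p);
}
```

Building this with something like cc -DINVARIANTS compiles the check in. Building without any of the four macros leaves sketch_malloc() as a plain call to malloc().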
But I'll also note that we recently appear to have learned that some of
the software in use was rather old and not being updated, so it was not
tracking kernel updates. Testing whether the modern software, built to
match the kernel in use, also produces the problems seems appropriate,
as that is what would be changed if there is still a bug to be fixed.
As I understand it, that testing is what is going on now.