Re: -stable from today dumps core with drm-510-kmod and some graphical clients
Date: Tue, 11 Apr 2023 14:17:13 UTC
On Thu, Mar 30, 2023 at 3:29 PM Mathias Picker <Mathias.Picker@virtual-earth.de> wrote:
>
> Cy Schubert <Cy.Schubert@cschubert.com> writes:
>
> > On Mon, 27 Mar 2023 23:43:35 +0200
> > Mathias Picker <Mathias.Picker@virtual-earth.de> wrote:
> >
> >> On 27 March 2023 23:05:35 CEST, Cy Schubert
> >> <Cy.Schubert@cschubert.com> wrote:
> >> >In message
> >> ><8b47d0a4-a8f1-1841-ee59-3949fe69cbd7@ShaneWare.Biz>, Shane
> >> >Ambler writes:
> >> >> On 26/3/23 01:37, Mathias Picker wrote:
> >> >> >
> >> >> > Starting sddm works fine, starting my normal session crashes
> >> >> > or freezes FreeBSD.
> >> >> >
> >> >> > I can find no error messages after a reboot.
> >> >> >
> >> >> > I found out that I can start xterm or emacs (exwm) without
> >> >> > problems, xrandr works with external screen, but once I start
> >> >> > anything more demanding (I guess demanding of the GPU)
> >> >> > everything freezes or FreeBSD even reboots.
> >> >> >
> >> >> > “Demanding” means even simple things like qterminal. I tried
> >> >> > firefox and blender and then I had it with the reboots and
> >> >> > didn’t try anything else.
> >> >> > xedit works fine :)
> >> >> >
> >> >> > I have nothing in the logs, I have no idea where to look or
> >> >> > how to debug this.
> >> >> >
> >> >> > Any ideas, tips, help greatly appreciated.
> >> >>
> >> >>
> >> >> FreeBSD Developers Handbook Chapter 10: Kernel Debugging
> >> >>
> >> >> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/
> >> >>
> >> >> Running stable, kernel dumps may already be enabled, look in
> >> >> /var/crash
> >> >>
> >> >> By enabling a kernel dump when it panics (dumpdev="AUTO" in
> >> >> rc.conf) the kernel core is saved to swap space, then on reboot
> >> >> gets copied to dumpdir (/var/crash) where you can then use kgdb
> >> >> (from devel/gdb) to get a stack trace to find where the panic
> >> >> happened.
> >> >
> >> >drm-*-kmod probably needs a rebuild. Likely a data structure
> >> >changed. In my experience a simple rebuild of the port solves
> >> >90% of drm-*-kmod crash problems.
> >> >
> >> Hi Cy,
> >>
> >> sorry I didn't mention that, but I did rebuild drm-kmod, I
> >> actually do it after every new kernel build, just to be on the
> >> safe side.
> >>
> >> I switched my swap to non-encrypted and will look if I can get
> >> any information from the kernel dump tomorrow.
> >>
> >> Oh, and it's on a Thinkpad X1 Yoga 3rd gen, I just noticed I
> >> didn't mention this.
> >
> > It may be worth trying drm-515-kmod as some MFC that works with
> > 515 and not 510 may have been committed. Linux-KPI commits are
> > the usual suspects.
> >
> > I use drm-515 with 14-CURRENT.
>
> Finally I found the time for a kernel crash dump.
> This is what kgdb says
>
> mathiasp:amd64.amd64/sys/GENERIC% sudo kgdb kernel /var/crash/vmcore.2
> GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd13.1".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from kernel...
> Reading symbols from /usr/obj/usr/src/amd64.amd64/sys/GENERIC/kernel.debug...
>
> Unread portion of the kernel message buffer:
>
>
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> (kgdb) backtrace
> #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:396
> #2 0xffffffff80c07c2a in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:484
> #3 0xffffffff80c080ce in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe01341fab50) at /usr/src/sys/kern/kern_shutdown.c:923
> #4 0xffffffff80c07f03 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:847
> #5 0xffffffff810c1fa7 in trap_fatal (frame=0xfffffe01341fac40, eva=0) at /usr/src/sys/amd64/amd64/trap.c:942
> #6 0xffffffff810c1fff in trap_pfault (frame=0xfffffe01341fac40, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:761
> #7 <signal handler called>
> #8 0xffffffff84a07067 in shmem_get_pages () from /boot/modules/i915kms.ko
> #9 0x0000000300000015 in ?? ()
> #10 0x0000000000000060 in ?? ()
> #11 0x0000000000000060 in ?? ()
> #12 0x0000000000060000 in ?? ()
> #13 0xfffffe00dc365a80 in ?? ()
> #14 0xfffff00100000060 in ?? ()
> #15 0xfffff8003e270c00 in ?? ()
> #16 0x00000000fffff000 in ?? ()
> #17 0xfffff8002138fc20 in ?? ()
> #18 0xfffffe00dc365a80 in ?? ()
> #19 0x0000000000000060 in ?? ()
> #20 0xfffff8003e270c00 in ?? ()
> #21 0x0000000000000060 in ?? ()
> #22 0xfffffe0131e0fc80 in ?? ()
> #23 0xfffffe01341fade0 in ?? ()
> #24 0xffffffff84a07596 in shmem_pwrite () from /boot/modules/i915kms.ko
> #25 0x0000000000000000 in ?? ()
> (kgdb)
>
>
> Anything else I can do to help?
>
> I’m now building drm-515-kmod, let’s see how that works in -stable.
>
> /Mathias
>
>

Any updates here? I just ran into this myself and am very close to just installing Linux on my laptop, tbh.

I've rebuilt stable/13 today, then rebuilt the 510-kmod (because the 515-kmod doesn't even build), and pretty much anything that's not an XTerm will panic/reboot the machine (a Thinkpad T490 with Intel GPU).

dmesg has this to say:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff84430626
stack pointer           = 0x28:0xfffffe0140c83cf0
frame pointer           = 0x28:0xfffffe0140c83d70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (i915-userptr-acquir)
trap number             = 12
panic: page fault
cpuid = 1
time = 1681221523
KDB: stack backtrace:
#0 0xffffffff80c5fc15 at kdb_backtrace+0x65
#1 0xffffffff80c12e02 at vpanic+0x152
#2 0xffffffff80c12ca3 at panic+0x43
#3 0xffffffff810d1577 at trap_fatal+0x387
#4 0xffffffff810d15cf at trap_pfault+0x4f
#5 0xffffffff810a8568 at calltrap+0x8
#6 0xffffffff84430c02 at __i915_gem_userptr_get_pages_worker+0x1f2
#7 0xffffffff80e80883 at linux_work_fn+0xe3
#8 0xffffffff80c746f1 at taskqueue_run_locked+0x181
#9 0xffffffff80c759b3 at taskqueue_thread_loop+0xc3
#10 0xffffffff80bcf55d at fork_exit+0x7d
#11 0xffffffff810a95de at fork_trampoline+0xe

It apparently dumps core, will have to reacquaint myself with how to poke at this some more...
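
As a first step in poking at it, here is a rough Python sketch that reads the panic text above, pulls out the faulting instruction pointer and the frame right after calltrap, and works out where that frame's symbol starts (address minus offset). The helper names (parse_panic, first_frame_after_trap) are made up for this example, and it assumes the "symbol+offset" pairs printed by KDB are accurate. It only does arithmetic on the numbers already shown; it is no substitute for kgdb with module symbols loaded.

```python
import re

# Abbreviated copy of the dmesg output quoted above.
PANIC_TEXT = """\
instruction pointer = 0x20:0xffffffff84430626
KDB: stack backtrace:
#5 0xffffffff810a8568 at calltrap+0x8
#6 0xffffffff84430c02 at __i915_gem_userptr_get_pages_worker+0x1f2
#7 0xffffffff80e80883 at linux_work_fn+0xe3
"""

# "#N 0xADDR at symbol+0xOFF" lines from the KDB backtrace.
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+([^\s+]+)\+(0x[0-9a-f]+)")
# "instruction pointer = 0xSEG:0xADDR" line from the trap report.
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)")


def parse_panic(text):
    """Return (faulting_ip, frames).

    Each frame is (index, address, symbol, symbol_start), where
    symbol_start = address - offset.
    """
    ip_match = IP_RE.search(text)
    ip = int(ip_match.group(1), 16) if ip_match else None
    frames = []
    for m in FRAME_RE.finditer(text):
        addr = int(m.group(2), 16)
        offset = int(m.group(4), 16)
        frames.append((int(m.group(1)), addr, m.group(3), addr - offset))
    return ip, frames


def first_frame_after_trap(frames):
    """Return the frame that follows calltrap, or None if there is none."""
    for i, frame in enumerate(frames):
        if frame[2] == "calltrap" and i + 1 < len(frames):
            return frames[i + 1]
    return None


if __name__ == "__main__":
    ip, frames = parse_panic(PANIC_TEXT)
    frame = first_frame_after_trap(frames)
    print(f"faulting ip:      {ip:#x}")
    print(f"frame after trap: {frame[2]} (starts at {frame[3]:#x})")
    print(f"ip lies before that symbol's start: {ip < frame[3]}")
```

For the trace above, the worker's start works out to 0xffffffff84430a10, which is above the faulting instruction pointer 0xffffffff84430626. If the offsets are right, that suggests the NULL dereference happened in some other function called from the worker rather than in the worker itself, but that is a guess until the vmcore has been looked at in kgdb.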