Re: -stable from today dumps core with drm-510-kmod and some graphical clients

From: Ulrich_Spörlein <uqs_at_freebsd.org>
Date: Tue, 11 Apr 2023 14:17:13 UTC
On Thu, Mar 30, 2023 at 3:29 PM Mathias Picker <
Mathias.Picker@virtual-earth.de> wrote:

>
> Cy Schubert <Cy.Schubert@cschubert.com> writes:
>
> > On Mon, 27 Mar 2023 23:43:35 +0200
> > Mathias Picker <Mathias.Picker@virtual-earth.de> wrote:
> >
> >> On 27 March 2023 23:05:35 CEST, Cy Schubert
> >> <Cy.Schubert@cschubert.com> wrote:
> >> >In message
> >> ><8b47d0a4-a8f1-1841-ee59-3949fe69cbd7@ShaneWare.Biz>, Shane
> >> >Ambler writes:
> >> >> On 26/3/23 01:37, Mathias Picker wrote:
> >> >> >
> >> >> > Starting sddm works fine, starting my normal session
> >> >> > crashes or freezes
> >> >> > FreeBSD.
> >> >> >
> >> >> > I can find no error messages after a reboot.
> >> >> >
> >> >> > I found out that I can start xterm or emacs (exwm)
> >> >> > without problems,
> >> >> > xrandr works with external screen, but once I start
> >> >> > anything more
> >> >> > demanding (I guess demanding of the GPU) everything
> >> >> > freezes or FreeBSD
> >> >> > even reboots.
> >> >> >
> >> >> > “Demanding” means even simple things like
> >> >> > qterminal. I tried firefox and
> >> >> > blender and then I had it with the reboots and
> >> >> > didn’t try anything else.
> >> >> > xedit works fine :)
> >> >> >
> >> >> > I have nothing in the logs, I have no idea where to look
> >> >> > or how to debug
> >> >> > this.
> >> >> >
> >> >> > Any ideas, tips, help greatly appreciated.
> >> >>
> >> >>
> >> >> FreeBSD Developers Handbook Chapter 10: Kernel Debugging
> >> >>
> >> >> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/
> >> >>
> >> >> Running stable, kernel dumps may already be enabled, look in
> >> >> /var/crash
> >> >>
> >> >> By enabling a kernel dump when it panics (dumpdev="AUTO" in
> >> >> rc.conf) the
> >> >> kernel core is saved to swap space, then on reboot gets
> >> >> copied to
> >> >> dumpdir (/var/crash) where you can then use kgdb (from
> >> >> devel/gdb) to get
> >> >> a stack trace to find where the panic happened.
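For reference, the rc.conf setup described above amounts to something like the following minimal sketch (dumpdir is listed only to make the default explicit; adjust to your system):

```sh
# /etc/rc.conf: save a kernel crash dump when the system panics
dumpdev="AUTO"        # use the first suitable swap device from /etc/fstab as the dump device
dumpdir="/var/crash"  # where savecore(8) copies the dump at the next boot (the default)
```

`sysrc dumpdev=AUTO` adds the same line without editing the file by hand, and `dumpon -l` shows which dump device is currently configured. After the next panic and reboot, the dump should appear as /var/crash/vmcore.N and can be opened with something like `kgdb /boot/kernel/kernel /var/crash/vmcore.0` (the index depends on how many dumps are already there).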
> >> >
> >> >drm-*-kmod probably needs a rebuild. Likely a data structure
> >> >changed. In my
> >> >experience a simple rebuild of the port solves 90% of
> >> >drm-*-kmod crash
> >> >problems.
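If the kernel is built from source, the module rebuild can also be tied to the kernel build itself. A minimal sketch, assuming a ports tree under /usr/ports and the PORTS_MODULES variable described in build(7), which has the kernel build rebuild and reinstall the listed kernel-module ports along with the kernel:

```sh
# /etc/make.conf: rebuild the DRM kernel module together with every kernel build
PORTS_MODULES=graphics/drm-510-kmod
```

Substitute graphics/drm-515-kmod if that is the variant installed.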
> >> >
> >> Hi Cy,
> >>
> >> sorry I didn't mention that, but I did rebuild drm-kmod, I
> >> actually do it after every new kernel build, just to be on the
> >> safe side.
> >>
> >> I switched my swap to non-encrypted and will look if I can get
> >> any information from the kernel dump tomorrow.
> >>
> >> Oh, and it's on a Thinkpad X1 Yoga 3rd gen, I just noticed I
> >> didn't mention this.
> >
> > It may be worth trying drm-515-kmod as some MFC that works with
> > 515 and
> > not 510 may have been committed. Linux-KPI commits are the usual
> > suspects.
> >
> > I use drm-515 with 14-CURRENT.
>
> Finally I found the time for a kernel crash dump.
> This is what kgdb says
>
> mathiasp:amd64.amd64/sys/GENERIC% sudo kgdb kernel /var/crash/vmcore.2
> GNU gdb (GDB) 13.1 [GDB v13.1 for FreeBSD]
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd13.1".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <https://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
>     <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from kernel...
> Reading symbols from
> /usr/obj/usr/src/amd64.amd64/sys/GENERIC/kernel.debug...
>
> Unread portion of the kernel message buffer:
>
>
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> (offsetof(struct pcpu,
> (kgdb) backtrace
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1  doadump (textdump=<optimized out>) at
>  /usr/src/sys/kern/kern_shutdown.c:396
> #2  0xffffffff80c07c2a in kern_reboot (howto=260) at
>  /usr/src/sys/kern/kern_shutdown.c:484
> #3  0xffffffff80c080ce in vpanic (fmt=<optimized out>,
>  ap=ap@entry=0xfffffe01341fab50) at
>  /usr/src/sys/kern/kern_shutdown.c:923
> #4  0xffffffff80c07f03 in panic (fmt=<unavailable>) at
>  /usr/src/sys/kern/kern_shutdown.c:847
> #5  0xffffffff810c1fa7 in trap_fatal (frame=0xfffffe01341fac40,
>  eva=0) at /usr/src/sys/amd64/amd64/trap.c:942
> #6  0xffffffff810c1fff in trap_pfault (frame=0xfffffe01341fac40,
>  usermode=false, signo=<optimized out>, ucode=<optimized out>)
>     at /usr/src/sys/amd64/amd64/trap.c:761
> #7  <signal handler called>
> #8  0xffffffff84a07067 in shmem_get_pages () from
>  /boot/modules/i915kms.ko
> #9  0x0000000300000015 in ?? ()
> #10 0x0000000000000060 in ?? ()
> #11 0x0000000000000060 in ?? ()
> #12 0x0000000000060000 in ?? ()
> #13 0xfffffe00dc365a80 in ?? ()
> #14 0xfffff00100000060 in ?? ()
> #15 0xfffff8003e270c00 in ?? ()
> #16 0x00000000fffff000 in ?? ()
> #17 0xfffff8002138fc20 in ?? ()
> #18 0xfffffe00dc365a80 in ?? ()
> #19 0x0000000000000060 in ?? ()
> #20 0xfffff8003e270c00 in ?? ()
> #21 0x0000000000000060 in ?? ()
> #22 0xfffffe0131e0fc80 in ?? ()
> #23 0xfffffe01341fade0 in ?? ()
> #24 0xffffffff84a07596 in shmem_pwrite () from
>  /boot/modules/i915kms.ko
> #25 0x0000000000000000 in ?? ()
> (kgdb)
>
>
> Anything else I can do to help?
>
> I’m now building drm-515-kmod, let’s see how that works in
> -stable.
>
> /Mathias
>
>
Any updates here? I just ran into this myself and am very close to just
installing Linux on my laptop, tbh.

I've rebuilt stable/13 today, then rebuilt the 510-kmod (because the
515-kmod doesn't even build) and pretty much anything that's not an XTerm
will panic/reboot the machine (a Thinkpad T490 with Intel GPU).

dmesg got this to say:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff84430626
stack pointer           = 0x28:0xfffffe0140c83cf0
frame pointer           = 0x28:0xfffffe0140c83d70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (i915-userptr-acquir)
trap number             = 12
panic: page fault
cpuid = 1
time = 1681221523
KDB: stack backtrace:
#0 0xffffffff80c5fc15 at kdb_backtrace+0x65
#1 0xffffffff80c12e02 at vpanic+0x152
#2 0xffffffff80c12ca3 at panic+0x43
#3 0xffffffff810d1577 at trap_fatal+0x387
#4 0xffffffff810d15cf at trap_pfault+0x4f
#5 0xffffffff810a8568 at calltrap+0x8
#6 0xffffffff84430c02 at __i915_gem_userptr_get_pages_worker+0x1f2
#7 0xffffffff80e80883 at linux_work_fn+0xe3
#8 0xffffffff80c746f1 at taskqueue_run_locked+0x181
#9 0xffffffff80c759b3 at taskqueue_thread_loop+0xc3
#10 0xffffffff80bcf55d at fork_exit+0x7d
#11 0xffffffff810a95de at fork_trampoline+0xe

It apparently dumps core; I'll have to reacquaint myself with how to poke
at this some more...