Re: nvidia_drv.so/Xorg crashes
- In reply to: Craig Leres : "nvidia_drv.so/Xorg crashes"
Date: Fri, 25 Jun 2021 05:54:25 UTC
On Fri, Jun 25, 2021 at 4:31 AM Craig Leres <leres@freebsd.org> wrote:
>
> I have four (12.2-RELEASE) systems between the office and home that are
> full or part time FreeBSD desktops. All have pny nvidia quadro 410s.
> These have been mostly working well for about 6 years.
>
> For months I've been seeing screen corruption when using chrome or
> kicad; firefox and thunderbird are always ok. But just starting eeschema
> always damages the root window a little. And it's common when running
> chrome/kicad to see lines in the console xterm window jump up and down
> two lines. But for the last week or two Xorg has been crashing:
>
> [ 74574.029] (EE) Backtrace:
> [ 74574.032] (EE) 0: /usr/local/bin/Xorg (?+0x0) [0x41c98a]
> [ 74574.033] (EE) unw_get_proc_name failed: no unwind info found [-10]
> [ 74574.033] (EE) 1: /lib/libthr.so.3 (?+0x0) [0x800929b7e]
> [ 74574.035] (EE) unw_get_proc_name failed: no unwind info found [-10]
> [ 74574.035] (EE) 2: /lib/libthr.so.3 (?+0x0) [0x80092913f]
> [ 74574.037] (EE) 3: ? (?+0x0) [0x7ffffffff003]
> [ 74574.038] (EE) 4: /usr/local/lib/xorg/modules/drivers/nvidia_drv.so (?+0x0) [0x801cc8c20]
> [ 74574.038] (EE)
> [ 74574.038] (EE) Segmentation fault at address 0x50
> [ 74574.038] (EE)
> Fatal server error:
> [ 74574.038] (EE) Caught signal 11 (Segmentation fault). Server aborting
>
> The crashes are always preceded by at least one nvidia "Xid" kernel message:
>
> Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data fffffffb, ErrorCode 00000004
> Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data fffffffb, ErrorCode 00000004
> Jun 23 ... kernel: : NVRM: Xid (PCI:0000:05:00): 69, pid=6327, Class Error: ChId 0009, Class 0000902d, Offset 000008b4, Data ffffffb9, ErrorCode 00000004
> Jun 23 ... kernel: : pid 6327 (Xorg), jid 0, uid 0: exited on signal 6
>
> Worth noting is that it was not unusual to see many Xid ErrorCode 4
> kernel messages without crashes. (And it's the only ErrorCode I've ever
> seen.)
>
> My first thought was a bad nvidia-driver version. But after working my
> way, one by one, down to 460.39 (circa February 2021, months before
> the first crashes) I gave up on that theory.
>
> My next guess was bad hardware, but I swapped quadros between two
> systems and the crashes persisted.
>
> Yesterday Xorg crashed often enough for me to zero in on the trigger;
> it's the use of tvtwm's f.forcemove action (which is like f.move but
> allows moving a window off the screen) if I move a window slightly off
> the bottom of the screen. Here's the .twmrc binding I use:
>
>     Button2 = m s : window : f.forcemove
>
> The crash doesn't happen 100% of the time but it's pretty easy to
> trigger with half a dozen windows open. Just grab a window and randomly
> dip part of it past the bottom of the screen. So my new theory is that a
> frame buffer operation in one of the libraries in the path between Xorg
> and the nvidia driver has regressed and is asking the nvidia driver to
> do something that causes it to do something bad.
>
> I run a custom version of tvtwm but was able to easily crash Xorg using
> x11-wm/twm on a spare quadro 410 workstation; the key is f.forcemove.
>
> Does anybody know what this issue is? What are likely candidates among
> recently changed port libraries that I could try downgrading? Should I
> try opening a ticket with nvidia? Should I try even older 460.XX
> drivers? What else can I try? (Thanks for reading this far!)

Long shot, but the libglvnd update affected x11/nvidia-driver. Have a
look at the 20210617 entry in UPDATING.

HTH

>
> Craig
>
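If it helps to line the Xid messages up against the crashes, here is a rough Python sketch (not something from this thread) that tallies NVRM Xid/ErrorCode pairs in a log and, for each Xorg exit, counts how many Xid messages named that pid beforehand. It assumes the kernel lines look exactly like the ones quoted above, one message per line in the log; the script name and the /var/log/messages path in the usage comment are assumptions.

```python
import re
import sys
from collections import Counter

# Assumed format: the NVRM Xid lines quoted above, each on one log line
# (the mail client wrapped them; syslog normally does not).
XID_RE = re.compile(
    r"NVRM: Xid \((?P<bus>PCI:[0-9A-Fa-f:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+).*?ErrorCode (?P<code>[0-9A-Fa-f]+)"
)
# Matches the kernel's "pid N (comm), ... exited on signal S" line.
EXIT_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<comm>[^)]+)\),.*exited on signal (?P<sig>\d+)"
)


def summarize(lines):
    """Return (tally, crashes).

    tally maps (xid, errorcode) -> number of messages.
    crashes is a list of (pid, signal, xid_messages_seen_for_that_pid)
    for every Xorg exit line.
    """
    tally = Counter()
    per_pid = Counter()
    crashes = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            # ErrorCode is printed in hex, e.g. 00000004
            tally[(int(m["xid"]), int(m["code"], 16))] += 1
            per_pid[int(m["pid"])] += 1
            continue
        m = EXIT_RE.search(line)
        if m and m["comm"] == "Xorg":
            pid = int(m["pid"])
            crashes.append((pid, int(m["sig"]), per_pid[pid]))
    return tally, crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python xid_summary.py /var/log/messages
    with open(sys.argv[1], errors="replace") as f:
        tally, crashes = summarize(f)
    for (xid, code), n in sorted(tally.items()):
        print(f"Xid {xid} ErrorCode {code}: {n} message(s)")
    for pid, sig, n in crashes:
        print(f"Xorg pid {pid} exited on signal {sig} after {n} Xid message(s)")
```

Counting per pid, rather than just looking at the line immediately before each exit, makes it easier to see whether the ErrorCode 4 messages that did not end in a crash came from the same server instance.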