maintainer-feedback requested: [Bug 286034] x11/nvidia-driver problems

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 11 Apr 2025 08:26:55 UTC
Bugzilla Automation <bugzilla@FreeBSD.org> has asked freebsd-x11 (Nobody)
<x11@FreeBSD.org> for maintainer-feedback:
Bug 286034: x11/nvidia-driver problems
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=286034



--- Description ---
On FreeBSD 14.2-RELEASE, upgrading to nvidia-driver-570.124.04.1401000
introduced two issues.

First, I am using xfce4. Since upgrading nvidia-driver to the version
mentioned above, xfwm4 keeps allocating memory without releasing it, until it
eventually uses around 40GB of RES memory out of my 64GB. This issue seems to
be known to NVIDIA, see
https://forums.developer.nvidia.com/t/extreme-growing-memory-usage-in-x11-opengl-or-vulkan-applications-after-suspend-resume/329078
For reference, the corresponding xfwm4 issue:
https://gitlab.xfce.org/xfce/xfwm4/-/issues/825

Second, my NVIDIA GeForce GTX 1650 occasionally stops working and the desktop
freezes. This happens rarely, about once a week, and leaves the following in
dmesg:

NVRM: GPU at PCI:0000:0a:00: GPU-eba90a43-57cc-af7c-1928-bf26dbe69c93
NVRM: Xid (PCI:0000:0a:00): 62, 000120ab 00012107 00011c38 00015afb 00015f06 00013f17 00000011 00000000
NVRM: Xid (PCI:0000:0a:00): 119, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a56 0x5c).
NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x0000000020800a56 0x000000000000005c.
NVRM: GPU0 RPC history (CPU -> GSP):
NVRM: entry function                   data0              data1              ts_start           ts_end             duration actively_polling
NVRM:     0 76   GSP_RM_CONTROL        0x0000000020800a56 0x000000000000005c 0x0006325f5ea56d56 0x0000000000000000          y
NVRM:    -1 76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e685c87 0x0006325f5e68606f   1000us
NVRM:    -2 76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e68589f 0x0006325f5e685c87   1000us
NVRM:    -3 76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e684516 0x0006325f5e684ce7   2001us
NVRM:    -4 76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e681e07 0x0006325f5e681e07
NVRM:    -5 76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e680697 0x0006325f5e680a7f   1000us
NVRM:    -6 76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e67e757 0x0006325f5e67e757
NVRM:    -7 76   GSP_RM_CONTROL        0x00000000c3700104 0x0000000000000014 0x0006325f5e67d7b7 0x0006325f5e67db9f   1000us
NVRM: GPU0 RPC event history (CPU <- GSP):
NVRM: entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
NVRM:     0 4130 RECOVERY_ACTION       0x0000000000000000 0x0000000000000000 0x0006325f5ea56d56 0x0006325f5ea56d56          y
NVRM:    -1 4102 OS_ERROR_LOG          0x0000000000000000 0x0000000000000000 0x0006325f5ea56d56 0x0006325f5ea56d56          y
NVRM:    -2 4128 GSP_POST_NOCAT_RECORD 0x0000000000000003 0x00000000000120ab 0x0006325f5ea56d56 0x0006325f5ea56d56          y
NVRM:    -3 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000285057cb854 0x0006325f22f7deeb 0x0006325f22f7deeb
NVRM:    -4 4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000025 0x0006325ee4d1cabb 0x0006325ee4d1cabb
NVRM:    -5 4099 POST_EVENT            0x0000000000000001 0x0000000000000000 0x0006325ee4d1c6d3 0x0006325ee4d1c6d3
NVRM:    -6 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000285057cb854 0x0006325ee44b83bc 0x0006325ee44b83bc
NVRM:    -7 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000285057cb854 0x0006325ee44ac06b 0x0006325ee44ac06b
#0 0xffffffff847a9d38 at os_dump_stack+0x18
#1 0xffffffff840bdc68 at _nv013200rm+0x508
NVRM: Xid (PCI:0000:0a:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
NVRM: Xid (PCI:0000:0a:00): 119, pid=6098, name=thunderbird, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x20800a6a 0x0).
NVRM: Xid (PCI:0000:0a:00): 119, pid=70894, name=xfwm4, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 10 (FREE) (0xbeef0403 0x0).
NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:0a:00 (printing 1 of every 30).  The GPU likely needs to be reset.
NVRM: Xid (PCI:0000:0a:00): 16, Head 00000003 Count 006c8dcf
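
To compare freezes across occurrences, the Xid codes can be pulled out of a
dmesg excerpt like the one above. The following is only a minimal sketch in
Python 3, not something taken from the driver or from this report: the function
names extract_xid_events and count_xid_codes are hypothetical, and the pattern
assumes only the "NVRM: Xid (PCI:...): <code>, <message>" prefix seen in the
lines above. If a record is wrapped across lines, only the part on the first
line is captured as the message.

```python
import re

# Assumed pattern, based only on the "NVRM: Xid (...)" lines in the excerpt above.
XID_RE = re.compile(r"^NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+), (.*)$")


def extract_xid_events(lines):
    """Return (pci_address, xid_code, message) for every Xid line (hypothetical helper)."""
    events = []
    for line in lines:
        match = XID_RE.match(line.strip())
        if match:
            events.append((match.group(1), int(match.group(2)), match.group(3)))
    return events


def count_xid_codes(events):
    """Count how often each Xid code appears (hypothetical helper)."""
    counts = {}
    for _pci, code, _message in events:
        counts[code] = counts.get(code, 0) + 1
    return counts


# Sample input: lines copied from the dmesg excerpt above (unwrapped).
sample = [
    "NVRM: GPU at PCI:0000:0a:00: GPU-eba90a43-57cc-af7c-1928-bf26dbe69c93",
    "NVRM: Xid (PCI:0000:0a:00): 62, 000120ab 00012107 00011c38 00015afb "
    "00015f06 00013f17 00000011 00000000",
    "NVRM: Xid (PCI:0000:0a:00): 119, Timeout after 6s of waiting for RPC "
    "response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) "
    "(0x20800a56 0x5c).",
    "NVRM: Xid (PCI:0000:0a:00): 154, GPU recovery action changed from 0x0 "
    "(None) to 0x1 (GPU Reset Required)",
    "NVRM: Xid (PCI:0000:0a:00): 119, pid=6098, name=thunderbird, Timeout "
    "after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 "
    "(GSP_RM_CONTROL) (0x20800a6a 0x0).",
    "NVRM: Xid (PCI:0000:0a:00): 119, pid=70894, name=xfwm4, Timeout after 6s "
    "of waiting for RPC response from GPU0 GSP! Expected function 10 (FREE) "
    "(0xbeef0403 0x0).",
    "NVRM: Xid (PCI:0000:0a:00): 16, Head 00000003 Count 006c8dcf",
]

events = extract_xid_events(sample)
print(count_xid_codes(events))
```

In practice the input would be the saved output of dmesg rather than a
hard-coded list; the list is used here only to keep the sketch self-contained.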