Date: Sat, 3 Aug 2024 10:15:47 +0900
From: Tomoaki AOKI
To: Pontus Bramberg
Cc: stable@freebsd.org
Subject: Re: Nvidia Xorg page fault in kernel mode 14-STABLE/amd64
Message-Id: <20240803101547.d16196bee2b52c7316120a76@dec.sakura.ne.jp>
In-Reply-To: <710b2f4f-6bee-4868-8c63-e2de1ae11802@bramberg.net>
References: <802dc8ba-e213-4635-8315-b37784d426e4@bramberg.net>
 <20240803040735.721d11fa88c0222286eadbc4@dec.sakura.ne.jp>
 <710b2f4f-6bee-4868-8c63-e2de1ae11802@bramberg.net>
Organization: Junchoon corps
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="Multipart=_Sat__3_Aug_2024_10_15_47_+0900_p2h3jYvlCU0NRRte"

This is a multi-part message in MIME format.

--Multipart=_Sat__3_Aug_2024_10_15_47_+0900_p2h3jYvlCU0NRRte
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On Fri, 2 Aug 2024 22:04:37 +0100
Pontus Bramberg wrote:

> On 8/2/24 20:07, Tomoaki AOKI wrote:
> > On Fri, 2 Aug 2024 17:24:30 +0100
> > Pontus Bramberg wrote:
> >
> >> From: Pontus Bramberg
> >> To: stable@freebsd.org
> >> Subject: Nvidia Xorg page fault in kernel mode 14-STABLE/amd64
> >> Date: Fri, 2 Aug 2024 17:24:30 +0100
> >> Sender: owner-freebsd-stable@FreeBSD.org
> >> User-Agent: Mozilla Thunderbird
> >>
> >> Hello,
> >>
> >> I'm not entirely sure if this is the right place to ask about this. I
> >> apologise if not.
> >>
> >> I recently updated the UEFI BIOS on my laptop. Immediately after this,
> >> Xorg would not start when using the Nvidia driver. I attempted to
> >> downgrade the BIOS but the laptop would not flash an earlier version of
> >> the BIOS. I therefore instead tried to rebuild kernel and world from the
> >> latest stable/14 git commit (b37a6d41a046dbb46ee1d6bf00c710c03c944a24)
> >> as well as uninstalling and reinstalling x11/nvidia-driver from the
> >> latest ports collection (version 550.54.14).
> >> This did not help, so I rebuilt the same kernel and world after
> >> 'make clean' and 'make cleanworld' and reinstalled the same version of
> >> x11/nvidia-driver. I at first thought this might be related to the
> >> similar issue discussed July 2 to 5 on this mailing list, but the
> >> workaround from then (rebuilding kernel, world, and driver) does not
> >> work for me, and the BIOS update makes me think this is a different
> >> issue. Xorg works perfectly well if I switch to the integrated Intel
> >> graphics (using the i915kms module), so I think the problem is related
> >> to the discrete GPU. I do not normally use nvidia-drm-kmod, but I have
> >> tried using both graphics/nvidia-drm-kmod and
> >> graphics/nvidia-drm-61-kmod with the same result, except that the
> >> system crashes on boot rather than when starting Xorg (I use startx if
> >> that matters). The laptop is a Lenovo ThinkPad P16 with an Nvidia RTX
> >> 3500 Ada Generation Laptop GPU, if that is helpful. If there are any
> >> logs or anything else that would be useful, please let me know. I would
> >> be very grateful if anybody knows how to resolve this or has any
> >> pointers for further troubleshooting.
> >>
> >> The output before the system crashes:
> >>
> >> ACPI Warning: \134_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch -
> >> Found [Buffer], ACPI requires [Package] (20221020/nsarguments-212)
> >> NVRM: GPU at PCI:0000:01:00: GPU-58d85fdb-6f45-87c1-fe0f-9a26e92647c9
> >> NVRM: Xid (PCI:0000:01:00): 62, pid='', name=, 2022a7a6 2028a6fc
> >> 2027a696 2027a1b2 20250cf2 2025084c 00000000 00000000
> >>
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 18; apic id = 44
> >> fault virtual address = 0x0
> >> fault code = supervisor read data, page not present
> >> instruction pointer = 0x20:0xffffffff861c1354
> >> stack pointer = 0x28:0xfffffe02ebc316b0
> >> frame pointer = 0x28:0xfffffe02eff26e10
> >> code segment = base 0x0, limit 0xfffff, type 0x1b
> >>             = DPL 0, pres 1, long 1, def32 0, gran 1
> >> processor eflags = interrupt enabled, resume, IOPL = 0
> >> current process = 2380 (Xorg)
> >> rdi: 0000000000000000 rsi: 0000000000000040 rdx: 0000000000000007
> >> rcx: 0000000000000007 r8: 00000000000000c0 r9: 0000000000000066
> >> rax: 0000000000000000 rbx: fffffe02f051c000 rbp: fffffe02eff26e10
> >> r10: 00000000100d96f8 r11: 0000000066ace8f2 r12: fffffe02eff2d000
> >> r13: 0000000000000000 r14: 0000000000000055 r15: 0000000000000000
> >> trap number     = 12
> >> panic: page fault
> >> cpuid = 18
> >> time = 1722607858
> >> KDB: stack backtrace:
> >> #0 0xffffffff80b86d7d at kdb_backtrace+0x5d
> >> #1 0xffffffff80b399a1 at vpanic+0x131
> >> #2 0xffffffff80b39863 at panic+0x43
> >> #3 0xffffffff8101a93b at trap_fatal+0x40b
> >> #4 0xffffffff8101a986 at trap_pfault+0x46
> >> #5 0xffffffff80ff0c98 at calltrap+0x8
> >> Uptime: 39s
> >> Automatic reboot in 15 seconds - press a key on the console to abort
> >> --> Press a key on the console to reboot,
> >> --> or switch off the system now.
> >>
> >> Best wishes,
> >> Pontus Bramberg
> >
> > How do you load nvidia related modules?
> > If you're loading them via /boot/loader.conf[.local], please don't.
> > You can load them via the kld_list variable in /etc/rc.conf[.local].
> >
> > Loading them from loader.conf usually causes problems during module
> > loading, and a trap 12 on boot makes me suspect truncated loading of
> > modules. (Such truncations can cause many types of crashes, though.)
> >
> > See, for example, Bug277827 [1], Bug277364 [2] and Bug277028 [3].
> >
> > One more thing: assuming you're building x11/nvidia-driver and
> > graphics/nvidia-drm-61-kmod from ports for the latest stable/14, it
> > would be strange if you could build graphics/nvidia-drm-61-kmod (or
> > graphics/nvidia-drm-515-kmod) on a vanilla ports tree. The patch I
> > proposed on Bug279539 [4] should be needed for a successful build.
> >
> > Also, did any default options in the UEFI firmware (ones you haven't
> > modified) change, according to the firmware release notes? Lenovo
> > usually provides relatively precise per-revision information in them,
> > at least for the ThinkPad P and T series.
> >
> > If any defaults changed between your previous and current firmware,
> > restoring them to the previous defaults could help.
> >
> > [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277827
> >
> > [2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277364
> >
> > [3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277028
> >
> > [4] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279539
> >
>
> Thank you very much for this.
> I load the relevant module (nvidia-modeset or nvidia-drm) by adding it
> to kld_list in /etc/rc.conf. When using nvidia-drm, I have added only
> hw.nvidiadrm.modeset=1 to /boot/loader.conf. Using nvidia-modeset, there
> are no graphics-related items in /boot/loader.conf, so it should be fine.
> I install all my software using the ports collection.
> I was using the patch you proposed on Bug279539 when building
> graphics/nvidia-drm-61-kmod. I apologise for not mentioning that.
> I just tested this again to make sure I remember correctly, and indeed
> graphics/nvidia-drm-61-kmod does not build with the vanilla ports tree
> but it does with the patch. With nvidia-drm (built using the patch) in
> kld_list, the system crashes on boot rather than when attempting to start
> X, and I have to boot into single-user mode to remove it from /etc/rc.conf.

If you want, you can try the attached patch, which upgrades
x11/nvidia-driver and graphics/nvidia-drm-61-kmod to the latest
production branch of the driver (550.107.02). The patch also includes
preliminary support for the latest beta branch of the driver (560
series), but that support needs more investigation before I propose it
on Bugzilla.

I'm neither an NVIDIA insider nor the maintainer, so I don't know why
some files are already ignored, which files actually need to be added,
or what the additions are intended for. What I currently know is that
the 560 series of the driver adds two firmware modules (the release
highlights and README just say that firmware modules are added for the
GPUs they support), and that my GPU (Quadro T1000 in my ThinkPad P52,
Pascal generation) doesn't seem to be supported by either of the two
firmware kmods, as neither of them is autoloaded.

But at least the current patch allows building and installing the 560
series of the driver (x11/nvidia-driver, x11/linux-nvidia-libs and
graphics/nvidia-drm-61-kmod) by overriding the version and disabling
checksum verification, as usual. Note that for the
graphics/nvidia-drm-*-kmod ports, x11/nvidia-driver/Makefile.version
must be edited to override the version; neither Austin nor I could
determine why this happens. Also, for graphics/nvidia-drm-*-kmod, you
need the patch in Differential Revision D45400 on Phabricator [5] to
build the 555 series and later. Finally, the attached patch doesn't
include the patches from D45400 and Bug279539, as those should be
committed to the tree on their own.
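A minimal shell sketch of the version override described above follows. It assumes a POSIX sh with sed -E and awk. The helpers set_nvidia_distversion, nvversion and nvversion_lt are hypothetical and are not part of the ports framework. The NVVERSION encoding (minor zero-padded to three digits, patch appended) is also an assumption: it is inferred from the 410.057, 545.000 and 560.02803 thresholds in the attached patch, not taken from the port's Makefile.

```sh
# Hypothetical helpers; not part of the ports framework.

# Rewrite the NVIDIA_DISTVERSION assignment in a Makefile.version-style
# file, keeping the variable name, operator and spacing unchanged.
set_nvidia_distversion() {
    file=$1 version=$2
    tmp=$(mktemp) || return 1
    sed -E "s/^(NVIDIA_DISTVERSION[[:space:]]*[?]?=[[:space:]]*).*/\1${version}/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Encode "major.minor[.patch]" as major.MMMpp (assumed encoding), e.g.
# 550.107.02 -> 550.10702 and 410.57 -> 410.057.
nvversion() {
    printf '%s\n' "$1" | awk -F. '{ printf "%d.%03d%s\n", $1, $2, $3 }'
}

# Succeed when VERSION encodes to a number below THRESHOLD, mirroring a
# make(1) test such as ".if ${NVVERSION} < 560.02803".
nvversion_lt() {
    awk -v a="$(nvversion "$1")" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'
}
```

The override could then be applied by running set_nvidia_distversion x11/nvidia-driver/Makefile.version 560.28.03 in the ports tree; 560.28.03 is only an example version. Running 'make NO_CHECKSUM=yes install clean' in the port directory would then skip checksum verification through the ports framework's NO_CHECKSUM variable. Under the assumed encoding, nvversion_lt 550.107.02 560.02803 succeeds and nvversion_lt 560.28.03 560.02803 fails. The '.if ${NVVERSION} < 560.02803' block in the patch would therefore apply to 550.107.02 only.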
The changes I have > made are disabling the trackpad and setting F1-F12 (instead of media > functions) to be the default mode for the top row of keys. I have tried > with the graphics card set to discrete only (my preference) and hybrid > graphics (I am currently using this as it allows me to run X using the > integrated GPU without a problem). In both cases, attempting to start > Xorg using the Nvidia card either crashes the system with the same error > (instruction pointer and stack backtrace also seem to be the same) or > does not find the graphics card (in hybrid mode I have to set BusID for > the Nvidia card to be found by Xorg but that also does not work). I have > tried adding (separately, with reboots in between) both nvidia-drm and > nvidia-modeset to kldlist with this configuration and the results are > the same except that nvidia-modeset does not crash until I try to start > X while nvidia-drm crashes on boot. Assuming you did the same on previous (successful for nvidia-driver) firmware upgrade, I suspect changes of firmware factory default is somehow changed from the previous upgrade and causing this issue. Why I introduced my not enough tested/investigated patch is that there is a possibility some fixes are introduced (silently) at some point by nvidia for the issue you're encountering. 
[5] https://reviews.freebsd.org/D45400

-- 
Tomoaki AOKI

--Multipart=_Sat__3_Aug_2024_10_15_47_+0900_p2h3jYvlCU0NRRte
Content-Type: text/x-diff; name="patch-nvidia-550.107.02.diff"
Content-Disposition: attachment; filename="patch-nvidia-550.107.02.diff"
Content-Transfer-Encoding: 7bit

diff -u -p -N x11/nvidia-driver/Makefile.orig x11/nvidia-driver/Makefile
--- a/x11/nvidia-driver/Makefile	2023-12-05 00:46:50.253711000 +0900
+++ b/x11/nvidia-driver/Makefile	2024-08-02 22:25:50.898249000 +0900
@@ -372,6 +372,10 @@
 .if ${NVVERSION} < 545.000
 	${REINPLACE_CMD} -e '/libnvidia-gpucomp\.so/d' ${TMPPLIST}
 .endif
+.if ${NVVERSION} < 560.02803
+	${REINPLACE_CMD} -e '/nvidia-gsp_ga10x_fw\.ko/d' ${TMPPLIST}
+	${REINPLACE_CMD} -e '/nvidia-gsp_tu10x_fw\.ko/d' ${TMPPLIST}
+.endif
 
 .if ${NVVERSION} < 410.057
 # Rename some libraries and install a libmap file to resolve conflict with
diff -u -p -N x11/nvidia-driver/Makefile.version.orig x11/nvidia-driver/Makefile.version
--- a/x11/nvidia-driver/Makefile.version	2023-12-05 00:46:50.253711000 +0900
+++ b/x11/nvidia-driver/Makefile.version	2024-08-02 22:25:50.898249000 +0900
@@ -1,4 +1,4 @@
 # NVIDIA Distversion
 #
 # This will be included from x11/nvidia-driver and the nvidia-drm port
-NVIDIA_DISTVERSION = 550.54.14
+NVIDIA_DISTVERSION = 550.107.02
diff -u -p -N x11/nvidia-driver/distinfo.orig x11/nvidia-driver/distinfo
--- a/x11/nvidia-driver/distinfo	2023-12-05 00:46:50.253711000 +0900
+++ b/x11/nvidia-driver/distinfo	2024-08-02 22:25:50.898249000 +0900
@@ -1,6 +1,6 @@
-TIMESTAMP = 1708710800
-SHA256 (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 934549ee2e6cf6bc098a0794ad5c84cfa8d55c79396d1d387c37eb91c49de340
-SIZE (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 143184876
+TIMESTAMP = 1722604287
+SHA256 (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = e0b50b94c47d4b2dedf166fb31a9989bedfc89123f118560712574c1dbfe5b02
+SIZE (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = 143499736
 SHA256 (NVIDIA-FreeBSD-x86_64-470.161.03.tar.xz) = 54f87e6cadc4aedebc4f862e3d25657fddb867ddc3fe01ad06c9d54bcfa8d607
 SIZE (NVIDIA-FreeBSD-x86_64-470.161.03.tar.xz) = 99719576
 SHA256 (NVIDIA-FreeBSD-x86_64-390.154.tar.gz) = 5994c77c3510a4a89076ecf2bf402f1da635b250cca07655efc913f2a94bee84
diff -u -p -N x11/nvidia-driver/pkg-plist.orig x11/nvidia-driver/pkg-plist
--- a/x11/nvidia-driver/pkg-plist	2023-12-05 00:46:50.253711000 +0900
+++ b/x11/nvidia-driver/pkg-plist	2024-08-02 22:25:50.898249000 +0900
@@ -107,3 +107,5 @@
 %%MODULESDIR%%/extensions/libglxserver_nvidia.so.1
 /%%KMODDIR%%/nvidia.ko
 /%%KMODDIR%%/nvidia-modeset.ko
+/%%KMODDIR%%/nvidia_gsp_ga10x_fw.ko
+/%%KMODDIR%%/nvidia_gsp_tu10x_fw.ko
diff -u -p -N x11/linux-nvidia-libs/distinfo.orig x11/linux-nvidia-libs/distinfo
--- a/x11/linux-nvidia-libs/distinfo	2023-12-05 00:46:50.253711000 +0900
+++ b/x11/linux-nvidia-libs/distinfo	2024-08-02 22:25:50.898249000 +0900
@@ -1,6 +1,6 @@ LINUX32_LIBS+= libnvidia-compiler.so.${PORTVERSION}
-TIMESTAMP = 1708711235
-SHA256 (NVIDIA-Linux-x86_64-550.54.14.run) = 8c497ff1cfc7c310fb875149bc30faa4fd26d2237b2cba6cd2e8b0780157cfe3
-SIZE (NVIDIA-Linux-x86_64-550.54.14.run) = 306861083
+TIMESTAMP = 1722604633
+SHA256 (NVIDIA-Linux-x86_64-550.107.02.run) = f97c1ca4df306028d88c7aed631fa8061b55c57c4c234d853b575d8cce6c0168
+SIZE (NVIDIA-Linux-x86_64-550.107.02.run) = 307251605
 SHA256 (NVIDIA-Linux-x86_64-470.161.03.run) = 5da82a7f8c76e781e7d7f0be7b798db4d344f26bd4facf9abcf3c71c71fe7640
 SIZE (NVIDIA-Linux-x86_64-470.161.03.run) = 272397700
 SHA256 (NVIDIA-Linux-x86_64-390.154.run) = f4420280c55210964c008d5b724f2615845d47ad4c9c05d8ed26a62fc6331f7c
diff -u -p -N graphics/nvidia-drm-510-kmod/distinfo.orig graphics/nvidia-drm-510-kmod/distinfo
--- a/graphics/nvidia-drm-510-kmod/distinfo	2023-12-05 00:46:50.253711000 +0900
+++ b/graphics/nvidia-drm-510-kmod/distinfo	2024-08-02 22:25:50.898249000 +0900
@@ -1,5 +1,5 @@
-TIMESTAMP = 1708813467
-SHA256 (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 934549ee2e6cf6bc098a0794ad5c84cfa8d55c79396d1d387c37eb91c49de340
-SIZE (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 143184876
+TIMESTAMP = 1722604287
+SHA256 (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = e0b50b94c47d4b2dedf166fb31a9989bedfc89123f118560712574c1dbfe5b02
+SIZE (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = 143499736
 SHA256 (freebsd-drm-kmod-drm_v5.10.163_7_GH0.tar.gz) = dbdff8ad8cad8152d1c286b058f1f5114b3672f1a936e13933ce52915b77eaaa
 SIZE (freebsd-drm-kmod-drm_v5.10.163_7_GH0.tar.gz) = 20095338
diff -u -p -N graphics/nvidia-drm-515-kmod/distinfo.orig graphics/nvidia-drm-515-kmod/distinfo
--- a/graphics/nvidia-drm-515-kmod/distinfo	2023-12-05 00:46:50.253711000 +0900
+++ b/graphics/nvidia-drm-515-kmod/distinfo	2024-08-02 22:25:50.898249000 +0900
@@ -1,5 +1,5 @@
-TIMESTAMP = 1717495786
-SHA256 (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 934549ee2e6cf6bc098a0794ad5c84cfa8d55c79396d1d387c37eb91c49de340
-SIZE (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 143184876
+TIMESTAMP = 1722604287
+SHA256 (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = e0b50b94c47d4b2dedf166fb31a9989bedfc89123f118560712574c1dbfe5b02
+SIZE (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = 143499736
 SHA256 (freebsd-drm-kmod-drm_v5.15.160_0_GH0.tar.gz) = 350dc97a562d2642b9c420e43b66cb189297702630655a763fd1fd577f79133c
 SIZE (freebsd-drm-kmod-drm_v5.15.160_0_GH0.tar.gz) = 26098344
diff -u -p -N graphics/nvidia-drm-61-kmod/distinfo.orig graphics/nvidia-drm-61-kmod/distinfo
--- a/graphics/nvidia-drm-61-kmod/distinfo	2023-12-05 00:46:50.253711000 +0900
+++ b/graphics/nvidia-drm-61-kmod/distinfo	2024-08-02 22:25:50.898249000 +0900
@@ -1,5 +1,5 @@
-TIMESTAMP = 1717500835
-SHA256 (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 934549ee2e6cf6bc098a0794ad5c84cfa8d55c79396d1d387c37eb91c49de340
-SIZE (NVIDIA-FreeBSD-x86_64-550.54.14.tar.xz) = 143184876
+TIMESTAMP = 1722604287
+SHA256 (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = e0b50b94c47d4b2dedf166fb31a9989bedfc89123f118560712574c1dbfe5b02
+SIZE (NVIDIA-FreeBSD-x86_64-550.107.02.tar.xz) = 143499736
 SHA256 (freebsd-drm-kmod-drm_v6.1.92_0_GH0.tar.gz) = b0283194995a2a5cfbfc662a1111f4b74a27500daa1d673d441ceca45ee7663c
 SIZE (freebsd-drm-kmod-drm_v6.1.92_0_GH0.tar.gz) = 37097408

--Multipart=_Sat__3_Aug_2024_10_15_47_+0900_p2h3jYvlCU0NRRte--