"panic: vm_fault: fault on nofault entry" in nvidia module on 10
Adam McDougall
mcdouga9 at egr.msu.edu
Fri Dec 20 17:34:30 UTC 2013
I know I should submit a PR and I fully intend to, but I don't have
all the details gathered yet and had to defer to more pressing bugs
or issues. But since 10.0 is very near, I should say at least something.
Six times on my home desktop, and twice this week on my work desktop, I've
had a kernel panic that looks like it came from inside the nvidia kernel
module:
Info from /var/crash/core.txt.#:
Unread portion of the kernel message buffer:
[175718] panic: vm_fault: fault on nofault entry, addr: fffffe0005f13000
[175718] cpuid = 3
[175718] Uptime: 2d0h48m38s
[175718] Dumping 5442 out of 16321 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
#0 doadump (textdump=1) at pcpu.h:219
219 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0 doadump (textdump=1) at pcpu.h:219
#1 0xffffffff805cb045 in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:447
#2 0xffffffff805cb424 in panic (fmt=<value optimized out>)
at /usr/src/sys/kern/kern_shutdown.c:754
#3 0xffffffff807c811d in vm_fault_hold (map=0xfffff80002000000,
vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0,
m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:279
#4 0xffffffff807c6be7 in vm_fault (map=0xfffff80002000000,
vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0)
at /usr/src/sys/vm/vm_fault.c:224
#5 0xffffffff8080d01b in trap_pfault (frame=0xfffffe08491e5630, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:775
#6 0xffffffff8080c8d6 in trap (frame=0xfffffe08491e5630)
at /usr/src/sys/amd64/amd64/trap.c:463
#7 0xffffffff807f2ca2 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:232
#8 0xffffffff8129056b in _nv000222rm () from /boot/modules/nvidia.ko
#9 0xfffffe000bfd0000 in ?? ()
#10 0xfffff8008597ac00 in ?? ()
#11 0xfffffe08491e5820 in ?? ()
#12 0xfffff8000dc58c00 in ?? ()
#13 0xfffff8008597ac00 in ?? ()
#14 0xffffffff81781558 in _nv000768rm () from /boot/modules/nvidia.ko
#15 0xfffffe000bfd0000 in ?? ()
#16 0xfffff8008597ac00 in ?? ()
#17 0xfffffe08491e5820 in ?? ()
#18 0xfffff8000dc58c00 in ?? ()
#19 0xfffff8008597ac00 in ?? ()
#20 0xffffffff817838c6 in rm_free_unused_clients ()
from /boot/modules/nvidia.ko
#21 0x0000000000018764 in ?? ()
#22 0x134198d054ad9910 in ?? ()
#23 0x134198d14318c110 in ?? ()
#24 0x134198d14318c110 in ?? ()
#25 0x134198d0cbe32d10 in ?? ()
#26 0x0000000000000000 in ?? ()
Current language: auto; currently minimal
(kgdb)
Other traces I've found are similar but not necessarily exact, although
they are ALL in nvidia.ko.
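To compare several of these dumps, the resolved frames in each
/var/crash/core.txt.# file can be tallied per module. The Python sketch
below is only an illustration: the function name frames_by_module and the
two regular expressions are assumptions based on the kgdb frame layout shown
above, not part of any FreeBSD tool. It treats each file as one backtrace,
skips frames shown as "??", and counts a frame as "kernel" when no
"from <module>.ko" follows it.

```python
import re
import sys
from collections import Counter

# Start of a kgdb frame, optionally prefixed by the "(kgdb)" prompt, e.g.
#   "#8 0xffffffff8129056b in _nv000222rm () from /boot/modules/nvidia.ko"
FRAME_RE = re.compile(
    r"^(?:\(kgdb\)\s+)?#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s+\("
)
# "from /path/to/module.ko" at the end of a frame line or alone on the
# continuation line that follows it.
FROM_RE = re.compile(r"(?:^|\s)from\s+(\S+\.ko)\s*$")


def frames_by_module(text):
    """Count resolved backtrace frames per module in kgdb output."""
    modules = {}    # frame number -> module name (a repeated frame overwrites)
    current = None  # number of the resolved frame currently being read
    for raw in text.splitlines():
        line = raw.strip()
        frame = FRAME_RE.match(line)
        if frame:
            number, symbol = int(frame.group(1)), frame.group(2)
            if symbol == "??":
                current = None  # unresolved frame: ignore it
                continue
            current = number
            modules[number] = "kernel"  # default until a "from" is seen
        origin = FROM_RE.search(line)
        if origin and current is not None:
            modules[current] = origin.group(1).rsplit("/", 1)[-1]
    return Counter(modules.values())


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, dict(frames_by_module(handle.read())))
```

Saved under a hypothetical name such as frames.py, it could be run as
"python3 frames.py /var/crash/core.txt.*" to print one tally per dump.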
My home desktop crashed with this panic string on Nov 6, Nov 25,
Dec 4 (twice), Dec 14, and Dec 15th. Often it would crash when I
was opening thunderbird on the right monitor which is rotated vertically.
One of the panics was with the new nvidia driver from ports/184352, which
I quickly abandoned for now because it didn't solve the panic and it caused
my second monitor to be rotated wrongly. Every other panic was with
nvidia-driver-319.32, driving a 'G96 [Quadro FX 380]'.
When I saw the very first panic I knew I should report it, but it sure seemed
odd that only the one desktop was having trouble. Not having time every night
lately to debug this properly, I was starting to blame the hardware until it
started happening on my work desktop too. Now I have to take it seriously,
although I'll only have remote access to my work desktop for the rest of the
year after I go home today.
My work desktop crashed with this panic string on Dec 18th and 20th, but
both times it happened when I was trying to start a VM in VirtualBox.
Both my monitors are rotated vertically at work (in case this is a factor).
That machine has only run nvidia-driver-319.32, driving a 'G92 [GeForce 8800 GT]'.
Both of these computers have built-in Intel graphics of some sort, but
I'm pretty sure I'd just be running away from the problem if I went that
route, as interesting as it may be. All I really need is decent performance
with the ability to rotate one or both DVI digital outputs. I have not
configured Intel graphics for X in years so that is even lower on the list.
I don't think the build of 10 has made any difference. Some of the panics
were on r257230 (BETA2-ish) and the more recent ones on r258899 (BETA4-ish).
The nvidia driver was always compiled in a poudriere jail of a similar version
(I don't have exact details, nor have I thought to try compiling it locally yet).
FreeBSD 9.x was always fine in this regard. Both of these systems used to
run 9.x before I switched to a fresh install of 10 in a new zfs.
I feel I could pretty easily agitate my home computer into panicking if I
had a set of things to try at home, but I was hoping to think of a way to
make more symbols show up in the nvidia module so the backtrace would make
sense. Its largest component is a binary blob from the source which claims
to be unstripped (as does the resulting nvidia.ko, which its size also suggests).
Anyone else seen this? Anyone have any tips to try, or think of test scenarios
I should explore that might help track it down? A way to see symbols from the
nvidia driver in a backtrace? I can think of some ideas to try such as dumping
rotation and I was planning on brainstorming a more concrete example with more
info, but I'm running low on time to spend on it this year and 10.0 is at the
door. I'm not so much concerned about this issue for my own sake, but for the
greater good, assuming someone else will fall into it. I'll plug away at it
as I have time but good suggestions might help my efficiency. Thanks.