bugs in contigmalloc*() related to "page not found in hash" panics
Matthew Dillon
dillon at apollo.backplane.com
Wed Nov 10 22:51:47 PST 2004
:> Here is the DragonFly commit.
:>
:> http://www.dragonflybsd.org/cvsweb/src/sys/vm/vm_contig.c.diff?r1=1.10&r2=1.11&f=u
:>
:> FreeBSD-4:
:>
:> FreeBSD-4 is in the same situation that DFly was in and requires
:> the same fixes as the above patch, though note that in FreeBSD-4
:> the contigmalloc() code is in vm_page.c, not vm_contig.c.
:
:I tried the patch in the hopes it would fix my Nvidia-driver
:crash-on-demand system. :) While my system appears stable without the
:Nvidia driver but with this patch, my system can still crash easily with
:the Nvidia driver. It usually dies with a:
Point me at the nvidia driver source and I will do a quick audit of it
to see if there is anything obviously broken. This is running on
FreeBSD-4.x? If it's a binary-only driver there isn't much I can do,
though.
The 'page not found in hash' panic can ONLY occur one way: when a
vm_page's pindex or object fields are changed directly or (under 4.x)
when the VM object's hash_rand field is changed. The only valid way
to change either of these fields is to call vm_page_insert()
or to call vm_page_remove(). That's it. There is *NO* other legal
way to change those fields within a vm_page that won't result in
corruption of the VM page hash table (4.x) or object->root splay
tree (5.x). The fields cannot be modified directly, the vm_page
cannot be safely bzero'd, you can't 0 or NULL out the fields, or assign
a new index or object, etc... only vm_page_insert() and vm_page_remove()
can do that safely.
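
To make the rule above concrete, here is a minimal userland sketch of a
page hash keyed on (object, pindex). It is NOT the FreeBSD-4 kernel code:
the toy_* names, the table size and the hash function are all invented
for illustration, and the real kernel also mixes in the object's
hash_rand. It only shows why a direct write to pindex leaves the page on
a chain that a later removal never searches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 8            /* hypothetical size, power of 2 */

struct toy_page {                  /* hypothetical stand-in for vm_page */
    struct toy_page *hnext;        /* hash chain link */
    const void *object;            /* owning object, NULL if not inserted */
    unsigned long pindex;          /* page index within the object */
};

static struct toy_page *toy_buckets[TOY_HASH_SIZE];

/* The slot depends on both fields, so changing either one changes
 * the chain on which the page is expected to be found. */
static unsigned toy_hash(const void *object, unsigned long pindex)
{
    return (unsigned)(((uintptr_t)object + pindex) & (TOY_HASH_SIZE - 1));
}

/* Legal way to set object/pindex: fields and hash chain change together. */
static void toy_insert(struct toy_page *m, const void *object,
                       unsigned long pindex)
{
    struct toy_page **bucket = &toy_buckets[toy_hash(object, pindex)];

    assert(m->object == NULL);     /* must not already be inserted */
    m->object = object;
    m->pindex = pindex;
    m->hnext = *bucket;
    *bucket = m;
}

/* Legal way to clear them. Returns 0 on success, or -1 if the page is
 * not on the chain its fields hash to (where a kernel would panic). */
static int toy_remove(struct toy_page *m)
{
    struct toy_page **pp = &toy_buckets[toy_hash(m->object, m->pindex)];

    while (*pp != NULL && *pp != m)
        pp = &(*pp)->hnext;
    if (*pp == NULL)
        return -1;                 /* "page not found in hash" */
    *pp = m->hnext;
    m->hnext = NULL;
    m->object = NULL;
    return 0;
}

/* Returns 1 if the legal path works and the direct write is detected. */
static int toy_demo(void)
{
    static int obj;                /* stands in for a VM object */
    struct toy_page a = {0}, b = {0};
    int detected;

    toy_insert(&a, &obj, 1);
    if (toy_remove(&a) != 0)       /* insert followed by remove is fine */
        return 0;

    toy_insert(&b, &obj, 1);
    b.pindex = 2;                  /* illegal direct write */
    detected = (toy_remove(&b) == -1);

    b.pindex = 1;                  /* undo the damage and unlink b so  */
    (void)toy_remove(&b);          /* the table keeps no stale pointer */
    return detected;
}
```

Calling toy_demo() from a main() should return 1 under these assumptions.
Note that the failure only shows up at the later toy_remove() call, not
at the line that did the bad write.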
From looking at your bug reports and comparing them with my own
extensive research on this particular crash, I will say *DEFINITIVELY*
that it is *NOT* a RAM problem. It's software-caused corruption,
period, end of story.
I will also note that the backtrace from the panic path in the
second PR URL is very similar to what we were seeing before we fixed
the issue in contigmalloc... the problem is that the VM page hash
table / splay tree gets corrupted *LONG* before the code path that
actually causes the panic, so it's virtually impossible to glean any
information from the panic itself.
There is a test you can run. If you have a kernel vmcore and related
kernel image that contains the vm page not found in hash panic, you
can run this program on it to do a sanity check on the VM page array
and hash table. I have modified this program to work with FreeBSD-4.x
(I'd have to rewrite it to make it work with 5.x/6.x, which I don't have
time to do):
fetch http://leaf.dragonflybsd.org/~dillon/vmpageinfo_4x.c
and follow the instructions in the comments to compile it.
Run it with '-N kernel.X -M vmcore.X -d'.
This program will sanity check the VM page hash table from the core
file and tell you if there are any pages missing from the hash table
or sitting in the wrong slot.
My expectation is that it will find a page sitting in the wrong slot.
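
The kind of check described above can be sketched roughly as follows.
This is not the vmpageinfo_4x.c source and does not read a vmcore; it is
a hypothetical two-pass audit over an in-memory toy table, with invented
chk_* names and an invented hash function, showing what "missing from
the hash table" and "sitting in the wrong slot" would mean.

```c
#include <stddef.h>
#include <stdint.h>

#define CHK_HASH_SIZE 8            /* hypothetical size, power of 2 */

struct chk_page {                  /* hypothetical stand-in for vm_page */
    struct chk_page *hnext;        /* hash chain link */
    const void *object;            /* owning object, NULL if unused */
    unsigned long pindex;          /* page index within the object */
};

struct chk_result {
    int missing;                   /* pages not on their expected chain */
    int wrong_slot;                /* chained pages in the wrong bucket */
};

static unsigned chk_hash(const void *object, unsigned long pindex)
{
    return (unsigned)(((uintptr_t)object + pindex) & (CHK_HASH_SIZE - 1));
}

static struct chk_result chk_audit(struct chk_page *const *buckets,
                                   const struct chk_page *pages,
                                   size_t npages)
{
    struct chk_result r = {0, 0};
    size_t i;
    unsigned slot;

    /* Pass 1: every page that claims an object must be reachable on
     * the chain that its (object, pindex) pair hashes to. */
    for (i = 0; i < npages; i++) {
        const struct chk_page *m = &pages[i];
        const struct chk_page *scan;

        if (m->object == NULL)
            continue;
        scan = buckets[chk_hash(m->object, m->pindex)];
        while (scan != NULL && scan != m)
            scan = scan->hnext;
        if (scan == NULL)
            r.missing++;
    }

    /* Pass 2: every page found on a chain must hash to that bucket. */
    for (slot = 0; slot < CHK_HASH_SIZE; slot++) {
        const struct chk_page *scan;

        for (scan = buckets[slot]; scan != NULL; scan = scan->hnext)
            if (chk_hash(scan->object, scan->pindex) != slot)
                r.wrong_slot++;
    }
    return r;
}
```

In this model a page whose pindex was overwritten in place is counted
twice: it is missing from the chain its new fields hash to, and it is
still sitting in the old slot.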
:
: Fatal trap 12: page fault while in kernel mode
: fault virtual address = 0x30
: fault code = supervisor read, page not present
:
This is a different failure. I'd need a backtrace or a kernel.debug and
vmcore to play with, and a FreeBSD developer would probably be able to
help you more with it. It's obviously a NULL pointer indirection of some
sort.
:Two "page not found in hash" panics that I believe are related to the
:Nvidia driver:
:http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/71086
:http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/72539
The 'page not found in hash' bug is *NOT* likely to be related to any
of the pmap code, simply because the sanity checks already in the
kernel (assuming the kernel is compiled with options INVARIANTS and
options INVARIANT_SUPPORT) mostly preclude an error path to this
panic from the pmap code. However, pmap panics could be related to
corrupted VM pages.
-Matt
Matthew Dillon
<dillon at backplane.com>
:The first PR (mine) asks about a change in pmap_remove() that was later
:removed from FreeBSD-4 but left in FreeBSD-5. If anyone knows why this
:happened, I would be interested in knowing.
:
:Sean
:--
:sean-freebsd at farley.org