UFS related panic (daily <-> find)

Fri Jul 26 20:20:34 UTC 2013

On Friday, July 26, 2013 3:00:33 pm rank1seeker at gmail.com wrote:
> > > > > I had 2 panics: (Both occurred at 3 AM, so it had to be a daily task)
> > > > > 
> > > > > First (Jul  2 03:06:50 2013):
> > > > > --
> > > > > Fatal trap 12: page fault while in kernel mode
> > > > > fault virtual address   = 0x19
> > > > > fault code              = supervisor read, page not present
> > > > > instruction pointer     = 0x20:0xc06caf34
> > > > > stack pointer           = 0x28:0xe76248fc
> > > > > frame pointer           = 0x28:0xe7624930
> > > > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > > > >                         = DPL 0, pres 1, def32 1, gran 1
> > > > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > > > current process         = 76562 (find)
> > > > > trap number             = 12
> > > > > panic: page fault
> > > > > Uptime: 23h0m41s
> > > > > Physical memory: 1014 MB
> > > > > Dumping 186 MB: 171 155 139 123 107 91 75 59 43 27 11
> > > > > 
> > > > > #7  0xc06caf34 in cache_lookup_times (dvp=0xc784a990, vpp=0xe7624ae8,
> > > > >     cnp=0xe7624afc, tsp=0x0, ticksp=0x0) at /usr/src/sys/kern/vfs_cache.c:547
> > > > 
> > > > Can you go up to this frame and do 'l'?
> > > > 
> > > > -- 
> > > > John Baldwin
> > > 
> > > 
> > > Sure,
> > > 
> > > ---------
> > > (kgdb) up 7
> > > #7  0xc06caf34 in cache_lookup_times (dvp=0xc784a990, vpp=0xe7624ae8,
> > >     cnp=0xe7624afc, tsp=0x0, ticksp=0x0) at /usr/src/sys/kern/vfs_cache.c:547
> > > 547                     numchecks++;
> > > ---------
> > > (kgdb) l
> > > 542             }
> > > 543
> > > 544             hash = fnv_32_buf(cnp->cn_nameptr, cnp->cn_namelen, FNV1_32_INIT);
> > > 545             hash = fnv_32_buf(&dvp, sizeof(dvp), hash);
> > > 546             LIST_FOREACH(ncp, (NCHHASH(hash)), nc_hash) {
> > > 547                     numchecks++;
> > > 548                     if (ncp->nc_dvp == dvp && ncp->nc_nlen == cnp->cn_namelen &&
> > > 549                         !bcmp(nc_get_name(ncp), cnp->cn_nameptr, ncp->nc_nlen))
> > > 550                             break;
> > > 551             }
> > > ---------
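The listing above is a hash-chain lookup. The name is hashed with FNV-1, the parent vnode pointer value is mixed in, the result selects a bucket, and the bucket's list is walked comparing nc_dvp, nc_nlen and the name bytes. The following userland sketch models that loop under stated assumptions. struct entry, TABLE_MASK, HASH_INIT, cache_insert() and cache_find() are hypothetical. HASH_INIT is the standard FNV-1 32-bit offset basis and may not equal the kernel's FNV1_32_INIT.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1 in the style of fnv_32_buf(): per byte, multiply by the prime, then XOR. */
#define FNV_32_PRIME 0x01000193u
/* Standard FNV-1 32-bit offset basis (assumption: the kernel constant may differ). */
#define HASH_INIT 0x811c9dc5u

static uint32_t fnv_32_buf(const void *buf, size_t len, uint32_t hval)
{
    const uint8_t *s = buf;

    while (len-- != 0) {
        hval *= FNV_32_PRIME;
        hval ^= *s++;
    }
    return hval;
}

/* Hypothetical, simplified stand-in for struct namecache. */
struct entry {
    struct entry *next;   /* hash-chain link (nc_hash in the kernel) */
    const void *dvp;      /* parent directory "vnode": just an opaque pointer here */
    size_t nlen;          /* length of name */
    const char *name;     /* component name, not necessarily NUL-terminated */
};

/* Hypothetical table: 16 buckets, so the mask is 15. */
#define TABLE_MASK 15u
static struct entry *table[TABLE_MASK + 1];

/* Hash the name bytes first, then the bytes of the parent pointer value itself. */
static uint32_t hash_key(const void *dvp, const char *name, size_t nlen)
{
    uint32_t h = fnv_32_buf(name, nlen, HASH_INIT);
    return fnv_32_buf(&dvp, sizeof(dvp), h);
}

/* Push an entry onto the head of its bucket. */
static void cache_insert(struct entry *e)
{
    struct entry **head = &table[hash_key(e->dvp, e->name, e->nlen) & TABLE_MASK];

    e->next = *head;
    *head = e;
}

/* Walk one bucket; each e->... access trusts the link that led to e. */
static struct entry *cache_find(const void *dvp, const char *name, size_t nlen)
{
    struct entry *e;

    for (e = table[hash_key(dvp, name, nlen) & TABLE_MASK]; e != NULL; e = e->next) {
        if (e->dvp == dvp && e->nlen == nlen && memcmp(e->name, name, nlen) == 0)
            break;
    }
    return e;
}
```

The point the sketch makes is that the loop never validates e->next. A corrupted link such as 0x1 is dereferenced as soon as the loop advances to it, which matches the trap in cache_lookup_times.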
> > 
> > Hmm, 'p ncp' and 'p *ncp' at that frame perhaps?
> > 
> 
> (kgdb) p ncp
> $1 = (struct namecache *) 0x1
> (kgdb) p *ncp
> Cannot access memory at address 0x1
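The bad ncp and the panic's "fault virtual address = 0x19" agree with each other. This assumes the i386 layout of struct namecache in vfs_cache.c of that era: two LIST_ENTRY links (nc_hash, nc_src) and one TAILQ_ENTRY (nc_dst), then nc_dvp. Under that layout nc_dvp sits 0x18 bytes into the structure, so reading ncp->nc_dvp with ncp == 0x1 touches 0x19. The model below is hypothetical: each 32-bit uint32_t field stands in for an i386 pointer. 'ptype struct namecache' in kgdb would show the real layout for the running kernel.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical i386 model of struct namecache (layout assumed, not copied
 * from a specific revision): every pointer is 32 bits wide, so each
 * LIST_ENTRY / TAILQ_ENTRY is two 32-bit words.
 */
struct namecache_i386 {
    uint32_t nc_hash[2];   /* LIST_ENTRY(namecache): le_next, le_prev */
    uint32_t nc_src[2];    /* LIST_ENTRY(namecache) */
    uint32_t nc_dst[2];    /* TAILQ_ENTRY(namecache): tqe_next, tqe_prev */
    uint32_t nc_dvp;       /* struct vnode *: parent directory vnode */
    uint32_t nc_vp;        /* struct vnode *: vnode the name refers to */
    unsigned char nc_flag;
    unsigned char nc_nlen;
};

/* Address touched when reading ncp->nc_dvp through a (possibly bogus) ncp. */
static uint32_t nc_dvp_fault_address(uint32_t ncp)
{
    return ncp + (uint32_t)offsetof(struct namecache_i386, nc_dvp);
}
```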

Interesting.  Maybe look at NCHHASH(hash) (you'll have to expand the macro manually)
and see whether the head node is corrupted, or walk the list to find the corrupted node.
Since 0x1 differs from NULL (a valid end-of-list value) by a single bit, this looks like
a single-bit error, so there is a chance this is a RAM problem.  If the bad pointer is in
the hash table head entry, then it would always be at the same physical address for the
same kernel, I think.
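Walking the chain by hand in kgdb amounts to the loop below. The FreeBSD sources of that period appear to define NCHHASH(hash) as &nchashtbl[(hash) & nchash]. If that holds, and hash has not been optimized out of the frame, the bucket head could be printed with 'p nchashtbl[hash & nchash]'. Check the #define in your vfs_cache.c before relying on either.

The sketch is a userland model with a hypothetical struct node. Its plausibility test (a non-NULL link must be aligned and at or above a hypothetical LOW_LIMIT) is only a heuristic. bit_distance() counts how many bits a bad link differs from NULL, which is 1 for 0x1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal chain node: only the hash-chain link matters here. */
struct node {
    struct node *next;
};

/* Hypothetical lower bound for a valid pointer (the low page is never mapped). */
#define LOW_LIMIT ((uintptr_t)4096)

/* Number of bits that differ between two words (Hamming distance). */
static unsigned bit_distance(uintptr_t a, uintptr_t b)
{
    uintptr_t x = a ^ b;
    unsigned n = 0;

    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* NULL ends the chain; any other link must be aligned and above LOW_LIMIT. */
static int link_is_plausible(const struct node *p)
{
    uintptr_t v = (uintptr_t)p;

    if (p == NULL)
        return 1;
    return v >= LOW_LIMIT && v % _Alignof(struct node) == 0;
}

/*
 * Returns 0 if the bucket head itself holds a bad pointer, k >= 1 if the
 * next field of the k-th node is bad (storing the bad value in *bad_value),
 * -1 if the chain ends cleanly, or -2 if it is longer than max_steps
 * (possibly a cycle created by the corruption).
 */
static long find_bad_link(struct node *head, uintptr_t *bad_value, long max_steps)
{
    struct node *p = head;

    for (long pos = 0; pos <= max_steps; pos++) {
        if (!link_is_plausible(p)) {
            *bad_value = (uintptr_t)p;
            return pos;
        }
        if (p == NULL)
            return -1;
        p = p->next;
    }
    return -2;
}
```

If find_bad_link() reports position 0, the corruption is in the bucket head itself. That is the case where, as suggested above, the bad word would sit at a fixed physical address for a given kernel.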

-- 
John Baldwin