amd64/159930: kernel core

Mon Aug 22 12:27:36 UTC 2011

On Friday, August 19, 2011 6:50:51 pm Wouter Snels wrote:
> 
> >Number:         159930
> >Category:       amd64
> >Synopsis:       kernel core
> >Confidential:   no
> >Severity:       non-critical
> >Priority:       medium
> >Responsible:    freebsd-amd64
> >State:          open
> >Quarter:        
> >Keywords:       
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Fri Aug 19 23:00:25 UTC 2011
> >Closed-Date:
> >Last-Modified:
> >Originator:     Wouter Snels
> >Release:        FreeBSD 8.2
> >Organization:
> >Environment:
> FreeBSD spark.ofloo.net 8.2-RELEASE-p2 FreeBSD 8.2-RELEASE-p2 #0: Wed Jul 13
> 15:20:57 CEST 2011     ofloo at spark.ofloo.net:/usr/obj/usr/src/sys/OFL  amd64
> 
> >Description:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 02
> fault virtual address   = 0x30
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff805dd943
> stack pointer           = 0x28:0xffffff8091e3d6c0
> frame pointer           = 0x28:0xffffff8091e3d6f0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 18 (softdepflush)
> trap number             = 12
> panic: page fault
> cpuid = 2
> KDB: stack backtrace:
> #0 0xffffffff8063300e at kdb_backtrace+0x5e
> #1 0xffffffff80602627 at panic+0x187
> #2 0xffffffff808fbbe0 at trap_fatal+0x290
> #3 0xffffffff808fbfbf at trap_pfault+0x28f
> #4 0xffffffff808fc49f at trap+0x3df
> #5 0xffffffff808e4644 at calltrap+0x8
> #6 0xffffffff805f668a at priv_check_cred+0x3a
> #7 0xffffffff8084ebd0 at chkdq+0x310
> #8 0xffffffff8082db5d at ffs_truncate+0xfed
> #9 0xffffffff8084ac5c at ufs_inactive+0x21c
> #10 0xffffffff8068a761 at vinactive+0x71
> #11 0xffffffff806904b8 at vputx+0x2d8
> #12 0xffffffff80836386 at handle_workitem_remove+0x206
> #13 0xffffffff8083675e at process_worklist_item+0x20e
> #14 0xffffffff80838893 at softdep_process_worklist+0xe3
> #15 0xffffffff80839d3c at softdep_flush+0x17c
> #16 0xffffffff805d9f28 at fork_exit+0x118
> #17 0xffffffff808e4b0e at fork_trampoline+0xe
> Uptime: 2d4h7m56s
> Cannot dump. Device not defined or unavailable.
> Automatic reboot in 15 seconds - press a key on the console to abort
> panic: bufwrite: buffer is not busy???

Hmm, the panic seems to be caused by a null ucred pointer passed to 
priv_check_cred() in chkdq():

        if ((flags & FORCE) == 0 &&
            priv_check_cred(cred, PRIV_VFS_EXCEEDQUOTA, 0))
                do_check = 1;
        else
                do_check = 0;

However, ffs_truncate() passes in NOCRED (a null ucred pointer) as its credential:

        if ((flags & IO_EXT) && extblocks > 0) {
                ...
#ifdef QUOTA
                        (void) chkdq(ip, -extblocks, NOCRED, 0);
#endif

A few other places also call chkdq() with NOCRED, and none of them pass the
FORCE flag either:

ffs/ffs_inode.c:522:    (void) chkdq(ip, -blocksreleased, NOCRED, 0);
ffs/ffs_softdep.c:6201: (void) chkdq(ip, -datablocks, NOCRED, 0);
ffs/ffs_softdep.c:6431: (void) chkdq(ip, -datablocks, NOCRED, 0);
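To make the short-circuit in the quoted do_check logic concrete, here is a
user-space sketch of it.  It relies on mock stand-ins written only for
illustration: struct ucred, the FORCE value, and mock_priv_check_cred() (which
takes just a credential, unlike the real three-argument priv_check_cred()) are
hypothetical.  NOCRED is modelled as a null pointer, as in the kernel.  Because
&& short-circuits, a caller passing FORCE never reaches the privilege check,
while callers passing 0 flags, like those listed above, would hand NOCRED
straight to it if execution got that far.

```c
#include <assert.h>
#include <stddef.h>

/* Mock credential; the kernel's struct ucred has many more fields. */
struct ucred {
	unsigned int cr_uid;
};

/* As in the kernel, NOCRED is a null credential pointer. */
#define NOCRED ((struct ucred *)0)

/* Hypothetical flag value for this sketch; not the kernel's definition. */
#define FORCE 0x01

static int priv_calls;      /* times the mock was called */
static int priv_null_calls; /* calls that received a null cred */

/*
 * Mock of priv_check_cred().  The kernel version reads fields of cred,
 * so a null pointer faults there; here the call is only recorded.
 * Returns 0 for uid 0 and nonzero otherwise, a simplification of the
 * real policy checks.
 */
static int
mock_priv_check_cred(struct ucred *cred)
{
	priv_calls++;
	if (cred == NULL) {
		priv_null_calls++;
		return (1); /* the kernel would have page-faulted instead */
	}
	return (cred->cr_uid != 0);
}

/* Mirrors the do_check computation quoted from chkdq(). */
static int
compute_do_check(struct ucred *cred, int flags)
{
	int do_check;

	assert((flags & ~FORCE) == 0); /* only FORCE is modelled here */
	if ((flags & FORCE) == 0 && mock_priv_check_cred(cred))
		do_check = 1;
	else
		do_check = 0;
	return (do_check);
}
```

With these counters, compute_do_check(NOCRED, FORCE) leaves priv_calls at 0,
whereas compute_do_check(NOCRED, 0) records one call with a null credential,
which is the situation the backtrace appears to show.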

Hmm, all of these calls should be passing in a negative value, though, and
reducing usage takes a shorter path at the start of chkdq() that always
returns without ever getting to the call to priv_check_cred().  The same is
true if the value (e.g. extblocks) is 0.  So for priv_check_cred() to be
reached, the change passed in (-extblocks) must have been positive, which
implies that extblocks itself was negative.  That seems very odd, especially
given the logic in ffs_truncate():

        if ((flags & IO_EXT) && extblocks > 0) {
                ...
                        if ((error = ffs_syncvnode(vp, MNT_WAIT)) != 0)
                                return (error);
#ifdef QUOTA
                        (void) chkdq(ip, -extblocks, NOCRED, 0);
#endif

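Nothing changes extblocks between that check and the call to chkdq().  If this
is reproducible, it would probably be best to get a crash dump so we can
investigate it further.  Note that the panic output above says "Cannot dump.
Device not defined or unavailable.", so a dump device would need to be
configured first (e.g. via dumpdev in /etc/rc.conf).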
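The sign argument can be checked with a small model of the order of tests at
the top of chkdq().  This is a sketch under stated assumptions:
reaches_priv_check() and extblocks_call_reaches_priv_check() are hypothetical
helpers, FORCE has a made-up value, the block count is treated as a signed
64-bit integer, and the other early returns in the real function are left out.
It only reports whether priv_check_cred() would be evaluated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value for this sketch; not the kernel's definition. */
#define FORCE 0x01

/*
 * Simplified model of the order of checks in chkdq(), reduced to the
 * parts discussed above.  Returns 1 if a call with this change and
 * these flags would go on to evaluate priv_check_cred().
 */
static int
reaches_priv_check(int64_t change, int flags)
{
	assert((flags & ~FORCE) == 0); /* only FORCE is modelled here */

	if (change == 0)
		return (0);		/* nothing to account for */
	if (change < 0)
		return (0);		/* releasing blocks: early return */
	return ((flags & FORCE) == 0);	/* the && in the quoted snippet */
}

/*
 * Models the ffs_truncate() call chkdq(ip, -extblocks, NOCRED, 0).
 * Returns 1 if that call could reach priv_check_cred().
 */
static int
extblocks_call_reaches_priv_check(int64_t extblocks)
{
	return (reaches_priv_check(-extblocks, 0));
}
```

In this model only a negative extblocks makes
extblocks_call_reaches_priv_check() return 1; zero or positive values return 0,
which is why reaching priv_check_cred() through the extblocks call looks so
unlikely given the extblocks > 0 guard.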

-- 
John Baldwin