BUG: possible NULL pointer dereference in nfs server

Mon Jan 27 18:37:51 UTC 2014

On Sat, Jan 25, 2014 at 09:05:54PM -0500, Rick Macklem wrote:
> Bruce Evans wrote:
> > On Sat, 25 Jan 2014, Dimitry Andric wrote:
> > 
> > > On 25 Jan 2014, at 01:38, Rick Macklem <rmacklem at uoguelph.ca>
> > > wrote:
> > > ...
> > > If it inlines this, the result looks approximately like:
> > >
> > >     1	{
> > >     2		fhandle_t *fhp = NULL;
> > >     3		struct nfslockfile *new_lfp;
> > >     4		int error;
> > >     5
> > >     6		if (new_stp->ls_flags & NFSLCK_OPEN) {
> > >     7			new_lfp = *NULL;
> > >     8			fhp = &new_lfp->lf_fh;
> > >     9		} else if (&nfh) {
> > >    10			fhp = &nfh;
> > >    11		} else {
> > >    12			panic("nfsrv_getlockfh");
> > >    13		}
> > >    14		error = nfsvno_getfh(vp, fhp, p);
> > >    15		NFSEXITCODE(error);
> > >    16		getlckret = error;
> > >    17	}
> > >
> > > The code in line 7 is the problematic part.  Since this is
> > > undefined,
> > > the compiler inserts a trap instruction here.  I think the problem
> > > Roman
> > > encountered is that on sparc64, there is no equivalent to x86's ud2
> > > instruction, so it inserts a call to abort() instead, and that
> > > function
> > > is not available in kernel-land.
> > 
> > Compiler bug.  abort() is not available in freestanding
> > implementations.
> > The behaviour is only undefined if the null pointer is dereferenced
> > at
> > runtime, so it doesn't include failing to link to abort() at compile
> > time.
> > 
> > > ...
> > >> Sorry, I'm not a compiler guy, so I don't know why a compiler
> > >> would
> > >> generate a trap instruction, but since new_lfpp is never NULL when
> > >> this is executed, I don't see a problem.
> > >>
> > >> If others feel that this needs to be re-coded, please let me know
> > >> what
> > >> you think the code should look like? (A test for non-NULL with a
> > >> panic()
> > >> before it is used?)
> > >>
> > >> Is a trap instruction that never gets executed a problem?
> > >
> > > It's better to avoid undefined behavior in any case.  Just add a
> > > NULL
> > > check, that should be sufficient.
> > 
> > That might only add bloat and unimprove debugging.  Since the null
> > pointer
> > case cannot happen, it cannot be handed properly.  It can be
> > mishandled in
> > the following ways:
> > - return an error, so that all callers have to handle the null
> > pointer case
> >    that can't happen.  If the compiler is too smart, it will notice
> >    more
> >    undefined behaviour (that can't happen) in callers and "force" you
> >    to
> >    handle it there too
> > - KASSERT() that the pointer cannot be null.  Then:
> >    - on production systems where KASSERT() is null, this won't work
> >    around
> >      the compiler bug.  Use a panic() instead.  To maximize source
> >      code
> >      bloat, ifdef all of this.
> >    - when KASSERT() is not null, it will work around the compiler
> >    bug.
> >      If the case that can't happen actually happens, then this
> >      unimproves
> >      the debugging by messing up stack traces and turning restartable
> >      null pointer or SIGILL traps to non-restartable panics.
> >      Optimizations that replace a large block of code ending with a
> >      null pointer trap by a single unimplemented instruction would
> >      probably break restarting anyway.
> > 
> > Bruce
> > 
> So Roman, all I can suggest is to try adding something like:
>    if (new_lfpp == NULL)
>         panic("new_lfpp NULL");
> after line#6. If that makes the compiler happy, I can commit it in
> April. (Can't do commits before then.)

The compiler already inserts "trap" instruction when such a condition
happens so this seem superfluous.

> I agree with Bruce, but the check might be a good idea, in case a
> future code change introduces a bug where the function is called with
> new_lfpp NULL and NFSLCK_OPEN set.
> 
> If this doesn't make the compiler happy, all I can suggest is to
> play around until you come up with something that works.
 
KASSERT() doesnt communicate that it's an assert, because it can
just log into console a carry on. Would you be ok with this patch?

Index: fs/nfsserver/nfs_nfsdstate.c
===================================================================

--- fs/nfsserver/nfs_nfsdstate.c        (revision 261037)
+++ fs/nfsserver/nfs_nfsdstate.c        (working copy)
@@ -1384,7 +1384,8 @@
         * If we are doing Lock/LockU and local locking is enabled, sleep
         * lock the nfslockfile structure.
         */
-       getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags, NULL, &nfh, p);
+       KASSERT((new_stp->ls_flags & NFSLCK_OPEN) == 0, ("nfsrv_lockctrl: calling nfsrv_getlockfh with NFSLCK_OPEN"));
+       getlckret = nfsrv_getlockfh(vp, new_stp->ls_flags & ~NFSLCK_OPEN, NULL, &nfh, p);
        NFSLOCKSTATE();
        if (getlckret == 0) {
                if ((new_stp->ls_flags & (NFSLCK_LOCK | NFSLCK_UNLOCK)) != 0 &&


> Have fun with it, rick
> ps: I haven't seen this reported by tinderbox. Is the problem
>     specific to your setup?

It is present even in your setup :) Just "objdump -d kernel | grep ud2" on kernel compiled
by clang.

Roman