panic: LK_RETRY set with incompatible flags (0x200400) or an error occured (11)

Tue Feb 18 13:14:34 UTC 2014

On Sat, Feb 15, 2014 at 10:02:59PM +0200, Konstantin Belousov wrote:
> On Sat, Feb 15, 2014 at 02:12:40PM +0200, Andriy Gapon wrote:
> > on 14/02/2014 21:18 Jeremie Le Hen said the following:
> > > I've just got another occurrence of the exact same panic.  Any clue how
> > > to debug this?
> > 
> > Could you please obtain *vp from frame 12 ?
> > 
> > The problem seems to be happening in this piece of ZFS code:
> >                 if (cnp->cn_flags & ISDOTDOT) {
> >                         ltype = VOP_ISLOCKED(dvp);
> >                         VOP_UNLOCK(dvp, 0);
> >                 }
> >                 ZFS_EXIT(zfsvfs);
> >                 error = vn_lock(*vpp, cnp->cn_lkflags);
> >                 if (cnp->cn_flags & ISDOTDOT)
> >                         vn_lock(dvp, ltype | LK_RETRY);
> > 
> > ltype is apparently LK_SHARED and the assertion is apparently triggered by
> > EDEADLK error.  The error can occur only if a thread tries to obtain a lock in a
> > shared mode when it already has the lock exclusively.
> > My only explanation of how this could happen is that dvp == *vpp and cn_lkflags
> > is LK_EXCLUSIVE.  In other words, this is a dot-dot lookup that results in the
> > same vnode.  I think that this is only possible if dvp is the root vnode.
> > I am not sure if my theory is correct though.
> > Also, I am not sure if zfs_lookup() should be prepared to handle such a lookup
> > or if this kind of lookup should be handled by upper/other layers.  In this case
> > these would be VFS lookup code and nullfs code.
> > 
> 
> So, is VV_ROOT flag set on the corresponding ZFS vnode ?
> 
> Just in case, you could try the following change, but I doubt that it
> would have any effect.  Nullfs root vnode is cached so its VV_ROOT flag
> should not be lost.  Also, I have never seen a similar issue with UFS.
> 
> diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
> index fa6c4af..3f74579 100644
> --- a/sys/fs/nullfs/null_subr.c
> +++ b/sys/fs/nullfs/null_subr.c
> @@ -251,6 +251,7 @@ null_nodeget(mp, lowervp, vpp)
>  	vp->v_type = lowervp->v_type;
>  	vp->v_data = xp;
>  	vp->v_vnlock = lowervp->v_vnlock;
> +	vp->v_vflag = lowervp->v_vflag & VV_ROOT;
>  	error = insmntque1(vp, mp, null_insmntque_dtr, xp);
>  	if (error != 0)
>  		return (error);

I've applied it and am recompiling my kernel right now.  I cannot
reproduce the problem reliably: it sometimes happens when I'm manipulating
files from the command line on my nullfs-mounted ZFS dataset, but when I
try again right after the reboot, it works.

Well, now that I'm writing this, this could well be the problem you
describe: right after boot, I guess the root vnode is cached and still there.
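
To make Andriy's theory concrete for myself, here is a minimal user-space
sketch of the locking sequence quoted above.  It is only a toy model, not
kernel code: toy_vnode, toy_lock() and toy_dotdot_relock() are names I made
up, a single thread is assumed, and the only lockmgr rule modelled is the
one Andriy mentions (a shared request on a lock the thread already holds
exclusively fails with EDEADLK).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the lock states; not the real LK_* flags. */
enum toy_ltype { TOY_UNLOCKED, TOY_SHARED, TOY_EXCLUSIVE };

/* Hypothetical vnode: records how the single modelled thread holds it. */
struct toy_vnode {
	enum toy_ltype held;
};

/*
 * Toy vn_lock(): grants the lock if it is free, returns EDEADLK for a
 * shared request on a lock already held exclusively, and EBUSY for any
 * other recursive request (those cases are not modelled here).
 */
int
toy_lock(struct toy_vnode *vp, enum toy_ltype want)
{
	assert(want != TOY_UNLOCKED);
	if (vp->held == TOY_UNLOCKED) {
		vp->held = want;
		return (0);
	}
	if (want == TOY_SHARED && vp->held == TOY_EXCLUSIVE)
		return (EDEADLK);
	return (EBUSY);
}

/*
 * Mirrors the ISDOTDOT path of the quoted ZFS snippet: remember how dvp
 * is locked, unlock it, lock the lookup result, then relock dvp.
 * The kernel relocks with LK_RETRY and panics on an error; this sketch
 * just returns the error from the relock instead.
 */
int
toy_dotdot_relock(struct toy_vnode *dvp, struct toy_vnode *vp,
    enum toy_ltype lkflags)
{
	enum toy_ltype ltype;
	int error;

	ltype = dvp->held;		/* VOP_ISLOCKED(dvp) */
	dvp->held = TOY_UNLOCKED;	/* VOP_UNLOCK(dvp, 0) */
	error = toy_lock(vp, lkflags);	/* vn_lock(*vpp, cnp->cn_lkflags) */
	if (error != 0)
		return (error);
	return (toy_lock(dvp, ltype));	/* vn_lock(dvp, ltype | LK_RETRY) */
}
```

Calling toy_dotdot_relock() with dvp and vp pointing at the same vnode,
dvp held shared and lkflags exclusive, returns EDEADLK in this model; with
two distinct vnodes it returns 0.  That matches the dvp == *vpp explanation,
but it says nothing about whether the real kernel ever gets into that state.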

-- 
Jeremie Le Hen

Scientists say the world is made up of Protons, Neutrons and Electrons.
They forgot to mention Morons.