kern/184677 / ZFS snapshot handling deadlocks
krichy at tvnetwork.hu
krichy at tvnetwork.hu
Wed Dec 25 05:23:01 UTC 2013
Dear all,
I've made a new patch again, which fixes most of my earlier issues.
Mainly, I've enabled shared vnode locks for GFS vnodes - is it acceptable?
And that way, deadlock cases reduced much, and also the patch is more
clear (at least to me). One thing still remains, the spa_namespace_lock
race I mentioned before, I dont know how to handle that.
Waiting for comments on this.
Regards,
Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.
On Tue, 24 Dec 2013, krichy at tvnetwork.hu wrote:
> Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET)
> From: krichy at tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd at FreeBSD.org>
> Cc: freebsd-fs at freebsd.org, avg at freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>
> Dear all,
>
> Now a user caused deadlock simply caused buildkernel to be blocked, so I must
> correct myself in that this not only does a DoS for .zfs/snapshot
> directories, but may for the whole system.
>
> Regards,
>
> P.S.: Merry Christmas!
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Tue, 24 Dec 2013, krichy at tvnetwork.hu wrote:
>
>> Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET)
>> From: krichy at tvnetwork.hu
>> To: Pawel Jakub Dawidek <pjd at FreeBSD.org>
>> Cc: freebsd-fs at freebsd.org, avg at freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>
>> Dear all,
>>
>> Unfortunately, this seems to be serious, as a plain user can deadlock zfs
>> using the attached script, and can make a DoS against the mounted dataset's
>> .zfs/snapshot directory.
>>
>> Regards,
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>>
>> On Tue, 24 Dec 2013, krichy at tvnetwork.hu wrote:
>>
>>> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
>>> From: krichy at tvnetwork.hu
>>> To: Pawel Jakub Dawidek <pjd at FreeBSD.org>
>>> Cc: freebsd-fs at freebsd.org, avg at freebsd.org
>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>
>>> Dear devs,
>>>
>>> A third one, again the snapshot dirs:
>>>
>>> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
>>> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null
>>>
>>> Will deadlock in a few seconds.
>>>
>>> The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries
>>> to lock .zfs, while the other just tries to do this in reversed order.
>>>
>>> In a vop_lookup, why is the passed in vnode, and the returned vnode must
>>> be locked? Why is not enough to have it held?
>>>
>>> Thanks in advance,
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>>
>>> On Mon, 23 Dec 2013, krichy at tvnetwork.hu wrote:
>>>
>>>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
>>>> From: krichy at tvnetwork.hu
>>>> To: Pawel Jakub Dawidek <pjd at FreeBSD.org>
>>>> Cc: freebsd-fs at freebsd.org, avg at freebsd.org
>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>>
>>>> Dear Pawel,
>>>>
>>>> Thanks for your response. I hope someone will review my work.
>>>>
>>>> Secondly, I think I;ve found another deadlock scenario:
>>>>
>>>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock
>>>> held. Then it does an unmount of all snapshots, which in turn goes to
>>>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.
>>>>
>>>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock
>>>> does a snapshot mount, which somewhere tries to acquire
>>>> spa_namespace_lock. So it deadlocks. Currently I dont know how to
>>>> workaround this.
>>>>
>>>> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>>>>
>>>> What if we release it function enter, and re-acquire upon exit?
>>>>
>>>> Thanks in advance,
>>>>
>>>>
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>>>>
>>>>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>>>>> From: Pawel Jakub Dawidek <pjd at FreeBSD.org>
>>>>> To: krichy at tvnetwork.hu
>>>>> Cc: freebsd-fs at freebsd.org
>>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>>>
>>>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy at tvnetwork.hu wrote:
>>>>>> Dear devs,
>>>>>>
>>>>>> I am attaching a more clear patch/fix for my snapshot handling issues
>>>>>> (0002), but I would be happy if some ZFS expert would comment it. I am
>>>>>> trying to solve it at least for two weeks now, and an ACK or a NACK
>>>>>> would
>>>>>> be nice from someone. Also a commit is reverted since it also caused
>>>>>> deadlocks. I've read its comments, which also eliminates deadlocks, but
>>>>>> I
>>>>>> did not find any reference how to produce that deadlock. In my view
>>>>>> reverting that makes my issues disappear, but I dont know what new
>>>>>> cases
>>>>>> will it rise.
>>>>>
>>>>> Richard,
>>>>>
>>>>> I won't be able to analyse it myself anytime soon, because of other
>>>>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
>>>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>>>>> there will be able to help you.
>>>>>
>>>>>> I've rewritten traverse() to make more like upstream, added two extra
>>>>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode
>>>>>> what
>>>>>> was passed to it (I dont know even that upon creating a snapshot vnode
>>>>>> why
>>>>>> is that extra two holds needed for the vnode.) And unfortunately, due
>>>>>> to
>>>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could
>>>>>> also
>>>>>> cause deadlocks, so zfsctl_snapshot_inactive() and
>>>>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>>>>>
>>>>>> After this, one may also get a deadlock, when a simple access would
>>>>>> call
>>>>>> into zfsctl_snapshot_lookup(). The documentation says, that those
>>>>>> vnodes
>>>>>> should always be covered, but after some stress test, sometimes we hit
>>>>>> that call, and that can cause again deadlocks. In our environment I've
>>>>>> just uncommented that callback, which returns ENODIR on some calls, but
>>>>>> at
>>>>>> least not a deadlock.
>>>>>>
>>>>>> The attached script can be used to reproduce my cases (would one
>>>>>> confirm
>>>>>> that?), and after the patches applied, they disappear (confirm?).
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>
>>>>>> On Tue, 17 Dec 2013, krichy at tvnetwork.hu wrote:
>>>>>>
>>>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>>>> From: krichy at tvnetwork.hu
>>>>>>> To: pjd at freebsd.org
>>>>>>> Cc: freebsd-fs at freebsd.org
>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>>
>>>>>>> Dear devs,
>>>>>>>
>>>>>>> I will sum up my experience regarding the issue:
>>>>>>>
>>>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on
>>>>>>> the
>>>>>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>>>>>
>>>>>>> After investigating the problem, I found that zfs send umounts the
>>>>>>> snapshots,
>>>>>>> and that causes the deadlock, so later I tested only with concurrent
>>>>>>> umount
>>>>>>> and the "activity". More later I found that listing the snapshots in
>>>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I
>>>>>>> used
>>>>>>> them for the tests. But for my surprise, instead of a deadlock, a
>>>>>>> recursive
>>>>>>> lock panic has arised.
>>>>>>>
>>>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's
>>>>>>> zfsctl_snapdir_t
>>>>>>> structure (sdp). This contains a tree of mounted snapshots, and each
>>>>>>> entry
>>>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on
>>>>>>> top
>>>>>>> (se_root). The strange is that the se_root member does not hold a
>>>>>>> reference
>>>>>>> for the vnode, just a simple pointer to it.
>>>>>>>
>>>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is
>>>>>>> locked,
>>>>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if
>>>>>>> it
>>>>>>> exists already. If it founds no entry, does the mount. In the case of
>>>>>>> an
>>>>>>> entry was found, the se_root member contains the vnode which the
>>>>>>> snapshot is
>>>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call
>>>>>>> will
>>>>>>> resolve to the real root vnode of the mounted snapshot, returning it
>>>>>>> as
>>>>>>> locked. (Examining the traverse() code I've found that it did not
>>>>>>> follow
>>>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>>>>
>>>>>>> On the other way, when an umount is issued, the se_root vnode looses
>>>>>>> its last
>>>>>>> reference (as only the mountpoint holds one for it), it goes through
>>>>>>> the
>>>>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is
>>>>>>> called
>>>>>>> with a locked vnode, so this is a deadlock race condition. While
>>>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and
>>>>>>> traverse()
>>>>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the
>>>>>>> lock on
>>>>>>> se_root while tries to access the sdp lock.
>>>>>>>
>>>>>>> The zfsctl_snapshot_inactive() has an if statement checking the
>>>>>>> v_usecount,
>>>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context
>>>>>>> it is
>>>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive()
>>>>>>> path
>>>>>>> assumes that the vnode remains inactive (as opposed to illumos, at
>>>>>>> least how
>>>>>>> i read the code). So zfsctl_snapshot_inactive() must free resources
>>>>>>> while in
>>>>>>> a locked state. I was a bit confused, and probably that is why the
>>>>>>> previously
>>>>>>> posted patch is as is.
>>>>>>>
>>>>>>> Maybe if I had some clues on the directions of this problem, I could
>>>>>>> have
>>>>>>> worked more for a nicer, shorter solution.
>>>>>>>
>>>>>>> Please someone comment on my post.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>
>>>>>>> On Mon, 16 Dec 2013, krichy at tvnetwork.hu wrote:
>>>>>>>
>>>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>>>> From: krichy at tvnetwork.hu
>>>>>>>> To: pjd at freebsd.org
>>>>>>>> Cc: freebsd-fs at freebsd.org
>>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>>>
>>>>>>>> Dear PJD,
>>>>>>>>
>>>>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>>>>>> regarding some issues in ZFS. Please give some advice for me, where
>>>>>>>> to look
>>>>>>>> for solutions, or how could I help to resolve those issues.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>>
>>>>>>>> ---------- Forwarded message ----------
>>>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>>>> From: krichy at tvnetwork.hu
>>>>>>>> To: freebsd-fs at freebsd.org
>>>>>>>> Subject: Re: kern/184677
>>>>>>>>
>>>>>>>>
>>>>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on
>>>>>>>> Fri
>>>>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>>>>>> situation. Any comments on this?
>>>>>>>>
>>>>>>>>
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>>
>>>>>>>> On Mon, 16 Dec 2013, krichy at tvnetwork.hu wrote:
>>>>>>>>
>>>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>>>> From: krichy at tvnetwork.hu
>>>>>>>>> To: freebsd-fs at freebsd.org
>>>>>>>>> Subject: kern/184677
>>>>>>>>>
>>>>>>>>> Dear devs,
>>>>>>>>>
>>>>>>>>> I've attached a patch, which makes the recursive lockmgr disappear,
>>>>>>>>> and
>>>>>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>>>>>
>>>>>>>>> But unfortunately, my original problem is still not solved, maybe
>>>>>>>>> the same
>>>>>>>>> as Ryan's:
>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>>>>
>>>>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to
>>>>>>>>> acquire
>>>>>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes
>>>>>>>>> a
>>>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason
>>>>>>>>> they
>>>>>>>>> use another mutex on zfsdev_close(), which therefore may not
>>>>>>>>> deadlock with
>>>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>>>>
>>>>>>>>> I would like to help making ZFS more stable on freebsd also with its
>>>>>>>>> whole
>>>>>>>>> functionality. I would be very thankful if some expert would give
>>>>>>>>> some
>>>>>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Kojedzinszky Richard
>>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>> _______________________________________________
>>>>>>>> freebsd-fs at freebsd.org mailing list
>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>>>> From: Richard Kojedzinszky <krichy at cflinux.hu>
>>>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely
>>>>>> and
>>>>>> replace it with the"
>>>>>>
>>>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>>>>
>>>>>> Conflicts:
>>>>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> ---
>>>>>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 +
>>>>>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++-
>>>>>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +--
>>>>>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119
>>>>>> +++++++++------------
>>>>>> 4 files changed, 70 insertions(+), 80 deletions(-)
>>>>>>
>>>>>> diff --git
>>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>>> index af2def2..8272c4d 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>>>>> extern minor_t zfsdev_minor_alloc(void);
>>>>>>
>>>>>> extern void *zfsdev_state;
>>>>>> +extern kmutex_t zfsdev_state_lock;
>>>>>>
>>>>>> #endif /* _KERNEL */
>>>>>>
>>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>>> index 15685a5..5c3e9f3 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize,
>>>>>> uint64_t *max_psize,
>>>>>> struct g_provider *pp;
>>>>>> struct g_consumer *cp;
>>>>>> size_t bufsize;
>>>>>> - int error;
>>>>>> + int error, lock;
>>>>>>
>>>>>> /*
>>>>>> * We must have a pathname, and it must be absolute.
>>>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize,
>>>>>> uint64_t *max_psize,
>>>>>>
>>>>>> vd->vdev_tsd = NULL;
>>>>>>
>>>>>> + if (mutex_owned(&spa_namespace_lock)) {
>>>>>> + mutex_exit(&spa_namespace_lock);
>>>>>> + lock = 1;
>>>>>> + } else {
>>>>>> + lock = 0;
>>>>>> + }
>>>>>> DROP_GIANT();
>>>>>> g_topology_lock();
>>>>>> error = 0;
>>>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize,
>>>>>> uint64_t *max_psize,
>>>>>> !ISP2(cp->provider->sectorsize)) {
>>>>>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>>>>> vd->vdev_path);
>>>>>> +
>>>>>> + g_topology_lock();
>>>>>> vdev_geom_detach(cp, 0);
>>>>>> + g_topology_unlock();
>>>>>> +
>>>>>> error = EINVAL;
>>>>>> cp = NULL;
>>>>>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>>>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize,
>>>>>> uint64_t *max_psize,
>>>>>> }
>>>>>> g_topology_unlock();
>>>>>> PICKUP_GIANT();
>>>>>> + if (lock)
>>>>>> + mutex_enter(&spa_namespace_lock);
>>>>>> if (cp == NULL) {
>>>>>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>>>>> return (error);
>>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>>> index e9fba26..91becde 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>>>>> static minor_t last_minor;
>>>>>> minor_t m;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>> for (m = last_minor + 1; m != last_minor; m++) {
>>>>>> if (m > ZFSDEV_MAX_MINOR)
>>>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>> minor_t minor;
>>>>>> zfs_soft_state_t *zs;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>> minor = zfsdev_minor_alloc();
>>>>>> if (minor == 0)
>>>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>> static void
>>>>>> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>>>>> {
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>> zfs_onexit_destroy(zo);
>>>>>> ddi_soft_state_free(zfsdev_state, minor);
>>>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int
>>>>>> mode, struct thread *td)
>>>>>>
>>>>>> /* This is the control device. Allocate a new minor if requested. */
>>>>>> if (flag & FEXCL) {
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> error = zfs_ctldev_init(devp);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> }
>>>>>>
>>>>>> return (error);
>>>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>>>>> if (minor == 0)
>>>>>> return;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>>>>> if (zo == NULL) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return;
>>>>>> }
>>>>>> zfs_ctldev_destroy(zo, minor);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> }
>>>>>>
>>>>>> static int
>>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> index 72d4502..aec5219 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>>>> #define ZVOL_DUMPSIZE "dumpsize"
>>>>>>
>>>>>> /*
>>>>>> - * The spa_namespace_lock protects the zfsdev_state structure from
>>>>>> being
>>>>>> - * modified while it's being used, e.g. an open that comes in before a
>>>>>> - * create finishes. It also protects temporary opens of the dataset
>>>>>> so that,
>>>>>> + * This lock protects the zfsdev_state structure from being modified
>>>>>> + * while it's being used, e.g. an open that comes in before a create
>>>>>> + * finishes. It also protects temporary opens of the dataset so that,
>>>>>> * e.g., an open doesn't get a spurious EBUSY.
>>>>>> */
>>>>>> +kmutex_t zfsdev_state_lock;
>>>>>> static uint32_t zvol_minors;
>>>>>>
>>>>>> typedef struct zvol_extent {
>>>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>>>> struct g_geom *gp;
>>>>>> zvol_state_t *zv = NULL;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>> g_topology_lock();
>>>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>>>> {
>>>>>> zvol_state_t *zv;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> zv = zvol_minor_lookup(name);
>>>>>> if (minor && zv)
>>>>>> *minor = zv->zv_minor;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (zv ? 0 : -1);
>>>>>> }
>>>>>> #endif /* sun */
>>>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>>>
>>>>>> ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> if (zvol_minor_lookup(name) != NULL) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EEXIST));
>>>>>> }
>>>>>>
>>>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>>>
>>>>>> if (error) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> #ifdef sun
>>>>>> if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>>
>>>>>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EAGAIN));
>>>>>> }
>>>>>> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>>> ddi_soft_state_free(zfsdev_state, minor);
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EAGAIN));
>>>>>> }
>>>>>>
>>>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>>> ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>>> ddi_soft_state_free(zfsdev_state, minor);
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EAGAIN));
>>>>>> }
>>>>>>
>>>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>>>
>>>>>> zvol_minors++;
>>>>>>
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>> zvol_geom_run(zv);
>>>>>>
>>>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>>> minor_t minor = zv->zv_minor;
>>>>>> #endif
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>> if (zv->zv_total_opens != 0)
>>>>>> return (SET_ERROR(EBUSY));
>>>>>>
>>>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>>> zvol_state_t *zv;
>>>>>> int rc;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>> g_topology_lock();
>>>>>> rc = zvol_remove_zv(zv);
>>>>>> g_topology_unlock();
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (rc);
>>>>>> }
>>>>>>
>>>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>>> dmu_tx_t *tx;
>>>>>> int error;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>> tx = dmu_tx_create(os);
>>>>>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>>> namelen = strlen(name);
>>>>>>
>>>>>> DROP_GIANT();
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> g_topology_lock();
>>>>>>
>>>>>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>>> }
>>>>>>
>>>>>> g_topology_unlock();
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> PICKUP_GIANT();
>>>>>> }
>>>>>>
>>>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj,
>>>>>> uint64_t volsize)
>>>>>> uint64_t old_volsize = 0ULL;
>>>>>> uint64_t readonly;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> zv = zvol_minor_lookup(name);
>>>>>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj,
>>>>>> uint64_t volsize)
>>>>>> out:
>>>>>> dmu_objset_rele(os, FTAG);
>>>>>>
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>> return (error);
>>>>>> }
>>>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> {
>>>>>> zvol_state_t *zv;
>>>>>> int err = 0;
>>>>>> - boolean_t locked = B_FALSE;
>>>>>>
>>>>>> - /*
>>>>>> - * Protect against recursively entering spa_namespace_lock
>>>>>> - * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>>>> - * This is needed since we replaced upstream zfsdev_state_lock
>>>>>> - * with spa_namespace_lock in the ZVOL code.
>>>>>> - * We are using the same trick as spa_open().
>>>>>> - * Note that calls in zvol_first_open which need to resolve
>>>>>> - * pool name to a spa object will enter spa_open()
>>>>>> - * recursively, but that function already has all the
>>>>>> - * necessary protection.
>>>>>> - */
>>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> - locked = B_TRUE;
>>>>>> - }
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> zv = pp->private;
>>>>>> if (zv == NULL) {
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>>
>>>>>> if (zv->zv_total_opens == 0)
>>>>>> err = zvol_first_open(zv);
>>>>>> if (err) {
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (err);
>>>>>> }
>>>>>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> #endif
>>>>>>
>>>>>> zv->zv_total_opens += count;
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>> return (err);
>>>>>> out:
>>>>>> if (zv->zv_total_opens == 0)
>>>>>> zvol_last_close(zv);
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (err);
>>>>>> }
>>>>>>
>>>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> {
>>>>>> zvol_state_t *zv;
>>>>>> int error = 0;
>>>>>> - boolean_t locked = B_FALSE;
>>>>>>
>>>>>> - /* See comment in zvol_open(). */
>>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> - locked = B_TRUE;
>>>>>> - }
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> zv = pp->private;
>>>>>> if (zv == NULL) {
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>>
>>>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> if (zv->zv_total_opens == 0)
>>>>>> zvol_last_close(zv);
>>>>>>
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg,
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>> int error = 0;
>>>>>> rl_t *rl;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>>>
>>>>>> if (zv == NULL) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>> ASSERT(zv->zv_total_opens > 0);
>>>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>> dki.dki_ctype = DKC_UNKNOWN;
>>>>>> dki.dki_unit = getminor(dev);
>>>>>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT -
>>>>>> zv->zv_min_bs);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>>> error = SET_ERROR(EFAULT);
>>>>>> return (error);
>>>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>> dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>>> dkm.dki_media_type = DK_UNKNOWN;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>>> error = SET_ERROR(EFAULT);
>>>>>> return (error);
>>>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg,
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>> uint64_t vs = zv->zv_volsize;
>>>>>> uint8_t bs = zv->zv_min_bs;
>>>>>>
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> case DKIOCFLUSHWRITECACHE:
>>>>>> dkc = (struct dk_callback *)arg;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>>> (*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg,
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>> }
>>>>>> if (wce) {
>>>>>> zv->zv_flags |= ZVOL_WCE;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> } else {
>>>>>> zv->zv_flags &= ~ZVOL_WCE;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>> }
>>>>>> return (0);
>>>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>> break;
>>>>>>
>>>>>> }
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>> #endif /* sun */
>>>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>>> {
>>>>>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>>> 1) == 0);
>>>>>> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>>> ZFS_LOG(1, "ZVOL Initialized.");
>>>>>> }
>>>>>>
>>>>>> void
>>>>>> zvol_fini(void)
>>>>>> {
>>>>>> + mutex_destroy(&zfsdev_state_lock);
>>>>>> ddi_soft_state_fini(&zfsdev_state);
>>>>>> ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>>> }
>>>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t
>>>>>> resize)
>>>>>> uint64_t version = spa_version(spa);
>>>>>> enum zio_checksum checksum;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>> ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>>>
>>>>>> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char
>>>>>> *newname)
>>>>>> struct g_provider *pp;
>>>>>> zvol_state_t *zv;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>> g_topology_assert();
>>>>>>
>>>>>> pp = LIST_FIRST(&gp->provider);
>>>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const
>>>>>> char *newname)
>>>>>> newnamelen = strlen(newname);
>>>>>>
>>>>>> DROP_GIANT();
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> g_topology_lock();
>>>>>>
>>>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const
>>>>>> char *newname)
>>>>>> }
>>>>>>
>>>>>> g_topology_unlock();
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> PICKUP_GIANT();
>>>>>> }
>>>>>> --
>>>>>> 1.8.4.2
>>>>>>
>>>>>
>>>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>>>> From: Richard Kojedzinszky <krichy at cflinux.hu>
>>>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>>>>
>>>>>> ---
>>>>>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++---
>>>>>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53
>>>>>> +++++++++++++++-------
>>>>>> 2 files changed, 42 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> index 94383d6..4cac053 100644
>>>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>> * progress on this vnode.
>>>>>> */
>>>>>>
>>>>>> + vn_lock(cvp, lktype);
>>>>>> +
>>>>>> for (;;) {
>>>>>> /*
>>>>>> * Reached the end of the mount chain?
>>>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>> if (vfsp == NULL)
>>>>>> break;
>>>>>> error = vfs_busy(vfsp, 0);
>>>>>> - /*
>>>>>> - * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>>>> - */
>>>>>> - if (tvp != NULL)
>>>>>> - vput(cvp);
>>>>>> - else
>>>>>> - vrele(cvp);
>>>>>> + VOP_UNLOCK(cvp, 0);
>>>>>> if (error)
>>>>>> return (error);
>>>>>>
>>>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>> vfs_unbusy(vfsp);
>>>>>> if (error != 0)
>>>>>> return (error);
>>>>>> +
>>>>>> + VN_RELE(cvp);
>>>>>> +
>>>>>> cvp = tvp;
>>>>>> }
>>>>>>
>>>>>> diff --git
>>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> index 28ab1fa..d3464b4 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>>> return (0);
>>>>>> }
>>>>>>
>>>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>>>> + * On return the passed in vp is unlocked */
>>>>>> +static zfsctl_snapdir_t *
>>>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>>>> +{
>>>>>> + gfs_dir_t *dp = vp->v_data;
>>>>>> + *dvpp = dp->gfsd_file.gfs_parent;
>>>>>> + zfsctl_snapdir_t *sdp;
>>>>>> +
>>>>>> + VN_HOLD(*dvpp);
>>>>>> + VOP_UNLOCK(vp, 0);
>>>>>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>>>> + sdp = (*dvpp)->v_data;
>>>>>> + VOP_UNLOCK(*dvpp, 0);
>>>>>> +
>>>>>> + return (sdp);
>>>>>> +}
>>>>>> +
>>>>>> #ifdef sun
>>>>>> vnodeops_t *zfsctl_ops_root;
>>>>>> vnodeops_t *zfsctl_ops_snapdir;
>>>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>>> * The snapshot was unmounted behind our backs,
>>>>>> * try to remount it.
>>>>>> */
>>>>>> + VOP_UNLOCK(*vpp, 0);
>>>>>> + VN_HOLD(*vpp);
>>>>>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN,
>>>>>> snapname) == 0);
>>>>>> goto domount;
>>>>>> } else {
>>>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>>> (void) strcpy(sep->se_name, nm);
>>>>>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp,
>>>>>> dmu_objset_id(snap));
>>>>>> - VN_HOLD(*vpp);
>>>>>> avl_insert(&sdp->sd_snaps, sep, where);
>>>>>>
>>>>>> dmu_objset_rele(snap, FTAG);
>>>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>>> (void) snprintf(mountpoint, mountpoint_len,
>>>>>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>>> dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>>>> + VN_HOLD(*vpp);
>>>>>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>>> kmem_free(mountpoint, mountpoint_len);
>>>>>> if (err == 0) {
>>>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>>> int locked;
>>>>>> vnode_t *dvp;
>>>>>>
>>>>>> - if (vp->v_count > 0)
>>>>>> - goto end;
>>>>>> -
>>>>>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>>>> - sdp = dvp->v_data;
>>>>>> - VOP_UNLOCK(dvp, 0);
>>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>>
>>>>>> if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>>> mutex_enter(&sdp->sd_lock);
>>>>>>
>>>>>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>>>> +
>>>>>> + if (vp->v_count > 0) {
>>>>>> + mutex_exit(&sdp->sd_lock);
>>>>>> + return (0);
>>>>>> + }
>>>>>> +
>>>>>> ASSERT(!vn_ismntpt(vp));
>>>>>>
>>>>>> sep = avl_first(&sdp->sd_snaps);
>>>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>>> mutex_exit(&sdp->sd_lock);
>>>>>> VN_RELE(dvp);
>>>>>>
>>>>>> -end:
>>>>>> /*
>>>>>> * Dispose of the vnode for the snapshot mount point.
>>>>>> * This is safe to do because once this entry has been removed
>>>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>>> static int
>>>>>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>>> {
>>>>>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>>>> - vnode_t *dvp, *vp;
>>>>>> + vnode_t *dvp, *vp = ap->a_vp;
>>>>>> zfsctl_snapdir_t *sdp;
>>>>>> zfs_snapentry_t *sep;
>>>>>> - int error;
>>>>>> + int error = 0;
>>>>>>
>>>>>> - ASSERT(zfsvfs->z_ctldir != NULL);
>>>>>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>>>> - NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>>>> - if (error != 0)
>>>>>> - return (error);
>>>>>> - sdp = dvp->v_data;
>>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>>
>>>>>> mutex_enter(&sdp->sd_lock);
>>>>>> +
>>>>>> + vn_lock(vp, LK_SHARED | LK_RETRY);
>>>>>> +
>>>>>> sep = avl_first(&sdp->sd_snaps);
>>>>>> while (sep != NULL) {
>>>>>> vp = sep->se_root;
>>>>>> --
>>>>>> 1.8.4.2
>>>>>>
>>>>>
>>>>> --
>>>>> Pawel Jakub Dawidek http://www.wheelsystems.com
>>>>> FreeBSD committer http://www.FreeBSD.org
>>>>> Am I Evil? Yes, I Am! http://mobter.com
>>>>>
>>>> _______________________________________________
>>>> freebsd-fs at freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>>>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: zfs-gfs.patch
Type: text/x-diff
Size: 5583 bytes
Desc:
URL: <http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20131225/b26a3f6a/attachment-0001.patch>
More information about the freebsd-fs
mailing list