[krichy at tvnetwork.hu: Re: kern/184677 / ZFS snapshot handling deadlocks]

Fri Dec 20 18:33:56 UTC 2013

There's a good chance it's already fixed in SpectraBSD ZFS.  I fixed a
very similar bug (panic on snapshot unmount) back in July/August that
involved fixing issues related to the GFS/snapdir port.  The
underlying bug was that we don't move process refcounts to the proper
vnode.  It sounds like the reported issue is different, but caused by
the same underlying bugs.

I was planning to port the patch (SpectraBSD/stable c/l 974581) to
FreeBSD, but haven't had time yet.

--Will.

On Fri, Dec 20, 2013 at 11:15 AM, Alan Somers <asomers at freebsd.org> wrote:
> On Fri, Dec 20, 2013 at 6:41 AM, Pawel Jakub Dawidek <pjd at freebsd.org> wrote:
>> Guys,
>>
>> can someone please look into this report? It is very detailed and even
>> have a patch. I can't get to it right now, but I also remember someone
>> was going to rework the locking in vdev_geom.c. Did that happen? Maybe
>> some regression was introduced?
>
> I don't have time to thoroughly review the patch, but I'll put it
> through our test suite.
>
>>
>> --
>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>> FreeBSD committer                         http://www.FreeBSD.org
>> Am I Evil? Yes, I Am!                     http://mobter.com
>>
>>
>> ---------- Forwarded message ----------
>> From: krichy at tvnetwork.hu
>> To: freebsd-fs at freebsd.org
>> Cc: pawel at dawidek.net
>> Date: Thu, 19 Dec 2013 16:46:11 +0100 (CET)
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>> Dear devs,
>>
>> I am attaching a more clear patch/fix for my snapshot handling issues (0002), but I would be happy if some ZFS expert would comment it. I am trying to solve it at least for two weeks now, and an ACK or a NACK would be nice from someone. Also a commit is reverted since it also caused deadlocks. I've read its comments, which also eliminates deadlocks, but I did not find any reference how to produce that deadlock. In my view reverting that makes my issues disappear, but I dont know what new cases will it rise.
>>
>> I've rewritten traverse() to make more like upstream, added two extra VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode what was passed to it (I dont know even that upon creating a snapshot vnode why is that extra two holds needed for the vnode.) And unfortunately, due to FreeBSD calls vop_inactive callbacks with vnodes locked, that could also cause deadlocks, so zfsctl_snapshot_inactive() and zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>
>> After this, one may also get a deadlock, when a simple access would call into zfsctl_snapshot_lookup(). The documentation says, that those vnodes should always be covered, but after some stress test, sometimes we hit that call, and that can cause again deadlocks. In our environment I've just uncommented that callback, which returns ENODIR on some calls, but at least not a deadlock.
>>
>> The attached script can be used to reproduce my cases (would one confirm that?), and after the patches applied, they disappear (confirm?).
>>
>> Thanks,
>>
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>>
>> On Tue, 17 Dec 2013, krichy at tvnetwork.hu wrote:
>>
>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>> From: krichy at tvnetwork.hu
>>> To: pjd at freebsd.org
>>> Cc: freebsd-fs at freebsd.org
>>> Subject: Re: kern/184677 (fwd)
>>>
>>> Dear devs,
>>>
>>> I will sum up my experience regarding the issue:
>>>
>>> The sympton is that a concurrent 'zfs send -R' and some activity on the snapshot dir (or in the snapshot) may cause a deadlock.
>>>
>>> After investigating the problem, I found that zfs send umounts the snapshots, and that causes the deadlock, so later I tested only with concurrent umount and the "activity". More later I found that listing the snapshots in .zfs/snapshot/ and unounting them can cause the found deadlock, so I used them for the tests. But for my surprise, instead of a deadlock, a recursive lock panic has arised.
>>>
>>> The vnode for the ".zfs/snapshot/" directory contains zfs's zfsctl_snapdir_t structure (sdp). This contains a tree of mounted snapshots, and each entry (sep) contains the vnode of entry on which the snapshot is mounted on top (se_root). The strange is that the se_root member does not hold a reference for the vnode, just a simple pointer to it.
>>>
>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is locked, the zfsctl_snapdir_t's tree is locked, and searched for the mount if it exists already. If it founds no entry, does the mount. In the case of an entry was found, the se_root member contains the vnode which the snapshot is mounted on. Thus, a reference is taken for it, and the traverse() call will resolve to the real root vnode of the mounted snapshot, returning it as locked. (Examining the traverse() code I've found that it did not follow FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>
>>> On the other way, when an umount is issued, the se_root vnode looses its last reference (as only the mountpoint holds one for it), it goes through the vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is called with a locked vnode, so this is a deadlock race condition. While zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and traverse() tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock on se_root while tries to access the sdp lock.
>>>
>>> The zfsctl_snapshot_inactive() has an if statement checking the v_usecount, which is incremented in zfsctl_snapdir_lookup(), but in that context it is not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() path assumes that the vnode remains inactive (as opposed to illumos, at least how i read the code). So zfsctl_snapshot_inactive() must free resources while in a locked state. I was a bit confused, and probably that is why the previously posted patch is as is.
>>>
>>> Maybe if I had some clues on the directions of this problem, I could have worked more for a nicer, shorter solution.
>>>
>>> Please someone comment on my post.
>>>
>>> Regards,
>>>
>>>
>>>
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>>
>>> On Mon, 16 Dec 2013, krichy at tvnetwork.hu wrote:
>>>
>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>> From: krichy at tvnetwork.hu
>>>> To: pjd at freebsd.org
>>>> Cc: freebsd-fs at freebsd.org
>>>> Subject: Re: kern/184677 (fwd)
>>>>
>>>> Dear PJD,
>>>>
>>>> I am a happy FreeBSD user, I am sure you've read my previous posts regarding some issues in ZFS. Please give some advice for me, where to look for solutions, or how could I help to resolve those issues.
>>>>
>>>> Regards,
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> ---------- Forwarded message ----------
>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>> From: krichy at tvnetwork.hu
>>>> To: freebsd-fs at freebsd.org
>>>> Subject: Re: kern/184677
>>>>
>>>>
>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on Fri Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock situation. Any comments on this?
>>>>
>>>>
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> On Mon, 16 Dec 2013, krichy at tvnetwork.hu wrote:
>>>>
>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>> From: krichy at tvnetwork.hu
>>>>> To: freebsd-fs at freebsd.org
>>>>> Subject: kern/184677
>>>>>
>>>>> Dear devs,
>>>>>
>>>>> I've attached a patch, which makes the recursive lockmgr disappear, and makes the reported bug to disappear. I dont know if I followed any guidelines well, or not, but at least it works for me. Please some ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>
>>>>> But unfortunately, my original problem is still not solved, maybe the same as Ryan's: http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>
>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to acquire spa_namespace_lock while when finishing a zfs send -R does a zfsdev_close(), and that also holds the same mutex. And this causes a deadlock scenario. I looked at illumos's code, and for some reason they use another mutex on zfsdev_close(), which therefore may not deadlock with zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>
>>>>> I would like to help making ZFS more stable on freebsd also with its whole functionality. I would be very thankful if some expert would give some advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>>
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> _______________________________________________
>>>> freebsd-fs at freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"
>>>>
>>
> _______________________________________________
> zfs-devel at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/zfs-devel
> To unsubscribe, send any mail to "zfs-devel-unsubscribe at freebsd.org"