unionfs bugs, a partial patch and some comments [Was: Re: 1-BETA3 Panic: __lockmgr_args: downgrade a recursed lockmgr nfs @ /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/unionfs/union_vnops.c:1905]
Harry Schmalzbauer
freebsd at omnilan.de
Wed Mar 8 08:50:57 UTC 2017
Regarding Konstantin Belousov's message of 08.03.2017 00:55 (localtime):
> On Tue, Mar 07, 2017 at 10:49:01PM +0000, Rick Macklem wrote:
>> Hmm, this is going to sound dumb, but I don't recall generating any
>> unionfs patch;-)
>> I'll go look for it. Maybe it was Kostik's?
> I did not touch unionfs, and have no plans to. It is equally broken in
> all relevant versions of FreeBSD.
ACK.
While this is no good news, I have more bad news: the deadlock came back…
I'd like to summarize, in case anybody else is interested in unionfs,
perhaps at some point in the future:
I observed locking problems back in 2012 and Attilio Rao's final attempt
was this: https://people.freebsd.org/~attilio/unionfs_nodeget4.patch
I never used it, most likely because it didn't work even back with
RELENG_9. It applies to stable/11, but has no effect besides panicking
KDB kernels.
What I used up to 10.3 was the following simple patch:
--- src/sys/fs/unionfs/union_subr.c (revision 231702)
+++ src/sys/fs/unionfs/union_subr.c (working copy)
@@ -261,7 +261,9 @@ unionfs_nodeget(struct mount *mp, struct vnode *up
 		free(unp, M_UNIONFSNODE);
 		return (error);
 	}
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 	error = insmntque(vp, mp);	/* XXX: Too early for mpsafe fs */
+	VOP_UNLOCK(vp, 0);
 	if (error != 0) {
 		free(unp, M_UNIONFSNODE);
 		return (error);
This hasn't led to any panic or deadlock during the last 5 years on ~50
machines, up to 10.3.
In 2016 I did some tests with 11.0-Beta1, which is where this thread
originated, and Rick kindly looked into it and provided the following patch:
https://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160818/d1d1691d/attachment.obj
(Explanation:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-August/085294.html)
This also panics a KDB kernel (and works without KDB), but at least it
does influence the deadlock: when symlinks are involved, deadlocks are
significantly postponed.
…
>>>>
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>>> 0xfffffe00982220e0
>>>> vpanic() at vpanic+0x186/frame 0xfffffe0098222160
>>>> kassert_panic() at kassert_panic+0x126/frame 0xfffffe00982221d0
>>>> witness_assert() at witness_assert+0x35a/frame 0xfffffe0098222230
>>>> __lockmgr_args() at __lockmgr_args+0x517/frame 0xfffffe00982222d0
>>>> vop_stdunlock() at vop_stdunlock+0x3b/frame 0xfffffe00982222f0
>>>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe0098222320
>>>> unionfs_unlock() at unionfs_unlock+0x112/frame 0xfffffe0098222390
>>>> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0xe0/frame 0xfffffe00982223c0
>>>> unionfs_nodeget() at unionfs_nodeget+0x3ef/frame 0xfffffe0098222470
>>>> unionfs_domount() at unionfs_domount+0x518/frame 0xfffffe00982226b0
>>>> vfs_donmount() at vfs_donmount+0xe37/frame 0xfffffe00982228f0
>>>> sys_nmount() at sys_nmount+0x72/frame 0xfffffe0098222930
>>>> amd64_syscall() at amd64_syscall+0x2f9/frame 0xfffffe0098222ab0
>>>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0098222ab0
>>>> --- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x80086ecea, rsp =
>>>> 0x7fffffffe318, rbp = 0x7fffffffeca0 ---
>>> New discovery:
>>> Rick's latest patch causes a panic only with KDB. If I compile a kernel
>>> without witness and KDB, the machine boots fine!
>>> Also, it's at least not so easy anymore to trigger the deadlock :-) . I
>>> need to do more testing but until now Rick's approach seems very
>>> promising :-) .
>>
>> My unionfs deadlock problem isn't really solved with Rick's latest
>> patch, I still can reproduce it: krb5.conf and krb5.keytab are files on
>> unionfs referenced by /etc. libexec/negotiate_kerberos_auth reads these
>> and if I have enough helper processes handling requests, the deadlock
>> occurs.
>>
>> _But_: If I move the files outside the unionfs and create a symlink, I
>> cannot reproduce the deadlock anymore, whereas it was easily
>> reproducible without the symlink or any of the other workarounds.
The picture has changed: the machine deadlocked overnight. So the symlink
does have a significant influence, but unfortunately it isn't the real solution.
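For anyone who wants to try reproducing this without a Kerberos setup, the access pattern described above (many helper processes reading the same small files on a unionfs mount) could be approximated by something like the following. This is only a rough sketch under assumptions: the names hammer_file() and read_loop() are made up for illustration, and I have not verified that plain concurrent open/read/close is sufficient to trigger the deadlock the way negotiate_kerberos_auth does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Worker body (hypothetical helper): open, read to EOF and close `path`
 * `iterations` times.  Returns 0 on success, 1 on the first error.
 */
static int
read_loop(const char *path, int iterations)
{
	char buf[4096];
	ssize_t n;
	int fd, i;

	for (i = 0; i < iterations; i++) {
		fd = open(path, O_RDONLY);
		if (fd < 0)
			return (1);
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;
		close(fd);
		if (n < 0)
			return (1);
	}
	return (0);
}

/*
 * Fork `nworkers` processes that all run read_loop() on the same file.
 * Returns the number of workers that could not be started or did not
 * exit cleanly.  If the filesystem deadlocks, this call never returns.
 */
int
hammer_file(const char *path, int nworkers, int iterations)
{
	pid_t pid;
	int failed = 0, started = 0, status, i;

	assert(path != NULL && nworkers > 0 && iterations > 0);

	for (i = 0; i < nworkers; i++) {
		pid = fork();
		if (pid < 0) {
			failed++;		/* could not start this worker */
			continue;
		}
		if (pid == 0)
			_exit(read_loop(path, iterations));
		started++;
	}
	/* Reap every worker that was started and count the failures. */
	for (i = 0; i < started; i++) {
		if (wait(&status) < 0 || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			failed++;
	}
	return (failed);
}
```

Calling hammer_file() from a small main() with a path on the unionfs mount and a few dozen workers would be the intended use; a call that never returns, with the workers stuck in an uninterruptible wait, would be the symptom to look for.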
Thanks for any help,
-harry