Hang in VOP_LOCK1_APV on 8-STABLE with NFS.

Ronald Klop ronald-freebsd8 at klop.yi.org
Mon Jan 10 17:48:35 UTC 2011


On Fri, 07 Jan 2011 20:52:57 +0100, Kostik Belousov <kostikbel at gmail.com>  
wrote:

> On Fri, Jan 07, 2011 at 02:37:25PM -0500, Rick Macklem wrote:
>> > Hi,
>> >
>> > OpenOffice hangs on NFS when I try to save a file, or even when I
>> > try to open the save dialog.
>> >
>> >
>> > $ 17:25:35 ronald at ronald [~]
>> > procstat -kk 85575
>> > PID TID COMM TDNAME KSTACK
>> > 85575 100322 soffice.bin initial thread mi_switch+0x176
>> > sleepq_wait+0x3b __lockmgr_args+0x655 vop_stdlock+0x39
>> > VOP_LOCK1_APV+0x46
>> > _vn_lock+0x44 vget+0x67 vfs_hash_get+0xeb nfs_nget+0xa8
>> > nfs_lookup+0x65e
>> > VOP_LOOKUP_APV+0x40 lookup+0x48a namei+0x518 kern_statat_vnhook+0x82
>> > kern_statat+0x15 lstat+0x22 syscallenter+0x186 syscall+0x40
>> > 85575 100502 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
>> > do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
>> > syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100576 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12 _sleep+0x1a0
>> > do_cv_wait+0x639 __umtx_op_cv_wait+0x51 syscallenter+0x186
>> > syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100577 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _sleep+0x25d
>> > kern_accept+0x19c accept+0xfe syscallenter+0x186 syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100578 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_wait_sig+0xc _cv_wait_sig+0x10e
>> > seltdwait+0xed poll+0x457 syscallenter+0x186 syscall+0x40
>> > Xfast_syscall+0xe2
>> > 85575 100579 soffice.bin - mi_switch+0x176
>> > sleepq_catch_signals+0x309 sleepq_timedwait_sig+0x12
>> > _cv_timedwait_sig+0x11d seltdwait+0x79 poll+0x457 syscallenter+0x186
>> > syscall+0x40 Xfast_syscall+0xe2
>> >
>> > $ 17:25:35 ronald at ronald [~]
>> > uname -a
>> > FreeBSD ronald.office.base.nl 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE
>> > #6:
>> > Mon Dec 27 23:49:30 CET 2010
>> > root at ronald.office.base.nl:/usr/obj/usr/src/sys/GENERIC amd64
>> >
>> I think all the above tells us is that the thread is waiting for
>> a vnode lock. The question then becomes "what is holding a lock
>> on that vnode and why?".
>>
>> > It is not possible to exit or kill soffice.bin. I had a slightly
>> > different procstat stack before, but that was fixed a couple of days ago.
>>
>> Yea, it will be in an uninterruptible sleep when waiting for a vnode
>> lock.
>>
>> > Any thoughts? Enabling local locks in NFS doesn't fix it.
>>
>> Here are some things you could try:
>> 1 - apply the attached patch. It fixes a known problem w.r.t. the
>>     client side of the krpc. Not likely to fix this, but I can hope:-)
> 1a - Look around for other processes in the uninterruptible sleep state;
> quite possibly one of them owns the lock that OpenOffice is waiting
> for. Also see
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
>
> Of particular interest are the witness output and backtraces for
> all threads that are reported by witness as owning the vnode locks.
>
>> 2 - If #1 doesn't fix the problem:
>>     - before making it hang, start capturing packets via:
>>     # tcpdump -s 0 -w xxx host server
>>     - then make it hang, kill the above and
>>     # procstat -ka
>>     # ps axHlww
>>     and capture the output of both of these. Hopefully these 2 commands
>>     will indicate what is holding the vnode lock and maybe, why. The
>>     "xxx" file can be looked at in wireshark to see what/if any NFS
>>     traffic is happening.
>>     If you aren't comfortable looking at the above, you can email them
>>     to me and I'll take a stab at them someday.
>> 3 - Try the experimental client to see if it behaves differently. The
>>     mount command is:
>>     # mount -t newnfs -o nfsv3,<the options you already use> server:/path /mntpath
>>     (This might identify if the regular client has an infrequently
>>      executed code path that forgets to unlock the vnode, since the
>>      experimental client uses a somewhat different RPC layer. The
>>      buffer cache handling etc. are almost the same, but the RPC stuff
>>      is fairly different.)
>>
>> > The nfs server is an up-to-date Linux Debian 5 with kernel 2.6.26.
>> >
>> I'm afraid I can't blame Linux (at least not until we have more info;-).
>>
>> > If more info is needed, I can easily reproduce this.
>>
>> See above #2.
>>
>> Good luck with it and let us know how it goes, rick

Hi,

I have got the first steps set up. No solution yet.
1. With the patch OpenOffice opens my homedir (yeah!), but it gives an I/O  
error when saving a file and everything hangs after that.
2. I have dumps and stuff. I will mail some links in private e-mail.
3. Didn't work. It mounts, but ls -l /home gives "Operation not permitted".

I didn't see other processes in uninterruptible sleep. But maybe you guys
see more than I do.
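
For anyone who wants to scan the procstat output mechanically rather than by eye, here is a minimal sketch in Python. It assumes the output was saved with one thread per line, as procstat prints it (not wrapped by a mail client as in the quote above). The function name and the list of frame names are my own choices for illustration, taken from the stack trace earlier in this thread. It only lists threads that are waiting on a vnode lock; it does not identify the holder.

```python
# Frame names that suggest a thread is sleeping on a vnode lock.
# Assumption: these are simply the names seen in the quoted stack trace.
LOCK_FRAMES = ("__lockmgr_args", "VOP_LOCK1_APV", "_vn_lock")


def threads_waiting_on_vnode_lock(procstat_text):
    """Return (pid, tid, comm) for threads whose kernel stack has a lock frame.

    Assumes one thread per line, in the "PID TID COMM TDNAME KSTACK" layout.
    """
    hits = []
    for line in procstat_text.splitlines():
        tokens = line.split()
        # Skip the header and any line that does not start with PID and TID.
        if len(tokens) < 3 or not (tokens[0].isdigit() and tokens[1].isdigit()):
            continue
        # Stack frames look like "name+0xoffset"; keep only the name.
        frames = [t.split("+0x")[0] for t in tokens[2:] if "+0x" in t]
        if any(f in LOCK_FRAMES for f in frames):
            hits.append((int(tokens[0]), int(tokens[1]), tokens[2]))
    return hits
```

To use it, pass in the saved text of the procstat run, for example the contents of a file you redirected the output to. More than one process in the result would be a hint about where to look next.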

If you don't see anything in wireshark I will try WITNESS and friends
later this week. I have already spent 2 hours on this during work hours.

Ronald.


More information about the freebsd-stable mailing list