[patch] zfs livelock and thread priorities
Alexander Leidinger
Alexander at Leidinger.net
Wed Apr 29 08:44:41 UTC 2009
Quoting Ben Kelly <ben at wanderview.com> (from Tue, 28 Apr 2009 17:19:29 -0400):
>
> On Apr 28, 2009, at 4:52 PM, Ben Kelly wrote:
>
>> On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
>>> My system had eventually deadlocked overnight, though it took much
>>> longer than before to reach that point.
>>>
>>> In the end I've got many many processes sleeping in zio_wait with no
>>> disk activity whatsoever.
>>> I'm not sure if that's the same issue or not.
>>>
>>> Here are stack traces for all processes -- http://pastebin.com/f364e1452
>>> I've got the core saved, so if you want me to dig out some more info,
>>> let me know if/how I could help.
>>
>> It looks like there is a possible deadlock between zfs_zget() and
>> zfs_zinactive(). They both acquire a lock via
>> ZFS_OBJ_HOLD_ENTER(). The zfs_zinactive() path can get called
>> indirectly from within zio_done(). The zfs_zget() can in turn
>> block waiting for zio_done()'s completion while holding the object
>> lock.
>>
>> The following patch might help:
>>
>> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>>
>> This simply bails out of the inactive processing if the object lock
>> is already held. I'm not sure if this is 100% correct or not as it
>> cannot verify there are references to the vnode. I also tried
>> executing the zfs_zinactive() logic in a taskqueue to avoid the
>> deadlock, but that caused other deadlocks to occur.
>
> Sorry to reply to my own mail, but I came up with a better solution
> that I think is correct. I just vref() the vnode and then vrele()
> it again from a taskqueue to restart the zfs_zinactive() processing
> if it's still applicable.
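
For readers following the thread, the deferral idea quoted above can be pictured with a small userspace sketch. This is not Ben's patch and uses no ZFS or VFS symbols: `struct node`, `node_ref()`, `node_rele()`, `run_deferred()` and the fixed-size `deferred[]` array are hypothetical stand-ins for a vnode, vref(), vrele(), the taskqueue handler and the taskqueue itself. It assumes the final release only tries the object lock and never blocks on it. It is single-threaded, so the queue is unsynchronized; a real implementation would need its own locking.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not ZFS or VFS symbols. */
#define MAX_DEFERRED 16

struct node {
    int refs;              /* reference count (vref/vrele analogue) */
    int inactive_runs;     /* how often inactive processing completed */
    atomic_flag obj_lock;  /* per-object hold lock analogue */
};

/* Pending deferred releases (the taskqueue analogue). */
static struct node *deferred[MAX_DEFERRED];
static size_t ndeferred;

static bool obj_trylock(struct node *n)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set(&n->obj_lock);
}

static void obj_unlock(struct node *n)
{
    atomic_flag_clear(&n->obj_lock);
}

static void node_ref(struct node *n)
{
    n->refs++;
}

/* Drop one reference; dropping the last one triggers inactive processing. */
static void node_rele(struct node *n)
{
    if (--n->refs > 0)
        return;                     /* still referenced, nothing to do */

    if (!obj_trylock(n)) {
        /*
         * Lock is busy. Blocking here is what could deadlock, so take
         * a reference again and let a later context drop it.
         * Sketch only: a full queue simply skips the retry.
         */
        if (ndeferred < MAX_DEFERRED) {
            node_ref(n);
            deferred[ndeferred++] = n;
        }
        return;
    }

    n->inactive_runs++;             /* the actual inactive processing */
    obj_unlock(n);
}

/* Meant to run from a context that holds no object locks. */
static void run_deferred(void)
{
    struct node *batch[MAX_DEFERRED];
    size_t count = ndeferred;

    /* Copy first, because node_rele() may queue new entries. */
    for (size_t i = 0; i < count; i++)
        batch[i] = deferred[i];
    ndeferred = 0;

    for (size_t i = 0; i < count; i++)
        node_rele(batch[i]);        /* restarts inactive processing */
}
```

If something else takes a reference before `run_deferred()` runs, the deferred `node_rele()` just decrements the count and returns, which is the "if it's still applicable" part.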

This sounds related to the issues we discussed in the unlimited ARC
cache growth thread. Maybe the high value for the ARC was a red herring
and this deadlock is the real cause of the panics / watchdog triggers I
experience on the system in question.

I'm preparing a kernel with this patch and your zfs-prio patch. If I'm
lucky I can install the new kernel this week, but I don't think I can
put load on the system this week, so full testing will have to wait.

Bye,
Alexander.
--
The length of a marriage is inversely proportional
to the amount spent on the wedding.
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137
More information about the freebsd-current
mailing list