Re: ZFS deadlock in 14

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Thu, 17 Aug 2023 18:57:31 UTC

On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:
> Mateusz Guzik <mjguzik@gmail.com> writes:
>> Going through the list may or may not reveal other threads doing
>> something in the area and it very well may be they are deadlocked,
>> which then results in other processes hanging on them.
>>
>> Just like in your case the process reported as hung is a random victim
>> and whatever the real culprit is deeper.
> 
> We already know the real culprit, see upthread.

Dag, I looked through the thread once more, and while I thank you for
tracing it, you never went beyond txg_wait_synced() in the `zfs revert`
thread.  If you are saying that thread is holding the lock, then the
question is why the transaction commit is stuck.  I need to see stacks
for the ZFS sync threads, or better, all kernel stacks, just in case.
Without that information I can only speculate.

Running your test (so far without reproducing the problem), I see it
producing a substantial amount of ZIL writes.  The range of commits you
have narrowed the scope to so far includes my ZIL locking refactoring,
which I know for sure contains some deadlocks.  I have been waiting for
3 weeks now for reviews and tests of a PR that should fix them:
https://github.com/openzfs/zfs/pull/15122 .  It would be good if you
could test it, though it seems to depend on a few earlier patches not
yet merged into FreeBSD.

-- 
Alexander Motin