Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad

From: Kyle Evans <kevans_at_FreeBSD.org>
Date: Fri, 01 Sep 2023 16:22:40 UTC
On 9/1/23 11:04, Alexander Motin wrote:
> On 01.09.2023 11:46, Kyle Evans wrote:
>> On 9/1/23 08:41, Alexander Motin wrote:
>>> On 31.08.2023 22:18, Kyle Evans wrote:
>>>> It seems to have clearly been stomped on by uma trashing. 
>>>> Encountered while running a pkgbase build, I think while it was in 
>>>> the packaging phase. I note in particular in that frame:
>>>>
>>>> (kgdb) p/x lwb->lwb_issued_timestamp
>>>> $4 = 0xdeadc0dedeadc0de
>>>>
>>>> So I guess it was freed sometime during one of the previous two 
>>>> zio_nowait() calls.
>>>
>>> Thank you, Kyle.  If the source lines are resolved correctly and it 
>>> really crashes on lwb_child_zio access, then I do see there a 
>>> possible race condition, even though I think it would involve at 
>>> least 2 or may be even 3 different threads.
>>>
>>
>> Oh, sorry- yes, it was the access to lwb_child_zio there.
>>
>>
>>> I've just created this new PR to address it:
>>> https://github.com/openzfs/zfs/pull/15233
>>>
>>> If you'll be able to test it, include also the two previous:
>>> https://github.com/openzfs/zfs/pull/15227
>>> https://github.com/openzfs/zfs/pull/15228
>>>
>>> Thank you for something actionable, it really feels much better! :)
>>>
>>
>> Perfect, thanks! I haven't been able to reproduce it since the first 
>> time, but your explanation sounds plausible to me.
>>
>> I'm not a ZFS developer, but it's not clear to me how I didn't end up 
>> tripping over other assertions, though; e.g., in 
>> zil_lwb_flush_vdevs_done:
>>
>> 1442         ASSERT3S(lwb->lwb_state, ==, LWB_STATE_WRITE_DONE);
>> 1443         lwb->lwb_state = LWB_STATE_FLUSH_DONE;
>>
>> lwb_state seems to only be set to LWB_STATE_WRITE_DONE in 
>> zil_lwb_write_done (lwb_write_zio's completion routine). I would've 
>> thought all three of these were executed synchronously in 
>> __zio_execute(), which would presumably put us in LWB_STATE_ISSUED at 
>> the time of completing the lwb_root_zio?
> 
> That is where ZIO dependencies work.  lwb_root_zio can never complete 
> before lwb_write_zio completion.   So first zil_lwb_write_done() on 
> lwb_write_zio completion should move the lwb to LWB_STATE_WRITE_DONE, 
> then zil_lwb_flush_vdevs_done() on lwb_root_zio completion should move 
> it to LWB_STATE_FLUSH_DONE, at which state zil_sync() can free it.  If 
> only at that point we try to check lwb->lwb_child_zio, we see the 
> 0xdeadc0dedeadc0de and try call zio_nowait() on it with the result you 
> saw.  Would lwb_child_zio actually be used by the specific lwb, 
> lwb_write_zio could not proceed before its completion first and so late 
> zio_nowait() call for it would be legal, but not otherwise.
> 

A-ha, ok, that makes sense- thanks for taking the time to explain that!

Thanks,

Kyle Evans