Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Thu, 31 Aug 2023 16:50:19 UTC

On 31.08.2023 08:45, Drew Gallatin wrote:
> On Wed, Aug 30, 2023, at 8:01 PM, Alexander Motin wrote:
>> It is the first time I see a panic like this.  I'll think about it
>> tomorrow.  But I'd appreciate any information on what is your workload
>> and what are you doing related to ZIL (O_SYNC, fsync(), sync=always,
>> etc) to trigger it?  What is your pool configuration?
> 
> I'm not Gleb, but this was something at $WORK, so I can perhaps help.  
> I've included the output of zpool status, and all non-default settings 
> in the zpool.  Note that we don't use a ZIL device.

You don't use a SLOG device.  The ZIL is always with you; it is just 
embedded in this case.

I thought about this for a couple of hours and still can't see how it 
can happen.  zil_sync() should not call zil_free_lwb() unless the lwb 
is in LWB_STATE_FLUSH_DONE.  To get into LWB_STATE_FLUSH_DONE, the lwb 
must first have all of its lwb_vdev_tree entries deleted in 
zil_lwb_write_done().  And no new entries should be added during or 
after zil_lwb_write_done(), because of the zio dependencies that are 
set up.
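
To make the expected ordering concrete, here is a toy model of the 
invariant described above.  It is only a sketch and not the OpenZFS 
code: every toy_* name is hypothetical, the entry counter merely stands 
in for lwb_vdev_tree, and refusing late additions is an assumption that 
models what the zio dependencies are supposed to guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are NOT the OpenZFS types or functions. */
typedef enum {
	TOY_LWB_ISSUED,		/* write in flight, entries may be added */
	TOY_LWB_WRITE_DONE,	/* write finished, tree already emptied */
	TOY_LWB_FLUSH_DONE	/* flush finished, lwb may be freed */
} toy_lwb_state_t;

typedef struct {
	toy_lwb_state_t	state;
	int		vdev_tree_entries; /* stands in for lwb_vdev_tree */
} toy_lwb_t;

/*
 * Adding an entry is only allowed before write-done.  In the real code
 * this ordering is expected to come from zio dependencies; here it is
 * modeled as an explicit check (an assumption of this sketch).
 */
static bool
toy_vdev_tree_add(toy_lwb_t *lwb)
{
	if (lwb->state != TOY_LWB_ISSUED)
		return (false);
	lwb->vdev_tree_entries++;
	return (true);
}

/* Models the write-done step: drop all entries, then advance. */
static void
toy_write_done(toy_lwb_t *lwb)
{
	lwb->vdev_tree_entries = 0;
	lwb->state = TOY_LWB_WRITE_DONE;
}

/* Models the flush-done step: only reachable after write-done. */
static void
toy_flush_done(toy_lwb_t *lwb)
{
	assert(lwb->state == TOY_LWB_WRITE_DONE);
	lwb->state = TOY_LWB_FLUSH_DONE;
}

/* Freeing is legal only in the final state with an empty tree. */
static bool
toy_can_free(const toy_lwb_t *lwb)
{
	return (lwb->state == TOY_LWB_FLUSH_DONE &&
	    lwb->vdev_tree_entries == 0);
}
```

In this model toy_can_free() can never see a non-empty tree, which is 
why a panic there would mean one of the two assumptions (the state 
check or the ordering) does not hold in practice.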

I've made a patch tuning some assertions for this context: 
https://github.com/openzfs/zfs/pull/15227 .  If the issue is 
reproducible, could you please apply it and try again?  Maybe it will 
give us some more clues.

-- 
Alexander Motin