Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Thu, 31 Aug 2023 17:53:50 UTC
In message <1db726d4-32c9-e1b8-51d6-981aa51b7825@FreeBSD.org>, Alexander 
Motin writes:
> On 31.08.2023 08:45, Drew Gallatin wrote:
> > On Wed, Aug 30, 2023, at 8:01 PM, Alexander Motin wrote:
> >> This is the first time I've seen a panic like this.  I'll think about 
> >> it tomorrow.  But I'd appreciate any information on what your workload 
> >> is and what you are doing related to the ZIL (O_SYNC, fsync(), 
> >> sync=always, etc.) to trigger it.  What is your pool configuration?
> > 
> > I'm not Gleb, but this was something at $WORK, so I can perhaps help.  
> > I've included the output of zpool status, and all non-default settings 
> > in the zpool.  Note that we don't use a ZIL device.
>
> You don't use a SLOG device.  The ZIL is always with you, just embedded 
> in this case.
>
> I tried to think about this for a couple of hours and still can't see 
> how this can happen.  zil_sync() should not call zil_free_lwb() unless 
> the lwb is in LWB_STATE_FLUSH_DONE.  To get into LWB_STATE_FLUSH_DONE, 
> the lwb should first delete all of its lwb_vdev_tree entries in 
> zil_lwb_write_done().  And no new entries should be added during or 
> after zil_lwb_write_done(), due to the zio dependencies that are set.
>
> I've made a patch tuning some assertions for this context: 
> https://github.com/openzfs/zfs/pull/15227 .  If the issue is 
> reproducible, could you please apply it and try again?  Maybe it will 
> give us some more clues.
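
For anyone following along, here is a tiny standalone model of the 
invariant Alexander describes.  This is NOT the OpenZFS code -- the names 
(lwb_nvdevs, lwb_write_done, etc.) are simplified stand-ins -- it just 
makes the required state ordering concrete:

	#include <assert.h>
	#include <stdio.h>

	/* Simplified lwb life cycle; mirrors the states named above. */
	typedef enum {
		LWB_STATE_ISSUED,
		LWB_STATE_WRITE_DONE,
		LWB_STATE_FLUSH_DONE
	} lwb_state_t;

	typedef struct lwb {
		lwb_state_t	lwb_state;
		int		lwb_nvdevs;	/* stand-in for lwb_vdev_tree */
	} lwb_t;

	/* Write completion drains the vdev tree before WRITE_DONE. */
	static void
	lwb_write_done(lwb_t *lwb)
	{
		assert(lwb->lwb_state == LWB_STATE_ISSUED);
		lwb->lwb_nvdevs = 0;		/* all tree entries removed */
		lwb->lwb_state = LWB_STATE_WRITE_DONE;
	}

	/* Flush completion is ordered after write completion by zio deps. */
	static void
	lwb_flush_done(lwb_t *lwb)
	{
		assert(lwb->lwb_state == LWB_STATE_WRITE_DONE);
		lwb->lwb_state = LWB_STATE_FLUSH_DONE;
	}

	/* Freeing asserts exactly what the panic seems to contradict. */
	static void
	lwb_free(lwb_t *lwb)
	{
		assert(lwb->lwb_state == LWB_STATE_FLUSH_DONE);
		assert(lwb->lwb_nvdevs == 0);	/* tree must be empty */
	}

	int
	main(void)
	{
		lwb_t lwb = { LWB_STATE_ISSUED, 3 };

		lwb_write_done(&lwb);
		lwb_flush_done(&lwb);
		lwb_free(&lwb);	/* any earlier and an assert would trip */
		printf("invariant held\n");
		return (0);
	}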

One thing that works around both of my problems is reducing poudriere bulk 
jobs from 8 to 5 on my 4-core machines, as shown below.
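
In case it helps anyone reproduce or avoid the panic, on my builders that 
means setting, in /usr/local/etc/poudriere.conf (assuming the stock knob 
name here -- verify against poudriere.conf.sample on your system):

	# limit concurrent package builds to 5
	PARALLEL_JOBS=5

or passing the equivalent -J option to poudriere bulk.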


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e^(i*pi)+1=0