Re: ZFS deadlock in 14

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 19 Aug 2023 19:18:42 UTC
On Aug 19, 2023, at 11:40, Mark Millard <marklmi@yahoo.com> wrote:

> We will see how long the following high load average bulk -a
> configuration survives a build attempt, using a non-debug kernel
> for this test.
> 
> I've applied:
> 
> # fetch -o- https://github.com/openzfs/zfs/pull/15107.patch | git -C /usr/main-src/ am --dir=sys/contrib/openzfs
> -                                                       13 kB  900 kBps    00s
> Applying: Remove fastwrite mechanism.
> 
> # fetch -o- https://github.com/openzfs/zfs/pull/15122.patch | git -C /usr/main-src/ am --dir=sys/contrib/openzfs
> -                                                       45 kB 1488 kBps    00s
> Applying: ZIL: Second attempt to reduce scope of zl_issuer_lock.
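> 
> As a quick sanity check that both patches landed:
> 
> # git -C /usr/main-src log --oneline -2
> 
> should show the two "Applying:" subjects above as the
> top two commits.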
> 
> on a ThreadRipper 1950X (32 hardware threads) that is at
> main 6b405053c997:
> 
> Thu, 10 Aug 2023
> . . .
>   • git: cd25b0f740f8 - main - zfs: cherry-pick fix from openzfs Martin Matuska 
>   • git: 28d2e3b5dedf - main - zfs: cherry-pick fix from openzfs Martin Matuska
> . . .
>   • git: 6b405053c997 - main - OpenSSL: clean up botched merges in OpenSSL 3.0.9 import Jung-uk Kim
> 
> So it is based on a tree that starts with those 2
> cherry-picks as well.
> 
> The ThreadRipper 1950X boots from a bectl BE, and that
> ZFS media is the only storage in use here.
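> 
> For reference, the stock bectl tool shows the BE layout:
> 
> # bectl list
> 
> which lists the boot environments and marks the active
> one.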
> 
> I've set up to test starting a bulk -a using
> ALLOW_MAKE_JOBS=yes along with allowing 32 builders,
> so potentially around 32*32 = 1024 for the load
> average(s) at times (a configuration sketch follows
> the swapinfo output below). There is 128 GiBytes
> of RAM and:
> 
> # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/gpt/OptBswp480 503316480        0 503316480     0%
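> 
> For anyone reproducing the shape of this, a sketch of
> the relevant knobs, using poudriere's default file
> locations (the exact values here are illustrative):
> 
> # /usr/local/etc/poudriere.conf : number of parallel builders
> PARALLEL_JOBS=32
> 
> # /usr/local/etc/poudriere.d/make.conf : allow each builder
> # to use parallel make jobs (instead of 1 job per builder)
> ALLOW_MAKE_JOBS=yes
> 
> With ALLOW_MAKE_JOBS=yes, each builder defaults to one
> make job per hardware thread, hence the 32*32 figure.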
> 
> I'm not so sure that such a high load average bulk -a
> is reasonable under a debug kernel: I'm unsure of the
> resource usage involved and whether everything could
> be tracked as needed. So I'm testing with a non-debug
> kernel for now.
> 
> I have built the kernels (nodbg and dbg), installed
> the nodbg kernel (a build sketch follows the bulk
> output below), rebooted, and started:
> 
> # poudriere bulk -jmain-amd64-bulk_a -a
> . . .
> [00:01:22] Building 34042 packages using up to 32 builders
> . . .
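> 
> The kernel builds were along the following lines; the
> sketch uses the stock GENERIC-NODEBUG config name for
> the nodbg kernel (the dbg build would use GENERIC):
> 
> # make -C /usr/main-src -j32 buildkernel KERNCONF=GENERIC-NODEBUG
> # make -C /usr/main-src installkernel KERNCONF=GENERIC-NODEBUG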
> 
> The ports tree is from back in mid-July.
> 
> I have a patched-up top that records and reports
> various MaxObs???? figures (Maximum Observed). It
> was recently reporting:
> 
> . . .;  load averages: 119.56, 106.79,  71.54 MaxObs: 184.08, 112.10,  71.54
> 1459 threads:  . . ., 273 MaxObsRunning
> . . .
> Mem: . . ., 61066Mi MaxObsActive, 10277Mi MaxObsWired, 71371Mi MaxObs(Act+Wir+Lndry)
> . . .
> Swap: . . ., 61094Mi MaxObs(Act+Lndry+SwapUsed), 71371Mi MaxObs(Act+Wir+Lndry+SwapUsed)
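
For anyone without my patched top, a rough stand-in using
only stock tools is to poll vm.loadavg and keep the maximum
seen, e.g.:

#!/bin/sh
# Rough stand-in for the patched top's MaxObs load average:
# poll the 1-minute load average, remember the maximum seen.
# (sysctl -n vm.loadavg prints: { 1min 5min 15min })
max=0
while :; do
    cur=$(sysctl -n vm.loadavg | awk '{ print $2 }')
    if [ "$(echo "$cur > $max" | bc)" -eq 1 ]; then
        max=$cur
    fi
    printf 'load: %s  MaxObs: %s\n' "$cur" "$max"
    sleep 10
done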

Status report at about 1 hr in:

[main-amd64-bulk_a-default] [2023-08-19_11h04m26s] [parallel_build:] Queued: 34435 Built: 1929  Failed: 9     Skipped: 2569  Ignored: 358   Fetched: 0     Tobuild: 29570  Time: 00:59:59

Not hung up yet.

From about 10 minutes after that:

. . . load averages: 205.56, 181.58, 153.68 MaxObs: 213.78, 182.26, 153.68
1704 threads:  . . ., 311 MaxObsRunning
. . .
Mem: . . ., 100250Mi MaxObsActive, 16857Mi MaxObsWired, 124879Mi MaxObs(Act+Wir+Lndry)
. . .
Swap: . . . 5994Mi MaxObsUsed, 116589Mi MaxObs(Act+Lndry+SwapUsed), 127354Mi MaxObs(Act+Wir+Lndry+SwapUsed)

===
Mark Millard
marklmi at yahoo.com