From: Mateusz Guzik <mjguzik@gmail.com>
Date: Tue, 15 Aug 2023 14:41:38 +0200
Subject: Re: Speed improvements in ZFS
To: Alexander Leidinger
Cc: current@freebsd.org
In-Reply-To: <61ca9df1b15c0e5477ff51196d0ec073@Leidinger.net>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
On 8/15/23, Alexander Leidinger wrote:
> Hi,
>
> Just a report that I noticed a very large speed improvement in ZFS in
> -current. For a long time (at least since last year), on a jail host of
> mine with more than 20 jails, each of which runs periodic daily, the
> periodic daily runs of the jails have taken from about 3 am until 5 pm
> or longer. I don't remember when this started, and I thought at the
> time that the problem might be data related. It's the long runs of
> "find" in one of the periodic daily jobs which take that long, and with
> the number of jails, the null-mounted base system and the null-mounted
> package repository inside each jail, the number of files and the
> concurrent access to the spinning rust (first with an SSD-based and now
> an NVMe-based cache) may have reached some tipping point. I have all
> the periodic daily mails around, so theoretically I may be able to find
> out when this started, but as can be seen in another mail to this
> mailing list, the system which has all the periodic mails has some
> issues which have higher priority for me to track down...
>
> Since I updated to a src from 2023-07-20, this is not the case anymore.
> The data is the same (maybe even a bit more, as I have added 2 more
> jails since then, and the periodic daily runs, which run more or less
> in parallel, are not taking considerably longer). The speed increase
> with the July build is in the area of 3-4 hours for 23 parallel
> periodic daily runs. So instead of finishing the periodic runs around
> 5 pm, they already finish around 1 pm/2 pm.
>
> So whatever was done inside ZFS, VFS or nullfs between 2023-06-19 and
> 2023-07-20 has given a huge speed improvement. From my memory I would
> say there is still room for improvement, as I think the periodic daily
> runs used to end in the morning instead of the afternoon, but my memory
> may be flaky in this regard...
>
> Great work to whoever was involved.

Several hours to run periodic is still unusably slow. Have you tried
figuring out where the time is spent?

I don't know what caused the change here, but I do know of one major
bottleneck which you are almost guaranteed to run into if you inspect
all files everywhere -- namely bumping into the vnode limit.

In vn_alloc_hard you can find:

	msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
	if (atomic_load_long(&numvnodes) + 1 > desiredvnodes &&
	    vnlru_read_freevnodes() > 1)
		vnlru_free_locked(1);

That is, the allocating thread will sleep for up to 1 second if there
are no vnodes up for grabs, and then go ahead and allocate one anyway.
Going over the numvnodes limit is partially rate-limited, but in a
manner which is not very usable. The entire mechanism is mostly borked
and in desperate need of a rewrite.

With this in mind, can you provide the output of:

sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes \
    vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles

Meanwhile, if there are tons of recycles, you can do damage control by
bumping kern.maxvnodes.

If this is not the problem, you can use dtrace to figure it out.

-- 
Mateusz Guzik
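
A minimal sh sketch of the data collection and damage control described
above; the kern.maxvnodes value is only an illustrative example, and the
dtrace one-liner assumes the running kernel exposes vn_alloc_hard as an
fbt probe (it may be inlined):

	#!/bin/sh
	# Dump the vnode-related counters requested above (all are standard
	# FreeBSD sysctls).
	sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes \
	    vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles

	# If vfs.recycles / vfs.recycles_free keep climbing during the periodic
	# runs, damage control by raising the limit; 4000000 is only an example
	# value, pick one based on the counters above and available RAM.
	sysctl kern.maxvnodes=4000000

	# Persist across reboots if the larger limit helps:
	# echo 'kern.maxvnodes=4000000' >> /etc/sysctl.conf

	# Otherwise, check whether threads are hitting the vn_alloc_hard slow
	# path (assumes the function is not inlined in the running kernel):
	# dtrace -n 'fbt::vn_alloc_hard:entry { @[execname] = count(); }'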