From: Mateusz Guzik <mjguzik@gmail.com>
Date: Tue, 15 Aug 2023 14:41:38 +0200
Subject: Re: Speed improvements in ZFS
To: Alexander Leidinger
Cc: current@freebsd.org
In-Reply-To: <61ca9df1b15c0e5477ff51196d0ec073@Leidinger.net>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
On 8/15/23, Alexander Leidinger wrote:
> Hi,
>
> Just a report that I noticed a very large speed improvement in ZFS in
> -current. For a long time (at least since last year), on a jail host of
> mine with more than 20 jails, each of which runs periodic daily, the
> periodic daily runs of the jails have taken from about 3 am until 5 pm
> or longer. I don't remember when this started, and I thought at the
> time that the problem might be data related. It's the long runs of
> "find" in one of the periodic daily jobs which take that long, and with
> the number of jails, the null-mounted base system and the null-mounted
> package repository inside each jail, the number of files and the
> concurrent access to the spinning rust (first with an SSD-based and now
> an NVMe-based cache) may have reached some tipping point. I have all
> the periodic daily mails around, so theoretically I may be able to find
> out when this started, but as can be seen in another mail to this
> mailing list, the system which has all the periodic mails has some
> issues which have higher priority for me to track down...
>
> Since I updated to a src from 2023-07-20, this is not the case anymore.
> The data is the same (maybe even a bit more, as I have added 2 more
> jails since then, and the periodic daily runs, which run more or less
> in parallel, are not taking considerably longer). The speed increase
> with the July build is in the area of 3-4 hours for 23 parallel
> periodic daily runs. So instead of finishing the periodic runs around
> 5 pm, they already finish around 1 pm/2 pm.
>
> So whatever was done inside ZFS, VFS or nullfs between 2023-06-19 and
> 2023-07-20 has given a huge speed improvement. From my memory I would
> say there is still room for improvement, as I think the periodic daily
> runs used to end in the morning instead of the afternoon, but my memory
> may be flaky in this regard...
>
> Great work to whoever was involved.

Several hours to run periodic is still unusably slow. Have you tried
figuring out where the time is spent?

I don't know what caused the change here, but I do know of one major
bottleneck which you are almost guaranteed to run into if you inspect
all files everywhere -- namely bumping into the vnode limit.

In vn_alloc_hard you can find:

	msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
	if (atomic_load_long(&numvnodes) + 1 > desiredvnodes &&
	    vnlru_read_freevnodes() > 1)
		vnlru_free_locked(1);

That is, the allocating thread will sleep for up to 1 second if there
are no vnodes up for grabs, and then go ahead and allocate one anyway.
Going over the numvnodes limit is partially rate-limited, but in a
manner which is not very usable. The entire mechanism is mostly borked
and in desperate need of a rewrite.

With this in mind, can you provide the output of:

sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes \
    vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles

Meanwhile, if there are tons of recycles, you can do damage control by
bumping kern.maxvnodes.

If this is not the problem, you can use dtrace to figure it out.

-- 
Mateusz Guzik
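
A minimal sh sketch of the data collection and damage control described
above; the kern.maxvnodes value is only an illustrative example, and the
dtrace one-liner assumes the running kernel exposes vn_alloc_hard as an
fbt probe (it may be inlined):

	#!/bin/sh
	# Dump the vnode-related counters requested above (all are standard
	# FreeBSD sysctls).
	sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes \
	    vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles

	# If vfs.recycles / vfs.recycles_free keep climbing during the periodic
	# runs, damage control by raising the limit; 4000000 is only an example
	# value, pick one based on the counters above and available RAM.
	sysctl kern.maxvnodes=4000000

	# Persist across reboots if the larger limit helps:
	# echo 'kern.maxvnodes=4000000' >> /etc/sysctl.conf

	# Otherwise, check whether threads are hitting the vn_alloc_hard slow
	# path (assumes the function is not inlined in the running kernel):
	# dtrace -n 'fbt::vn_alloc_hard:entry { @[execname] = count(); }'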