From: Mateusz Guzik <mjguzik@gmail.com>
Date: Tue, 19 Apr 2022 11:12:27 +0200
Subject: Re: nullfs and ZFS issues
To: Doug Ambrisko
Cc: freebsd-current@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-current
On 4/19/22, Doug Ambrisko wrote:
> I've switched my laptop to use nullfs and ZFS. Previously, I used
> localhost NFS mounts instead of nullfs when nullfs would complain
> that it couldn't mount. Since that check has been removed, I've
> switched to nullfs only. However, every so often my laptop would
> get slow and the ARC evict and prune threads would consume two
> cores at 100% until I rebooted. I had a 1G max ARC and have
> increased it to 2G now. Looking into this has uncovered some issues:
> - nullfs would prevent vnlru_free_vfsops from doing anything
>   when called from ZFS arc_prune_task
> - nullfs would hang onto a bunch of vnodes unless mounted with
>   nocache
> - nullfs and nocache would break untar. This has been fixed now.
>
> With nullfs, nocache and setting max vnodes to a low number I can
> keep the ARC around the max without evict and prune consuming
> 100% of 2 cores. This doesn't seem like the best solution but it
> is better than when the ARC starts spinning.
>
> Looking into this issue with bhyve and an md drive for testing, I
> create a brand new zpool mounted as /test and then nullfs mount
> /test to /mnt. I loop through untarring the Linux kernel into the
> nullfs mount, rm -rf it and repeat. I set the ARC to the smallest
> value I can. Untarring the Linux kernel was enough to get the ARC
> evict and prune to spin since they couldn't evict/prune anything.
>
> Looking at vnlru_free_vfsops called from ZFS arc_prune_task I see:
>
> static int
> vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
> {
>         ...
>         for (;;) {
>                 ...
>                 vp = TAILQ_NEXT(vp, v_vnodelist);
>                 ...
>
>                 /*
>                  * Don't recycle if our vnode is from different type
>                  * of mount point. Note that mp is type-safe, the
>                  * check does not reach unmapped address even if
>                  * vnode is reclaimed.
>                  */
>                 if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
>                     mp->mnt_op != mnt_op) {
>                         continue;
>                 }
>                 ...
>
> The vp ends up being the nullfs mount and then hits the continue
> even though the passed-in mvp is on ZFS. If I do a hack to
> comment out the continue then I see the ARC, nullfs vnodes and
> ZFS vnodes grow. When the ARC calls arc_prune_task, which calls
> vnlru_free_vfsops, the vnode counts now go down for both nullfs
> and ZFS. The ARC cache usage also goes down. Then they increase
> again until the ARC gets full and then they go down again. So with
> this hack I don't need nocache passed to nullfs and I don't need
> to limit the max vnodes. Doing multiple untars in parallel over
> and over doesn't seem to cause any issues for this test. I'm not
> saying commenting out the continue is the fix, but it is a simple
> POC test.

I don't see an easy way to say "this is a nullfs vnode holding onto a
zfs vnode". Perhaps the routine can be extended with issuing a nullfs
callback, if the module is loaded.

In the meantime I think a good enough(tm) fix would be to check that
nothing was freed and fall back to the good old regular cleanup
without filtering by vfsops. This would be very similar to what you
are doing with your hack.
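A minimal sketch of what I mean, written against the snippet you
quoted; this is untested pseudo-code, not a patch:

/*
 * Do the pass filtered by vfsops as before; if it freed nothing,
 * retry once with no filter so that vnodes of other filesystems
 * (e.g. nullfs stacked on zfs) can be recycled too.
 */
int
vnlru_free_vfsops(int count, struct vfsops *mnt_op, struct vnode *mvp)
{
        int freed;

        freed = vnlru_free_impl(count, mnt_op, mvp);
        if (freed == 0 && mnt_op != NULL) {
                /* The filter matched nothing; do a full pass. */
                freed = vnlru_free_impl(count, NULL, mvp);
        }
        return (freed);
}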
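As for the callback idea, it could look vaguely like this; note that
nothing of the sort exists today, and vnlru_lower_ops, vlo_lowervp and
mnt_vnlru_ops are all names I just made up for illustration:

/*
 * A stacked filesystem registers a way to report the vnode it sits
 * on top of (nullfs would roughly return NULLVPTOLOWERVP(vp)), so
 * vnlru can tell "nullfs vnode over zfs" apart from an unrelated
 * mount.
 */
struct vnlru_lower_ops {
        struct vnode    *(*vlo_lowervp)(struct vnode *vp);
};

/* In vnlru_free_impl(), the filter would then become: */
if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
    mp->mnt_op != mnt_op) {
        struct vnode *lvp = NULL;

        if (mp->mnt_vnlru_ops != NULL)
                lvp = mp->mnt_vnlru_ops->vlo_lowervp(vp);
        /* Recycle anyway if vp merely stacks on the fs we shrink. */
        if (lvp == NULL || lvp->v_mount == NULL ||
            lvp->v_mount->mnt_op != mnt_op)
                continue;
}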
> It appears that when ZFS is asking for cached vnodes to be
> freed, nullfs also needs to free some up as well so that
> they are freed at the VFS level. It seems that vnlru_free_impl
> should allow some of the related nullfs vnodes to be freed so
> the ZFS ones can be freed and reduce the size of the ARC.
>
> BTW, I also hacked the kernel and mount to show the vnodes used
> per mount, i.e. mount -v:
>
>   test on /test (zfs, NFS exported, local, nfsv4acls,
>       fsid 2b23b2a1de21ed66, vnodes: count 13846 lazy 0)
>   /test on /mnt (nullfs, NFS exported, local, nfsv4acls,
>       fsid 11ff002929000000, vnodes: count 13846 lazy 0)
>
> Now I can easily see how the vnodes are used without going into
> ddb. On my laptop I have various vnet jails and nullfs mount my
> homedir into them, so pretty much everything goes through nullfs
> to ZFS. I'm limping along with the nullfs nocache and small number
> of vnodes, but it would be nice to not need that.
>
> Thanks,
>
> Doug A.

-- 
Mateusz Guzik