Re: git: 6468cd8e0ef9 - main - mount: add vnode usage per file system with mount -v
- In reply to: Cy Schubert : "Re: git: 6468cd8e0ef9 - main - mount: add vnode usage per file system with mount -v"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Wed, 15 Jun 2022 16:11:21 UTC
On Wed, Jun 15, 2022 at 09:09:12AM -0700, Cy Schubert wrote: | Thanks. This fixes it. 16 simultaneous tar cf /dev/null /usr/obj on three | separate NFS clients: | | Server: | Clients OpenOwner Opens LockOwner Locks | Delegs | 3 48 54 0 0 | 0 | Layouts | 0 | | The load on the NFS server is ~ 0.5 to 2 on a four core machine with CPU | sys busy between 2% and 30%. Prior to this CPU sys was pegged at close to | 100% with a load of 18. The regression has been resolved. Thanks. Thanks for the confirmation. Seems like statfs is used by a lot of things. Doug A. | In message <Yqn1yOpENwgc588g@ambrisko.com>, Doug Ambrisko writes: | > On Wed, Jun 15, 2022 at 07:23:51AM -0700, Doug Ambrisko wrote: | > | On Wed, Jun 15, 2022 at 07:05:14AM -0700, Cy Schubert wrote: | > | | Can we revert this, please. It breaks NFSv4. | > | | > | It would be nice if you could try the proposed partial revert. | > | I'm planning to commit that shortly. | > | > It is in ce00b11940ab. | > | > Please let me know how that works. | > | > Thanks, | > | > Doug A. | > | | In message <20220615030833.79F9A9B@slippy.cwsent.com>, Cy Schubert writes | > : | > | | > In message <CAGudoHHbf3PTt_TXnKAtt5n_WOYpvA0dCb9HAiDLAL+yR3VCFQ@mail.gm | > ail.c | > | | > om> | > | | > , Mateusz Guzik writes: | > | | > > On 6/13/22, Doug Ambrisko <ambrisko@ambrisko.com> wrote: | > | | > > > On Mon, Jun 13, 2022 at 06:43:31PM +0200, Mateusz Guzik wrote: | > | | > > > | On 6/13/22, Doug Ambrisko <ambrisko@freebsd.org> wrote: | > | | > > > | > The branch main has been updated by ambrisko: | > | | > > > | > | > | | > > > | > URL: | > | | > > > | > | > | | > > > https://cgit.FreeBSD.org/src/commit/?id=6468cd8e0ef9d1d3331e9de26cd | > 2be59b | > | | > c7 | > | | > > 78494 | > | | > > > | > | > | | > > > | > commit 6468cd8e0ef9d1d3331e9de26cd2be59bc778494 | > | | > > > | > Author: Doug Ambrisko <ambrisko@FreeBSD.org> | > | | > > > | > AuthorDate: 2022-06-13 14:56:38 +0000 | > | | > > > | > Commit: Doug Ambrisko <ambrisko@FreeBSD.org> | > | | > > > | > CommitDate: 2022-06-13 14:56:38 +0000 | > | | > > > | > | > | | > > > | > mount: add vnode usage per file system with mount -v | > | | > > > | > | > | | > > > | > This avoids the need to drop into the ddb to figure out vno | > de | > | | > > > | > usage per file system. It helps to see if they are or are | > not | > | | > > > | > being freed. Suggestion to report active vnode count was f | > rom | > | | > > > | > kib@ | > | | > > > | > | > | | > > > | > Reviewed by: kib | > | | > > > | > Differential Revision: https://reviews.freebsd.org/D35436 | > | | > > > | > --- | > | | > > > | > sbin/mount/mount.c | 7 +++++++ | > | | > > > | > sys/kern/vfs_mount.c | 12 ++++++++++++ | > | | > > > | > sys/sys/mount.h | 4 +++- | > | | > > > | > 3 files changed, 22 insertions(+), 1 deletion(-) | > | | > > > | > | > | | > > > | > diff --git a/sbin/mount/mount.c b/sbin/mount/mount.c | > | | > > > | > index 79d9d6cb0caf..bd3d0073c474 100644 | > | | > > > | > --- a/sbin/mount/mount.c | > | | > > > | > +++ b/sbin/mount/mount.c | > | | > > > | > @@ -692,6 +692,13 @@ prmount(struct statfs *sfp) | > | | > > > | > xo_emit("{D:, }{Lw:fsid}{:fsid}", fsidb | > uf); | > | | > > > | > free(fsidbuf); | > | | > > > | > } | > | | > > > | > + if (sfp->f_nvnodelistsize != 0 || sfp->f_avnode | > count != | > | | > > 0) { | > | | > > > | > + xo_open_container("vnodes"); | > | | > > > | > + xo_emit("{D:, | > | | > > > | > }{Lwc:vnodes}{Lw:count}{w:count/%ju}{Lw:active}{:active/%ju}", | > | | > > > | > + (uintmax_t)sfp->f_nvnodelistsize, | > | | > > > | > + (uintmax_t)sfp->f_avnodecount); | > | | > > > | > + xo_close_container("vnodes"); | > | | > > > | > + } | > | | > > > | > } | > | | > > > | > xo_emit("{D:)}\n"); | > | | > > > | > } | > | | > > > | > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c | > | | > > > | > index 71a40fd97a9c..e3818b67e841 100644 | > | | > > > | > --- a/sys/kern/vfs_mount.c | > | | > > > | > +++ b/sys/kern/vfs_mount.c | > | | > > > | > @@ -2610,6 +2610,8 @@ vfs_copyopt(struct vfsoptlist *opts, cons | > t char | > | | > > > *name, | > | | > > > | > void *dest, int len) | > | | > > > | > int | > | | > > > | > __vfs_statfs(struct mount *mp, struct statfs *sbp) | > | | > > > | > { | > | | > > > | > + struct vnode *vp; | > | | > > > | > + uint32_t count; | > | | > > > | > | > | | > > > | > /* | > | | > > > | > * Filesystems only fill in part of the structure for u | > pdates, | > | | > > we | > | | > > > | > @@ -2624,6 +2626,16 @@ __vfs_statfs(struct mount *mp, struct st | > atfs | > | | > > > *sbp) | > | | > > > | > sbp->f_version = STATFS_VERSION; | > | | > > > | > sbp->f_namemax = NAME_MAX; | > | | > > > | > sbp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK; | > | | > > > | > + sbp->f_nvnodelistsize = mp->mnt_nvnodelistsize; | > | | > > > | > + | > | | > > > | > + count = 0; | > | | > > > | > + MNT_ILOCK(mp); | > | | > > > | > + TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) { | > | | > > > | > + if (vrefcnt(vp) > 0) /* racy but does not matte | > r */ | > | | > > > | > + count++; | > | | > > > | > + } | > | | > > > | > + MNT_IUNLOCK(mp); | > | | > > > | > + sbp->f_avnodecount = count; | > | | > > > | > | > | | > > > | | > | | > > > | libc uses statfs for dir walk (see gen/fts.c), most notably find | > | | > > > | immediately runs into it. As such the linear scan by default is a | > | | > > > | non-starter. | > | | > > > | | > | | > > > | I don't know if mount is the right place to dump this kind of inf | > o to | > | | > > > | begin with, but even so, it should only happen with a dedicated f | > lag. | > | | > > > | | > | | > > > | As statfs does not take any flags on its own, there is no way to | > | | > > > | prevent it from doing the above walk. Perhaps a dedicated sysctl | > which | > | | > > > | takes mount point id could do the walk instead, when asked. | > | | > > > | | > | | > > > | Short of making the walk optional I'm afraid this will have to be | > | | > > > reverted. | > | | > > > | > | | > > > Just to be clear, this isn't breaking things but is not optimal for | > | | > > > things that don't need this extra info. | > | | > > > | > | | > > | > | | > > It's not "not optimal", it's a significant overhead which taxes | > | | > > frequent users which don't benefit from it. | > | | > | > | | > Indeed this is not optimal. Since this revision NFSv4 performance has | > | | > tanked. An installworld over NFSv4 which used to take approximately 15 | > | | > minutes took all night, not even finishing by morning. The NFS server, | > a 4 | > | | > core machine, had a load average of 18 with three NFS clients attemptin | > g | > | | > installworld with nfsd using over 90% of the cycles on the machine. I w | > as | > | | > able to reproduce the problem by running a series of tar cf /dev/null | > | | > /usr/obj in parallel, on a single NFSv4 client to essentially DoS the N | > FSv4 | > | | > server. | > | | > | > | | > The workaround was to fall back to NFSv3, which was unaffected by this | > | | > revision. | > | | > | > | | > I reached out to our resident NFS person (rmacklem@) who suggested | > | | > reverting this revision, restoring the network to pre-regression state. | > | | > | > | | > > | > | | > > For more data I plugged dtrace -n 'fbt::__vfs_statfs:entry { | > | | > > @[execname] = count(); }' while package building, then i got tons of | > | | > > hits: | > | | > > [snip] | > | | > > expr 13992 | > | | > > install 14090 | > | | > > dirname 14921 | > | | > > mv 17404 | > | | > > ghc-stage1 17577 | > | | > > grep 18998 | > | | > > xgcc 23832 | > | | > > cpp 29282 | > | | > > cc1 36961 | > | | > > sh 70575 | > | | > > rm 73904 | > | | > > ld.lld 87784 | > | | > > sed 88803 | > | | > > c++ 98175 | > | | > > cat 115811 | > | | > > cc 449725 | > | | > > | > | | > [...] | > | | > > -- | > | | > > Mateusz Guzik <mjguzik gmail.com> | > | | > > | > | | > | > | | > | > | | > -- | > | | > Cheers, | > | | > Cy Schubert <Cy.Schubert@cschubert.com> | > | | > FreeBSD UNIX: <cy@FreeBSD.org> Web: http://www.FreeBSD.org | > | | > NTP: <cy@nwtime.org> Web: https://nwtime.org | > | | > | > | | > e**(i*pi)+1=0 | > | | > | > | | > | > | | |