Re: Speed improvements in ZFS

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Mon, 21 Aug 2023 08:53:48 UTC
On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
> On 2023-08-20 23:17, Konstantin Belousov wrote:
> > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > > On 8/20/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
> > > > On 2023-08-20 22:02, Mateusz Guzik wrote:
> > > >> On 8/20/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
> > > >>> On 2023-08-20 19:10, Mateusz Guzik wrote:
> > > >>>> On 8/18/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
> > > >>>
> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you interested
> > > >>>>> in getting it?
> > > >>>>>
> > > >>>>
> > > >>>> Your problem is not the vnode limit, but nullfs.
> > > >>>>
> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > > >>>
> > > >>> 122 nullfs mounts on this system. And every jail I set up has several
> > > >>> null mounts. One base system mounted into every jail, and then shared
> > > >>> ports (packages/distfiles/ccache) across all of them.
> > > >>>
> > > >>>> First, some of the contention comes from the notorious VI_LOCK,
> > > >>>> which has to be taken in order to do anything.
> > > >>>>
> > > >>>> But more importantly the mind-boggling off-cpu time comes from
> > > >>>> exclusive locking which should not be there to begin with -- as in
> > > >>>> that xlock in stat should be a slock.
> > > >>>>
> > > >>>> Maybe I'm going to look into it later.
> > > >>>
> > > >>> That would be fantastic.
> > > >>>
> > > >>
> > > >> I did a quick test, things are shared locked as expected.
> > > >>
> > > >> However, I found the following:
> > > >>         if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > > >>                 mp->mnt_kern_flag |=
> > > >>                     lowerrootvp->v_mount->mnt_kern_flag &
> > > >>                     (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > > >>                     MNTK_EXTENDED_SHARED);
> > > >>         }
> > > >>
> > > >> Are you using the "nocache" option? It has a side effect of xlocking
> > > >
> > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > > >
> > > 
> > > If you don't have "nocache" on null mounts, then I don't see how this
> > > could happen.
> > 
> > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for
> > fuse and nfs at least.
> 
> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported.
> 6 of those nullfs mounts are also exported via Samba. The NFS exports
> shouldn't be needed anymore, I will remove them.
By nfs I meant the NFS client, not NFS exports.
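
To illustrate the interaction discussed in this thread, here is a small
userland model. It is a sketch, not the kernel code. It assumes that nullfs
ends up without NULLM_CACHE when either the "nocache" option is given or the
lower mount carries MNTK_NULL_NOCACHE, and that the shared-locking flags are
inherited only when NULLM_CACHE is set, as in the fragment quoted above. The
bit values, SHARED_MASK and the model_* function names are made up for the
example and are not taken from sys/mount.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this model only; NOT copied from sys/mount.h. */
#define MNTK_EXTENDED_SHARED	0x01u
#define MNTK_SHARED_WRITES	0x02u
#define MNTK_LOOKUP_SHARED	0x04u
#define MNTK_NULL_NOCACHE	0x08u
#define NULLM_CACHE		0x01u

/* Hypothetical helper name: the three flags the quoted fragment propagates. */
#define SHARED_MASK \
	(MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED)

static_assert((SHARED_MASK & MNTK_NULL_NOCACHE) == 0,
    "model flag bits must not overlap");

/*
 * Assumption: caching is disabled if the user asked for "nocache" or if
 * the lower filesystem sets MNTK_NULL_NOCACHE (e.g. an NFS client mount).
 */
uint32_t
model_nullm_flags(bool nocache_opt, uint32_t lower_kern_flag)
{
	if (nocache_opt || (lower_kern_flag & MNTK_NULL_NOCACHE) != 0)
		return (0);
	return (NULLM_CACHE);
}

/*
 * Mirrors the quoted fragment: the upper (nullfs) mount inherits the
 * shared-locking flags from the lower mount only when NULLM_CACHE is set.
 */
uint32_t
model_upper_kern_flag(uint32_t nullm_flags, uint32_t lower_kern_flag)
{
	uint32_t upper = 0;

	if ((nullm_flags & NULLM_CACHE) != 0)
		upper |= lower_kern_flag & SHARED_MASK;
	return (upper);
}
```

Under these assumptions, a lower mount with the shared flags set and no
MNTK_NULL_NOCACHE yields an upper mount with the same shared flags. Adding
either "nocache" or MNTK_NULL_NOCACHE on the lower side yields none of them,
which is the case where lookups fall back to exclusive locking.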

> 
> Shouldn't this implicit nocache propagate to the mount of the upper fs to
> give the user feedback about the effective state?
> 
> Bye,
> Alexander.
> 
> -- 
> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF