Re: Speed improvements in ZFS
- Reply: Alexander Leidinger : "Re: Speed improvements in ZFS"
- In reply to: Mateusz Guzik : "Re: Speed improvements in ZFS"
Date: Mon, 28 Aug 2023 20:33:48 UTC
Am 2023-08-22 18:59, schrieb Mateusz Guzik:
> On 8/22/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>> Am 2023-08-21 10:53, schrieb Konstantin Belousov:
>>> On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
>>>> Am 2023-08-20 23:17, schrieb Konstantin Belousov:
>>>> > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
>>>> > > On 8/20/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>>>> > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
>>>> > > >> On 8/20/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>>>> > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
>>>> > > >>>> On 8/18/23, Alexander Leidinger <Alexander@leidinger.net> wrote:
>>>> > > >>>
>>>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
>>>> > > >>>>> interested to get it?
>>>> > > >>>>>
>>>> > > >>>>
>>>> > > >>>> Your problem is not the vnode limit, but nullfs.
>>>> > > >>>>
>>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>> > > >>>
>>>> > > >>> 122 nullfs mounts on this system. And every jail I set up has
>>>> > > >>> several null mounts. One basesystem mounted into every jail, and
>>>> > > >>> then shared ports (packages/distfiles/ccache) across all of them.
>>>> > > >>>
>>>> > > >>>> First, some of the contention is the notorious VI_LOCK needed in
>>>> > > >>>> order to do anything.
>>>> > > >>>>
>>>> > > >>>> But more importantly, the mind-boggling off-cpu time comes from
>>>> > > >>>> exclusive locking which should not be there to begin with -- as
>>>> > > >>>> in, that xlock in stat should be a slock.
>>>> > > >>>>
>>>> > > >>>> Maybe I'm going to look into it later.
>>>> > > >>>
>>>> > > >>> That would be fantastic.
>>>> > > >>>
>>>> > > >>
>>>> > > >> I did a quick test, things are shared locked as expected.
>>>> > > >>
>>>> > > >> However, I found the following:
>>>> > > >>         if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>>>> > > >>                 mp->mnt_kern_flag |=
>>>> > > >>                     lowerrootvp->v_mount->mnt_kern_flag &
>>>> > > >>                     (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>>>> > > >>                     MNTK_EXTENDED_SHARED);
>>>> > > >>         }
>>>> > > >>
>>>> > > >> are you using the "nocache" option? it has a side effect of
>>>> > > >> xlocking
>>>> > > >
>>>> > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>>>> > > >
>>>> > >
>>>> > > If you don't have "nocache" on null mounts, then I don't see how
>>>> > > this could happen.
>>>> >
>>>> > There is also MNTK_NULL_NOCACHE on the lower fs, which is currently
>>>> > set for fuse and nfs at least.
>>>>
>>>> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS
>>>> exported. 6 of those nullfs mounts are also exported via Samba. The
>>>> NFS exports shouldn't be needed anymore, I will remove them.
>>> By nfs I meant the nfs client, not nfs exports.
>>
>> No NFS client mounts anywhere on this system. So where is this
>> exclusive lock coming from then...
>> This is a ZFS system. 2 pools: one for the root, one for anything I
>> need space for. Both pools reside on the same disks. The root pool is
>> a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on
>> the space-pool. The jails are all basejail-style jails.
>>
>
> While I don't see why xlocking happens, you should be able to dtrace
> or printf your way into finding out.

dtrace looks to me like a faster approach than printf to get to the root
cause... My first naive try is to detect exclusive locks. I'm not 100%
sure I got it right, but at least dtrace doesn't complain about it:
---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/args[0]->a_flags & 0x080000 != 0/
{
        stack();
}
---snip---

In which direction should I look with dtrace if this works in tonight's
run of periodic?
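A possible refinement of the script above, as a sketch: instead of printing every stack as it fires, aggregate the kernel stacks and dump only the most frequent ones at the end, which makes the hot exclusive-lock call paths easier to spot. This assumes 0x080000 is the LK_EXCLUSIVE request bit, as in the original script; the aggregation name @xlocks is arbitrary.

---snip---
#pragma D option dynvarsize=32m

/* Count exclusive-lock requests per kernel stack instead of
 * printing each one as it happens. */
fbt:nullfs:null_lock:entry
/args[0]->a_flags & 0x080000/
{
        @xlocks[stack()] = count();
}

END
{
        /* Keep only the ten most frequent stacks, then print them. */
        trunc(@xlocks, 10);
        printa(@xlocks);
}
---snip---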
I don't have enough knowledge about VFS to come up with some immediate
ideas.

Bye,
Alexander.

--
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF