Re: nfsd becomes slow when machine CPU usage is at or over 100% on STABLE/13
Date: Wed, 09 Mar 2022 14:39:39 UTC
Yoshihiro Ota <ota@j.email.ne.jp> wrote:
> Hi,
>
> I'm on stable/13 with the latest code base.
> I started testing the pre-13.1 branch.
>
> I noticed a major NFS performance degradation when all CPUs are fully
> utilized.
>
> This happens with stable/13 but not with releng/13.0 or releng/12.3.

NFS performance is sensitive to RPC response time. Since this only happens
when the CPUs are busy, I'd suspect:
- Kernel thread scheduling changes, or
- Timing of receive socket upcalls (which wake up the nfsd kernel threads).

I suspect bisecting to the actual commit that causes this is the only way
to find it. If you know of a working stable/13 that is more recent than
13.0, it would help. If not, you could start at this commit (which did make
socket upcall changes):
commit 55cc0a478506ee1c2db7b2f9aadb9855e5490af3
which was done on May 21, 2021.

Maybe others can suggest commits related to thread scheduling (which I
know nothing about).

If you don't have the time/resources to bisect, I doubt this will get
resolved.

Good luck with it, rick

> I had an NFS server running each of the above versions and rsynced an
> NFS mount to a UFS mount on the NFS clients.
>
> My NFS server has 4 cores. With a load average of 3 from
> make buildworld -j3, the NFS server was fine. After adding another 1
> load, NFS server throughput dropped to about 10% of what it was before.
> After going back to a load average of 3, performance recovered, and it
> dropped again once the load went over 4.
>
> The disk was fully available for rsync; buildworld was done on another
> disk.
>
> Someone told me his smbfs was also slow and he suspected a TCP/IP
> regression instead of NFS, by the way.
>
> Hiro
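
For anyone attempting the bisect, each kernel tested needs a consistent good/bad verdict. Here is a minimal sketch of one way to decide it, assuming you record rsync throughput once with the CPUs partly idle and once with them saturated. The function name bisect_verdict and the 0.5 cutoff are my own assumptions, not anything from the report; the only figure from the report is the drop to about 10%, which is well below any reasonable cutoff.

```python
def bisect_verdict(baseline_mb_s: float, loaded_mb_s: float,
                   min_ratio: float = 0.5) -> str:
    """Classify one kernel under test as "good" or "bad".

    baseline_mb_s: throughput measured while the CPUs are not saturated.
    loaded_mb_s:   throughput measured while all CPUs are busy.
    min_ratio:     assumed cutoff (hypothetical); below it the kernel
                   is treated as showing the regression.
    """
    if baseline_mb_s <= 0:
        raise ValueError("baseline throughput must be positive")
    ratio = loaded_mb_s / baseline_mb_s
    return "good" if ratio >= min_ratio else "bad"


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from this thread.
    print(bisect_verdict(100.0, 95.0))
    print(bisect_verdict(100.0, 10.0))
```

The result is what you would then pass by hand to "git bisect good" or "git bisect bad" after booting and testing each candidate kernel.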