Re: system stalled, no I/O but 100% CPU from nfs
- Reply: Peter 'PMc' Much: "Re: system stalled, no I/O but 100% CPU from nfs"
- In reply to: Peter 'PMc' Much: "system stalled, no I/O but 100% CPU from nfs"
Date: Mon, 06 Jan 2025 13:53:38 UTC
On Sun, Jan 5, 2025 at 8:45 PM Peter 'PMc' Much <pmc@citylink.dinoex.sub.org> wrote:
>
> Cheers,
>
> This doesn't look good. It goes on for hours. What can be done about it?
> (13.4 client & server)
>
>
> 44 processes: 4 running, 39 sleeping, 1 waiting
> CPU: 0.4% user, 0.0% nice, 99.6% system, 0.0% interrupt, 0.0% idle
> Mem: 21M Active, 198M Inact, 1190M Wired, 278M Buf, 3356M Free
> ARC: 418M Total, 39M MFU, 327M MRU, 128K Anon, 7462K Header, 43M Other
>      332M Compressed, 804M Uncompressed, 2.42:1 Ratio
> Swap: 15G Total, 15G Free
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
>   417 root          4  52    0    12M  2148K RUN     20:55  99.12% nfscbd

Do you have delegations enabled on your server (vfs.nfsd.issue_delegations not 0)?
(If you do not, I have no idea why the server would be doing callbacks, which is what nfscbd handles.)

Also, "nfsstat -m" on the client shows you/us what your mount options are.

>     0 root         65 -16    -     0B  1040K swapin   0:17   0.64% kernel
> 11054 root          1  52    0    18M  7664K RUN      0:04   0.10% bsdtar
>    11 root         15 -56    -     0B   240K WAIT     0:15   0.05% intr
>    16 root          1 -16    -     0B    16K -        0:01   0.03% racctd
> 11062 root          1  20    0    14M  3804K RUN      0:00   0.03% top
>     7 root          3 -16    -     0B    48K psleep   0:00   0.01% pagedaemon
> 11056 root          1  20    0    21M    10M select   0:00   0.01% sshd
>     6 root          1 -16    -     0B    16K -        0:00   0.01% rand_harvest
>
>
> Interface           Traffic               Peak                Total
> vtnet0  in          5.380 KB/s          9.113 KB/s          781.439 MB
>         out         4.012 KB/s          8.002 KB/s          674.294 MB
>
>
> # nfsstat -zc > /dev/null ; sleep 1 ; nfsstat -c

Adding -E makes it show all RPC counts. (Without -E you just get the "old Sun compatible" output.)
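Rick's first question (are delegations enabled?) can be answered on the server with sysctl(8). As a minimal sketch, assuming a FreeBSD server where the vfs.nfsd.issue_delegations sysctl named above exists, the hypothetical helper describe_delegations below (not part of FreeBSD) turns the sysctl's value into an answer. It takes the value as an argument rather than calling sysctl itself, so it can be tried on any machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): interpret the value of the
# vfs.nfsd.issue_delegations sysctl. Per the reply above, any non-zero
# value means the server issues delegations, and so may do callbacks,
# which the client's nfscbd handles.
describe_delegations() {
    case "$1" in
        "")        echo "unknown" ;;   # empty: sysctl not found, e.g. nfsd not loaded
        *[!0-9]*)  echo "unknown" ;;   # not a plain non-negative number
        0)         echo "disabled" ;;
        *)         echo "enabled" ;;
    esac
}
```

On the server, something like `describe_delegations "$(sysctl -n vfs.nfsd.issue_delegations 2>/dev/null)"` would print the answer. An empty value is reported as unknown rather than guessed.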
> Rpc Counts:
>   Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
>         1         2         5         0         0         0         0         0
>    Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
>         0         0         0         0         0         1         0         1
>     Mknod    Fsstat    Fsinfo  PathConf    Commit
>         0         0         0         0         0
> Rpc Info:
>  TimedOut   Invalid X Replies   Retries  Requests
>         0         0         0         0        11
> Cache Info:
> Attr Hits Attr Misses Lkup Hits Lkup Misses BioR Hits BioR Misses BioW Hits BioW Misses
>        11           1         2           5         0           0         0           0
> BioRL Hits BioRL Misses BioD Hits BioD Misses DirE Hits DirE Misses Accs Hits Accs Misses
>          0            0         1           1         1           0         8           1
>
>

The above suggests that there is still some activity on the client, but the info is limited.

If the client is still in this state, you can collect more info via:
# tcpdump -s 0 -w out.pcap host <nfs-server>
and let it run for a little while. The out.pcap file needs to be looked at in wireshark (tcpdump is useless at decoding NFS). If there is nothing secret in it, you can email it to me as an attachment, so I can take a look.

Running
# ps axHl
repeatedly gets a lot more info about the NFS-related threads. (I'll admit I doubt the info is useful for this case.)

Also run
# nfsstat -E -c -z
repeatedly, as above.

If you just want to get rid of the mount,
# umount -N <mnt-path>
should work, although it can take a couple of minutes.

Either not running "nfscbd" on the client or disabling delegations by setting vfs.nfsd.issue_delegations=0 on the server (assuming you have them enabled) might/should avoid the problem.

rick
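The "run repeatedly" steps in the reply can be scripted. This is a minimal sketch, assuming a POSIX sh; the helper name sample_repeatedly, the log file names and the sample count and interval used below are illustrative choices, not something from the thread.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# appending a timestamped header plus the command's output to OUTFILE.
# Usage: sample_repeatedly COUNT INTERVAL OUTFILE command [args...]
sample_repeatedly() {
    count=$1
    interval=$2
    outfile=$3
    shift 3
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i at $(date -u '+%Y-%m-%dT%H:%M:%SZ') ===" >> "$outfile"
        "$@" >> "$outfile" 2>&1   # capture stderr too, in case the command complains
        i=$((i + 1))
        # No sleep after the final sample.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

On the stalled FreeBSD client one might run `sample_repeatedly 30 10 ps.log ps axHl` and `sample_repeatedly 30 10 nfsstat.log nfsstat -E -c -z` while the tcpdump capture is going. Because -z resets the client counters after displaying them, each nfsstat sample should show only the RPCs issued since the previous one.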