Major issues with nfsv4

Fri Dec 11 01:00:03 UTC 2020

J. David wrote:
>Recently, we attempted to get with the 2000's and try switching from
>NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.
>
>Any system we switch to NFSv4 mounts is functionally unusable, pegged
>at 100% system CPU usage, load average 70+, largely from nfscl threads
>and client processes using NFS.
>
>Dmesg shows NFS-related messages:
>
>$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
>   1 nfsv4 err=10010
>   4 nfsv4 client/server protocol prob err=10026
>  29 nfscl: never fnd open
Add "minorversion=1" to your FreeBSD NFS client mount options
and error 10026 should go away (and I suspect that the 10010 will
go away too.

The correct semantics for handling the "seqid" field that
serialized open/lock operations for NFSv4.0 is difficult to get
correct (and might now be broken in the client, since the
original code written 20years ago depended on exclusive
vnode locking and hasn't been updated or interop tested with
non-FreeBSD NFS servers for ages).
--> NFSv4.0 is close to 20years old and has been fixed/superceded
      by NFSv4.1 for many years now.
--> NFSv4.1 (and NFSv4.2) replaced the "seqid" stuff with something
      called "sessions", which works better.

I have been tempted to make FreeBSD NFSv4 mounts use 4.1/4.2
by default to avoid problems with NFSv4.0, but I've hesitated since
the change could be considered a POLA violation.

NFSv4.0 is like any .0 release. There were significant issues with the
protocol fixed by NFSv4.1.

If you still have problems when using NFSv4.1, post again.
Btw, "nfsstat -m" shows what the client mount options actually are.

rick

Nfsstat shows no client activity; "nfsstat -e -c 1" and "nfsstat -c 1"
both report:

 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0

Meanwhile, tcpdump on the client shows an endless stream of getattr
requests at the exact same time nfsstat -c says nothing is happening:

$ sudo tcpdump -n -i net1 -c 10 port 2049 and src 172.20.200.39
14:47:27.037974 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [.],
ack 72561, win 545, options [nop,nop,TS val 234259249 ecr 4155804100],
length 0
14:47:27.046282 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 139940:140092, ack 72561, win 545, options [nop,nop,TS val
234259259 ecr 4155804100], length 152: NFS request xid 1544756021 148
getattr fh 0,5/0
14:47:27.051260 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140092:140248, ack 72641, win 545, options [nop,nop,TS val
234259269 ecr 4155804104], length 156: NFS request xid 1544756022 152
getattr fh 0,5/0
14:47:27.063372 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140248:140404, ack 72721, win 545, options [nop,nop,TS val
234259279 ecr 4155804106], length 156: NFS request xid 1544756023 152
getattr fh 0,5/0
14:47:27.068646 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140404:140556, ack 72801, win 545, options [nop,nop,TS val
234259279 ecr 4155804108], length 152: NFS request xid 1544756024 148
getattr fh 0,5/0
14:47:27.080627 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140556:140712, ack 72881, win 545, options [nop,nop,TS val
234259299 ecr 4155804110], length 156: NFS request xid 1544756025 152
getattr fh 0,5/0
14:47:27.085224 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140712:140868, ack 72961, win 545, options [nop,nop,TS val
234259299 ecr 4155804112], length 156: NFS request xid 1544756026 152
getattr fh 0,5/0
14:47:27.096802 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140868:141024, ack 73041, win 545, options [nop,nop,TS val
234259309 ecr 4155804114], length 156: NFS request xid 1544756027 152
getattr fh 0,5/0
14:47:27.101849 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 141024:141180, ack 73121, win 545, options [nop,nop,TS val
234259319 ecr 4155804116], length 156: NFS request xid 1544756028 152
getattr fh 0,5/0
14:47:27.112905 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 141180:141336, ack 73201, win 545, options [nop,nop,TS val
234259329 ecr 4155804118], length 156: NFS request xid 1544756029 152
getattr fh 0,5/0

Only 10 shown here for brevity, but:

$ sudo tcpdump -n -i net1 -c 10000 port 2049 and src 172.20.200.39 |
fgrep getattr | wc -l
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes
10000 packets captured
20060 packets received by filter
0 packets dropped by kernel
    9759

There are no dropped packets or network problems:

$ netstat -in -I net1
Name    Mtu Network       Address              Ipkts Ierrs Idrop
Opkts Oerrs  Coll
net1   1500 <Link#2>      12:33:df:5f:79:d7 40988832     0     0
48760307     0     0
net1      - 172.20.0.0/16 172.20.200.39     40942065     -     -
48756241     -     -

The mount flags in fstab are:

ro,nfsv4,nosuid

The mount flags as reported by "nfsstat -m" are:

nfsv4,minorversion=0,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647

Today, I managed to kill everything down to one user process that was
exhibiting this behavior.  After a kill -9 on that process, it went to
"REsJ" but continued to burn the same amount of CPU (all system).
Oddly the run state / wait channel was just "CPU1."  Running "ktrace"
did not produce any trace records.  Probably that is predictable for a
process in E state; if the process had crossed the user/kernel
boundary in a way ktrace could detect, it would have exited.

At that point, I started unmounting filesystems.  Everything but the
NFS filesystem used by that process unmounted cleanly.  The umount for
that filesystem went to D state for about a minute and then kicked
back "Device busy."  That's fair, if awfully slow.

Meanwhile, that user process continued burning system CPU with the E
flag set, not doing anything whatsoever in userspace, still producing
300+ "getattr fh 0,5/0" per second according to tcpdump and 0
according to nfsstat.

Eventually, I rebooted with fstab set back to nfsv3.

This feels like the user process is in a system call that is stuck in
an endless loop repeating some operation that generates that getattr
request.  But that is a feeling, not a fact.

This is fairly easy to reproduce; it seems pretty consistent within a
few hours (a day at most) any time I switch the relevant mounts to
nfsv4.  Reverting to nfsv3 makes this issue completely disappear.

What on earth could be going on here?  What other information can I
provide that would help track this down?

Thanks for any advice!
_______________________________________________
freebsd-fs at freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"