NFS Mount Hangs
Rick Macklem
rmacklem at uoguelph.ca
Wed Mar 17 21:37:24 UTC 2021
Jason Breitman wrote:
>Please review the details below and let me know if there is a setting that I should apply to my FreeBSD NFS Server or if there is a bug fix that I can apply to resolve my issue.
>I shared this information with the linux-nfs mailing list and they believe the issue is on the server side.
I actually lurk there and saw your post. I'll admit I smiled when Trond argued
that a hung Linux system is the result of a server failing to send a fin/ack for
a closing TCP connection. But here are a few comments.
>Issue
>NFSv4 mounts periodically hang on the NFS Client.
>
>During this time, it is possible to manually mount from another NFS Server on the NFS Client having issues.
>Also, other NFS Clients are successfully mounting from the NFS Server in question.
>Rebooting the NFS Client appears to be the only solution.
>
>Environment
>NFS Server
>OS: FreeBSD 12.1-RELEASE-p5
>
>NFS Client
>OS: Debian Buster 10.8
>Kernel: 4.19.171-2
>Protocol: NFSv4 with Kerberos Security
>Mount Options: nfs-server.domain.com:/data /mnt/data nfs4 lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576 0 0
The maximum I/O size supported by FreeBSD is 128K.
The client should acquire the attributes that indicate that and set rsize/wsize
to that. "# nfsstat -m" on the client should show you what the client
is actually using. If it is larger than 128K, set both rsize and wsize to 128K.
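A rough way to automate that check is sketched below. It assumes the Linux "nfsstat -m" layout of a "<mountpoint> from <server>:<path>" line followed by a "Flags:" line; the SAMPLE text is made up for illustration and is not output from the system in question, and the function names are hypothetical.

```python
import re

FREEBSD_MAX_IO = 128 * 1024  # the 128K limit mentioned above


def mount_io_sizes(nfsstat_output):
    """Return {mount_point: (rsize, wsize)} parsed from `nfsstat -m` text."""
    sizes = {}
    mount_point = None
    for line in nfsstat_output.splitlines():
        # Header line: "<mountpoint> from <server>:<path>"
        header = re.match(r"^(\S+) from \S+", line)
        if header:
            mount_point = header.group(1)
            continue
        # Flags line carries the values the client actually negotiated.
        rsize = re.search(r"\brsize=(\d+)", line)
        wsize = re.search(r"\bwsize=(\d+)", line)
        if mount_point and rsize and wsize:
            sizes[mount_point] = (int(rsize.group(1)), int(wsize.group(1)))
    return sizes


def oversized_mounts(nfsstat_output, limit=FREEBSD_MAX_IO):
    """Return mount points whose rsize or wsize exceeds the limit."""
    return sorted(
        mp for mp, (r, w) in mount_io_sizes(nfsstat_output).items()
        if r > limit or w > limit
    )


# Hypothetical sample, NOT captured from the reported client.
SAMPLE = """\
/mnt/data from nfs-server.domain.com:/data
 Flags: rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,hard,sec=krb5
"""

print(oversized_mounts(SAMPLE))
```

Any mount point it reports would be a candidate for explicitly setting rsize=131072,wsize=131072 in the mount options.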
>Output from the NFS Client when the issue occurs
># netstat -an | grep NFS.Server.IP.X
>tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
I'm no TCP guy. Hopefully others might know why the client would be
stuck in FIN_WAIT2. (I vaguely recall this means the client has sent its FIN,
had it ACKed, and is now waiting for the server's FIN, but I could be wrong.)
># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
>netid: tcp
>addr: NFS.Server.IP.X
>port: 2049
>state: 0x51
>
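The "state: 0x51" value looks like a bit mask. A small sketch for decoding it follows; the bit names and positions are an assumption on my part, recalled from include/linux/sunrpc/xprt.h in 4.19-era kernels, so check them against the source of the kernel actually running before relying on the result.

```python
# ASSUMED bit positions (recalled from include/linux/sunrpc/xprt.h,
# 4.19-era); verify against the running kernel's source.
XPRT_STATE_BITS = {
    0: "XPRT_LOCKED",
    1: "XPRT_CONNECTED",
    2: "XPRT_CONNECTING",
    3: "XPRT_CLOSE_WAIT",
    4: "XPRT_BOUND",
    5: "XPRT_BINDING",
    6: "XPRT_CLOSING",
    9: "XPRT_CONGESTED",
}


def decode_xprt_state(state):
    """Return the names of the bits set in an xprt state word."""
    names = []
    for bit in range(state.bit_length()):
        if state & (1 << bit):
            # Bits missing from the assumed table are reported by number.
            names.append(XPRT_STATE_BITS.get(bit, "bit%d" % bit))
    return names


print(decode_xprt_state(0x51))
```

Under that assumed table, 0x51 has bits 0, 4 and 6 set and the "connected" bit clear, which would at least be consistent with the "is not connected" messages in the log.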
>syslog
>Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
>Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 143cfadf 30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
I don't know what OPEN_NOATTR means, but I assume it is some variant
of NFSv4 Open operation.
[stuff snipped]
>Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
I have no idea what status -110 means?
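For what it's worth, the negative status values in these RPC debug lines look like negated Linux errno values. If that assumption holds, -110 would be ETIMEDOUT, which fits the "connect attempt timed out" line just before it. A minimal lookup sketch (the function name is hypothetical, and the table is hard-coded because other systems number these errors differently):

```python
# Linux errno numbers (asm-generic). Hard-coded on purpose: the local
# errno module would give different numbers on a non-Linux host.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    110: ("ETIMEDOUT", "Connection timed out"),
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def describe_rpc_status(status):
    """Translate an RPC debug status, ASSUMING it is a negated Linux errno."""
    if status == 0:
        return "OK"
    name, text = LINUX_ERRNO.get(-status, ("unknown", "not in this table"))
    return "%s (%s)" % (name, text)


print(describe_rpc_status(-110))
```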
>Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
>Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
>Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
>Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
>Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
>Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
Is it possible that the client is trying to (re)connect using the same client port#?
I would normally expect the client to create a new TCP connection using a
different client port# and then retry the outstanding RPCs.
--> Capturing packets when this happens would show us what is going on.
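Short of a full packet capture, a cheap first check on the port-reuse question is to compare two "netstat -an" snapshots taken on the client during a hang. The sketch below assumes the Linux column layout shown in the quoted output (proto, recv-q, send-q, local, remote, state); the second snapshot is invented for illustration, and the function names are hypothetical.

```python
def nfs_connections(netstat_output, server_port=2049):
    """Return [(client_port, state)] for TCP lines whose remote port matches."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto recv-q send-q local remote state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if remote.rsplit(":", 1)[-1] != str(server_port):
            continue
        found.append((int(local.rsplit(":", 1)[-1]), state))
    return found


def reused_ports(before, after):
    """Client ports that appear in both snapshots."""
    ports_before = {port for port, _ in nfs_connections(before)}
    ports_after = {port for port, _ in nfs_connections(after)}
    return sorted(ports_before & ports_after)


# First snapshot is the line from the report; the second is HYPOTHETICAL.
BEFORE = "tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2"
AFTER = "tcp 0 1 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 SYN_SENT"

print(reused_ports(BEFORE, AFTER))
```

A non-empty result would only suggest that the same client port is being reused across connection attempts; the packet capture is still what would show what the two ends are actually sending.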
If there is a problem on the FreeBSD end, it is most likely a broken
network device driver.
--> Try disabling TSO and LRO.
--> Try a different driver for the net hardware on the server.
--> Try a different net chip on the server.
If you can capture packets when (not after) the hang
occurs, then you can look at them in wireshark and see
what is actually happening. (Ideally on both client and
server, to check that your network hasn't dropped anything.)
--> I know, if the hangs aren't easily reproducible, this isn't
easily done.
--> Try a newer Linux kernel and see if the problem persists.
The Linux folk will get more interested if you can reproduce
the problem on 5.12. (Recent bakeathon testing of the 5.12
kernel against the FreeBSD server did not find any issues.)
Hopefully the network folk have some insight w.r.t. why
the TCP connection is sitting in FIN_WAIT2.
rick
Jason Breitman
_______________________________________________
freebsd-net at freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"