Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi client
Rick Macklem
rmacklem at uoguelph.ca
Thu Mar 8 22:54:21 UTC 2018
NAGY Andreas wrote:
>Thank you, really great how fast you adapt the source/make patches for this. Saw so many posts where people did not get NFS41 working with ESXi and FreeBSD, and now I have it already running with your changes.
>
>I have now compiled the kernel with all 4 patches, and it works.
Ok. Sounds like we are making progress. It also takes someone willing to test patches, so
thanks for doing so.
>Some problems are still left:
>
>- the "Server returned improper reason for no delegation: 2" warnings are still in the >vmkernel.log.
> 2018-03-08T11:41:20.290Z cpu0:68011 opID=488969b0)WARNING: NFS41: >NFS41ValidateDelegation:608: Server returned improper reason for no delegation: 2
I'll take another look and see if I can guess why it doesn't like "2" as a reason for not
issuing a delegation. (As noted before, I don't think this is serious, but...?)
>- can't delete a folder with the VMware host client datastore browser:
> 2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4cb158: Transient file system condition, suggest retry
[more of these snipped]
> 2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: UserFile: 2155: hostd-worker: Directory changing too often to perform readdir operation (11 retries), returning busy
This one is a mystery to me. The client seems to be upset that the directory is changing (I
assume either the Change or ModifyTime attribute). However, if entries are being
deleted, the directory is changing and, as far as I know, the Change and ModifyTime
attributes are supposed to change.
I might try posting to nfsv4 at ietf.org in case somebody involved with this client reads
that list and can explain what this is.
>- after a reboot of the FreeBSD machine the ESXi does not restore the NFS datastore again, with the following warning (just disconnecting the links is fine):
> 2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
Hmm. Normally after a server reboot, the clients will try some RPC that starts with a
Sequence (the session op) and the server will reply NFS4ERR_BAD_SESSION.
This triggers recovery in the client.
The BindConnectiontoSession operation is done in an RPC by itself, so there is no
Sequence op to trigger NFS4ERR_BAD_SESSION.
Maybe this client expects to see NFS4ERR_BAD_SESSION for the BindConnectiontoSession.
I'll post a patch that modifies the BindConnectiontoSession to do that.
>Actually I have only made some quick benchmarks with ATTO in a Windows VM which has a vmdk on the NFS41 datastore which is mounted over two 1GB links in different subnets.
>Read is nearly double that of a single connection and write is just a bit faster. Don't know if write speed could be improved; actually the share is UFS on a HW RAID controller which has local write speeds of about 500MB/s.
Yes. I posted earlier that I didn't understand why multiple TCP connections would be faster.
I didn't notice at the time that you mentioned using different subnets and, as such, the
links couldn't be trunked below TCP. In your case trunking above TCP makes sense.
Getting slower write rates than read rates from NFS is normal.
Did you try "sysctl vfs.nfsd.async=1"?
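If not, it is worth a try. A minimal example (the sysctl is the one named above; making it
persistent via /etc/sysctl.conf is just standard FreeBSD practice):

  # enable it on the running system
  sysctl vfs.nfsd.async=1
  # keep the setting across reboots
  echo 'vfs.nfsd.async=1' >> /etc/sysctl.conf

As I understand it, this makes the server reply to writes before the data is on stable
storage, so there is a risk of data loss if the server crashes at the wrong time.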
The other thing that might help for UFS is increasing the size of the buffer cache.
(If this server is mainly an NFS server, you could probably make the buffer cache
greater than half of the machine's RAM.
Note to others: since ZFS doesn't use the buffer cache, the opposite is true for
ZFS.)
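For reference, the buffer cache is sized at boot, so this is done via loader tunables rather
than a plain sysctl. A rough sketch (the value below is only illustrative; the right one
depends on your FreeBSD version and RAM, see tuning(7)):

  # check the current buffer cache limit in bytes
  sysctl vfs.maxbufspace
  # then raise the number of buffers in /boot/loader.conf, e.g.
  kern.nbuf="262144"

If I recall correctly, the total scales as roughly nbuf * BKVASIZE (16K by default), so
262144 buffers would be about 4GB; check vfs.maxbufspace again after a reboot to confirm.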
rick