Re: nfs stalls client: nfsrv_cache_session: no session

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Sun, 28 Aug 2022 19:20:35 UTC
Also, if you have multiple clients, make sure that they
all have unique /etc/hostid's. A duplicate machine with
the same /etc/hostid as another one will screw up NFSv4
really badly.
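
A rough way to check for this, assuming each client's /etc/hostid has
first been copied to one machine under its own file name. The function
name find_dup_hostids and the file names are hypothetical, and file
names are assumed to contain no spaces:

```shell
#!/bin/sh
# Sketch: report hostid values that occur in more than one collected file.
# Usage: find_dup_hostids client1.hostid client2.hostid ...
find_dup_hostids() {
    for f in "$@"; do
        # Strip whitespace and fold case so trivially different copies still match.
        id=$(tr -d '[:space:]' < "$f" | tr '[:upper:]' '[:lower:]')
        printf '%s %s\n' "$id" "$f"
    done | sort | awk '
        # Count each hostid and remember which files carried it.
        { count[$1]++; files[$1] = files[$1] " " $2 }
        END { for (id in count) if (count[id] > 1) print id ":" files[id] }
    '
}
```

No output means every collected hostid is unique; otherwise each line
lists a duplicated hostid followed by the files that contain it.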

rick

________________________________________
From: owner-freebsd-stable@freebsd.org <owner-freebsd-stable@freebsd.org> on behalf of Peter <pmc@citylink.dinoex.sub.org>
Sent: Saturday, July 16, 2022 8:06 AM
To: freebsd-stable@freebsd.org
Subject: nfs stalls client: nfsrv_cache_session: no session


Hija,
  I have a problem with NFSv4:

The configuration:
  Server Rel. 13.1-RC2
    nfs_server_enable="YES"
    nfs_server_flags="-u -t --minthreads 2 --maxthreads 20 -h ..."
    mountd_enable="YES"
    mountd_flags="-S -p 803 -h ..."
    rpc_lockd_enable="YES"
    rpc_lockd_flags="-h ..."
    rpc_statd_enable="YES"
    rpc_statd_flags="-h ..."
    rpcbind_enable="YES"
    rpcbind_flags="-h ..."
    nfsv4_server_enable="YES"
    sysctl vfs.nfs.enable_uidtostring=1
    sysctl vfs.nfsd.enable_stringtouid=1

  Client bhyve Rel. 13.1-RELEASE on the same system
    nfs_client_enable="YES"
    nfs_access_cache="600"
    nfs_bufpackets="32"
    nfscbd_enable="YES"

  Mount-options: nfsv4,readahead=1,rw,async


Access to the share suddenly stalled. Server reports this in messages,
every second:
   nfsrv_cache_session: no session IPaddr=192.168...

Restarting nfsd and mountd didn't help; the only change was that the client
also started to report in messages, every second:
   nfs server 192.168...:/var/sysup/mnt/tmp.6.56160: is alive again

Mounting the same share anew to a different place works fine.

The network babble is this, every second:
   NFS request xid 1678997001 212 getattr fh 0,6/2
   NFS reply xid 1678997001 reply ok 52 getattr ERROR: unk 10052
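
The "unk 10052" above is most likely just the capture tool not knowing
the NFSv4.1 status codes. In RFC 5661, 10052 is NFS4ERR_BADSESSION, which
would fit the server's "no session" message. A small lookup sketch,
covering only a few session-related codes (the function name
nfs4_status_name is hypothetical):

```shell
#!/bin/sh
# Sketch: map a few NFSv4.1 session-related status codes (RFC 5661) to names.
nfs4_status_name() {
    case "$1" in
        10052) echo "NFS4ERR_BADSESSION" ;;
        10053) echo "NFS4ERR_BADSLOT" ;;
        10063) echo "NFS4ERR_SEQ_MISORDERED" ;;
        10078) echo "NFS4ERR_DEADSESSION" ;;
        *)     echo "unknown ($1)" ;;   # not in this deliberately short table
    esac
}

nfs4_status_name 10052
```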

Forensics: I tried to build openoffice on that share a couple of
   times, so there was a fair bit of traffic, and something may have
   overflowed.

There seems to be no way to recover other than crashing the client.