[Bug 266236] ZFS NFS : .zfs/snapshot : Stale file handle

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 05 Sep 2022 15:36:57 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266236

            Bug ID: 266236
           Summary: ZFS NFS : .zfs/snapshot : Stale file handle
           Product: Base System
           Version: 13.1-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: standards
          Assignee: standards@FreeBSD.org
          Reporter: nomad@neuronfarm.net

Hi, since upgrading to FreeBSD 13.1-RELEASE I can no longer access the
.zfs/snapshot folder over NFS.

On an Ubuntu or Debian client, when I try to access .zfs/snapshot I get:
Stale file handle

medic:/home/user1 on /home/user1 type nfs
(rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.80,mountvers=3,mountport=850,mountproto=udp,local_lock=none,addr=192.168.0.80)
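
For reference, this is roughly how the failure shows up on the client, with
the mount already in place as shown above (exact error wording may differ):

# on the Ubuntu/Debian client, with medic:/home/user1 mounted on /home/user1,
# any attempt to enter the snapshot directory fails:
ls /home/user1/.zfs/snapshot
# -> "Stale file handle"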

    I have 2 servers, one on 13.0-p7 and the other on 13.1-p2
    Several disk bays, all multi-attached
    Each bay is connected to both servers

I have used this setup since 12.0-RELEASE, and before 13.1 snapshot access
worked fine.

I use CARP to distribute the load over my two servers; in case of trouble, or
when an upgrade is needed, I can import all my pools on one server and then
upgrade the other.
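
For example, a failover/upgrade cycle looks roughly like this (pool name
"tank" as shown further below; CARP state handling not shown):

# on the server being taken out of service:
zpool export tank
# on the server taking over:
zpool import tank
# the CARP VHIDs carrying the NFS IPs then need to be MASTER on the
# importing server, so clients keep using the same addresses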

So I have several IPs for this data service, in fact one per pool export.

I only get the stale file handle on .zfs/snapshot over NFS on the 13.1 server;
if I import my pool on the 13.0 server it works as normal.

Locally (on FreeBSD) I can list the snapshots normally on both servers.
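
For example (assuming the dataset is mounted at /home/user1 on the server),
both of these work fine locally:

ls /home/user1/.zfs/snapshot
zfs list -t snapshot -r tank/home/user1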

I have to upgrade both my servers to 13.1 because with 13.0 I was facing
another problem which is solved in 13.1.

As I said, the NFS setup is based on a CARP IP:

lagg1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu
1500
options=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
[...]
        inet 192.168.0.80 netmask 0xffffff00 broadcast 192.168.0.255 vhid 80
[...]
        laggproto lacp lagghash l2,l3,l4
        laggport: bnxt0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: bnxt1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>                
        groups: lagg                                                            
        carp: MASTER vhid 80 advbase 1 advskew 100                              
[...]                                                                           
        media: Ethernet autoselect                                              
        status: active                                                          
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>


My NFS config (rc.conf):

rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -h 192.168.0.80 -h 192.168.0.81 -h 192.168.0.82 -h 192.168.0.83 --minthreads 12 --maxthreads 24"
mountd_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
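
For completeness, after changing these entries the services are restarted the
usual way:

service rpcbind restart
service nfsd restart
service mountd restart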


My sharenfs setup on the pool/volume:

# zfs get sharenfs tank/home/user1
NAME             PROPERTY  VALUE                                      SOURCE
tank/home/user1  sharenfs  -network 192.168.0.0 -mask 255.255.255.0  local
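
For reference, the property was set in the standard way, and the resulting
export list can be checked with showmount, for example:

zfs set sharenfs="-network 192.168.0.0 -mask 255.255.255.0" tank/home/user1
showmount -e 192.168.0.80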

It seems the same trouble exists with TrueNAS 13, see this TrueNAS forum
thread: "Stale file handle" when listing snapshots (.zfs)

Another issue which also appears on TrueNAS:
Deleting a snapshot in which a simple "ls" via NFS has been attempted will
block completely and leave the zfs destroy process in an unkillable state
(stuck in I/O).
On TrueNAS it seems that in this case the whole system becomes unstable or
even totally unusable...
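
A rough reproduction of that second issue (snapshot name "mysnap" is just an
example):

# 1. on the client, list the snapshot contents over NFS:
ls /home/user1/.zfs/snapshot/mysnap
# 2. then, on the server, destroy that snapshot:
zfs destroy tank/home/user1@mysnap
# -> the destroy hangs in an unkillable (uninterruptible I/O) state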

Any help would be appreciated.
Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.