NFS, intermittent 'RPC struct is bad' errors
Date: Wed, 19 Jun 2024 06:32:04 UTC
hi,

i have a few systems running NFSv4 on FreeBSD, using Kerberos (MIT Kerberos KDC), with the server exporting ZFS filesystems. recently i've noticed intermittent 'RPC struct is bad' errors when writing to the NFS server; a retry usually succeeds. for example:

% rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
sending incremental file list
>f++++++++++ Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
         32,768   0%    0.00kB/s    0:00:00
rsync: [receiver] write failed on "/data/public/TV/Star Trek Prodigy/Season 01/Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv": RPC struct is bad (72)
rsync error: error in file IO (code 11) at receiver.c(380) [receiver=3.3.0]
rsync: [sender] write error: Broken pipe (32)
% rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
sending incremental file list
>f.st....... Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
    912,704,431 100%   96.51MB/s    0:00:09 (xfr#1, to-chk=18/19)
>f++++++++++ Star.Trek.Prodigy.S01E03.1080p.WEBRip.x265-KONTRAST.mkv
    477,408,567 100%  100.06MB/s    0:00:04 (xfr#2, to-chk=17/19)
[...]

the client is running FreeBSD 15.0-CURRENT from around May 24, and the server is running a slightly older 15.0-CURRENT from around May 23.

/etc/exports on the server is pretty standard:

/data/public -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
/data/public/Books -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
/data/public/CalibreLibrary -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
/data/public/Comics -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
/data/public/Films -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
/data/public/Miscellaneous -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
V4: /data -sec=sys:krb5:krb5i:krb5p -network 2001:8b0:aab5::/48

client mount options (from /etc/fstab):

hemlock.eden.le-fay.org:/public /data/public nfs rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576,noncontigwr 0 0

is there anything more i can do to investigate this? would a tcpdump capture of the error be useful (considering all the RPC traffic is Kerberos-encrypted)?
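
if a capture would help, something along these lines is what i had in mind. this is only a sketch: the interface name (ix0) and the output filename are placeholders rather than my actual setup, and the filter assumes NFS on the default TCP port 2049 over IPv6. as far as i understand, with sec=krb5p only the call arguments and results are encrypted, so the RPC headers and RPCSEC_GSS sequence numbers should still be readable in the capture.

```shell
#!/bin/sh
# sketch only: IFACE and OUT are placeholders (assumptions), adjust as needed
IFACE="ix0"                         # hypothetical client interface name
SERVER="hemlock.eden.le-fay.org"    # NFS server from the fstab line above
OUT="nfs-ebadrpc.pcap"              # hypothetical output file

# match IPv6 traffic to/from the server on the NFS port
FILTER="ip6 and host ${SERVER} and tcp port 2049"

# -s 0 keeps full packets, -w writes a raw pcap instead of decoding
CMD="tcpdump -i ${IFACE} -s 0 -w ${OUT} ${FILTER}"

# print the command as a dry run; run it as root on the client
# while reproducing the failing rsync
echo "${CMD}"
```

the script only prints the command, so it can be checked before running it for real as root.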