Re: NFS, intermittent 'RPC struct is bad' errors

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Wed, 19 Jun 2024 14:21:25 UTC
On Tue, Jun 18, 2024 at 11:32 PM Lexi Winter <lexi@le-fay.org> wrote:
>
> hi,
>
> i have a few systems running NFSv4 on FreeBSD, using Kerberos (MIT
> Kerberos KDC), with the server exporting ZFS filesystems.
>
> recently i've noticed intermittent errors of 'RPC struct is bad' when
> writing to the NFS server, which usually resolves itself after retrying.
> for example:
>
> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f++++++++++ Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>          32,768   0%    0.00kB/s    0:00:00  rsync: [receiver] write failed on "/data/public/TV/Star Trek Prodigy/Season 01/Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv": RPC struct is bad (72)
> rsync error: error in file IO (code 11) at receiver.c(380) [receiver=3.3.0]
>
> rsync: [sender] write error: Broken pipe (32)
> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f.st....... Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>     912,704,431 100%   96.51MB/s    0:00:09 (xfr#1, to-chk=18/19)
> >f++++++++++ Star.Trek.Prodigy.S01E03.1080p.WEBRip.x265-KONTRAST.mkv
>     477,408,567 100%  100.06MB/s    0:00:04 (xfr#2, to-chk=17/19)
> [...]
>
> the client is running FreeBSD 15.0-CURRENT from around May 24, and the
> server is running a slightly older 15.0-CURRENT from around May 23.
>
> /etc/exports on the server is pretty standard:
>
> /data/public                    -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Books              -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/CalibreLibrary     -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Comics             -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Films              -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Miscellaneous      -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> V4: /data                       -sec=sys:krb5:krb5i:krb5p       -network 2001:8b0:aab5::/48
>
> client mount options:
>
> hemlock.eden.le-fay.org:/public /data/public    nfs     rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576,noncontigwr      0 0
>
> is there anything more i can do to investigate this?  would a tcpdump
> capture of the error be useful (considering all the RPC traffic is
> Kerberos-encrypted)?
If you can safely reproduce these failures without on-the-wire encryption,
you could switch the mount to "krb5i" (integrity only, so the RPC payload
is not encrypted). Then capture the traffic with something like:
# tcpdump -s 0 -w out.pcap host <other-system>
and load out.pcap into wireshark. With luck you will be able to see where
the failure is occurring. (Unlike tcpdump, wireshark decodes NFS traffic
quite nicely.)
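
The steps above could be sketched roughly as follows. This is only a
dry-run sketch: it prints the commands rather than running them, the
option string is copied from the fstab line quoted above, and the variable
names (opts, test_opts, server, mnt, plan) are made up for illustration.
It assumes the data is safe to send with integrity protection only.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.

# Mount options copied from the client fstab line quoted above.
opts='rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576,noncontigwr'

# Swap privacy (krb5p) for integrity-only (krb5i) so the RPC payload
# is not encrypted and wireshark can decode it.
test_opts=$(printf '%s\n' "$opts" | sed 's/sec=krb5p/sec=krb5i/')

server='hemlock.eden.le-fay.org'   # NFS server from the fstab line
mnt='/data/public'                 # client mount point from the fstab line

# Commands to run by hand, as root, on the client.
plan="umount $mnt
mount -t nfs -o $test_opts $server:/public $mnt
tcpdump -s 0 -w out.pcap host $server"

printf '%s\n' "$plan"
```

Once tcpdump is running, repeat the rsync until the error shows up, stop
the capture, and open out.pcap in wireshark. Remounting with the original
sec=krb5p options afterwards restores encryption.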

rick