Re: NFS, intermittent 'RPC struct is bad' errors

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Mon, 30 Sep 2024 20:07:17 UTC
On Tue, Jun 18, 2024 at 11:32 PM Lexi Winter <lexi@le-fay.org> wrote:
>
> hi,
>
> i have a few systems running NFSv4 on FreeBSD, using Kerberos (MIT
> Kerberos KDC), with the server exporting ZFS filesystems.
>
> recently i've noticed intermittent errors of 'RPC struct is bad' when
> writing to the NFS server, which usually resolves itself after retrying.

It is possible that commit 5037c6398b (dated Aug 27) fixed this, although it
is difficult to say. As you note, since you are using krb5p, a packet trace is
pretty useless.
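To check whether a kernel you build would include that commit, something like the following should work. It assumes /usr/src is a git checkout of the FreeBSD src tree you build the kernel from; that path is an assumption, so adjust it for your setup:

```sh
# Identify the running kernel's build (version string, build date).
uname -v
# "merge-base --is-ancestor" exits 0 if 5037c6398b is an ancestor of HEAD,
# i.e. the checked-out tree already contains the commit.
if git -C /usr/src merge-base --is-ancestor 5037c6398b HEAD; then
    echo "5037c6398b is in /usr/src"
else
    echo "5037c6398b is not in /usr/src (or not known to this clone)"
fi
```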

If you still see this after upgrading to a post-Aug. 27 kernel, and you can
somehow run with krb5i instead of krb5p, then you could capture packets and
maybe post again. (You would need to be capturing when the failure occurs,
which might result in a large capture.)
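A minimal sketch of such a capture on the client, assuming the mount has been switched from sec=krb5p to sec=krb5i and that em0 is the interface carrying the NFS traffic (em0 and the output path are placeholders). The -C/-W options make tcpdump write a fixed-size ring of files, which keeps the capture from growing without bound while you wait for the error:

```sh
# Run as root on the client. NFSv4 uses TCP port 2049.
# -s 0   : capture whole packets, not just the headers
# -C 100 : start a new output file after ~100 million bytes
# -W 20  : keep at most 20 files, overwriting the oldest (ring buffer)
tcpdump -i em0 -s 0 -C 100 -W 20 -w /var/tmp/nfs.pcap port 2049
```

When rsync reports "RPC struct is bad", stop tcpdump (Ctrl-C); the failing RPC should be in the most recently written file or two.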

The only option you are using that others rarely use is "noncontigwr".
(You might try mounting without that option, to see if the problem goes
away.)
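For a quick test, you could remount with the options from your fstab entry, minus noncontigwr (run as root on the client; the option list is copied from your message):

```sh
# Remount /data/public with the same options as the fstab entry,
# except that noncontigwr has been left out.
umount /data/public
mount -t nfs \
    -o rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576 \
    hemlock.eden.le-fay.org:/public /data/public
```

To keep that across reboots, drop noncontigwr from the /etc/fstab line as well.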

Oh, and if you have set vfs.nfsd.enable_delegations=1, try with that
set to 0 on the server.
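On the server, that would be roughly the following (run as root; the /etc/sysctl.conf line only matters if you want the setting to persist across reboots):

```sh
# Show the current value (1 means nfsd issues delegations).
sysctl vfs.nfsd.enable_delegations
# Disable delegations on the running system.
sysctl vfs.nfsd.enable_delegations=0
# Optionally keep the setting across reboots.
echo 'vfs.nfsd.enable_delegations=0' >> /etc/sysctl.conf
```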

That's about all I can think of for you to try, rick

> for example:
>
> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f++++++++++ Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>          32,768   0%    0.00kB/s    0:00:00  rsync: [receiver] write failed on "/data/public/TV/Star Trek Prodigy/Season 01/Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv": RPC struct is bad (72)
> rsync error: error in file IO (code 11) at receiver.c(380) [receiver=3.3.0]
>
> rsync: [sender] write error: Broken pipe (32)
> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f.st....... Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>     912,704,431 100%   96.51MB/s    0:00:09 (xfr#1, to-chk=18/19)
> >f++++++++++ Star.Trek.Prodigy.S01E03.1080p.WEBRip.x265-KONTRAST.mkv
>     477,408,567 100%  100.06MB/s    0:00:04 (xfr#2, to-chk=17/19)
> [...]
>
> the client is running FreeBSD 15.0-CURRENT from around May 24, and the
> server is running a slightly older 15.0-CURRENT from around May 23.
>
> /etc/exports on the server is pretty standard:
>
> /data/public                    -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Books              -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/CalibreLibrary     -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Comics             -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Films              -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> /data/public/Miscellaneous      -sec=krb5:krb5i:krb5p   -network 2001:8b0:aab5::/48
> V4: /data                       -sec=sys:krb5:krb5i:krb5p       -network 2001:8b0:aab5::/48
>
> client mount options:
>
> hemlock.eden.le-fay.org:/public /data/public    nfs     rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576,noncontigwr      0 0
>
> is there anything more i can do to investigate this?  would a tcpdump
> capture of the error be useful (considering all the RPC traffic is
> Kerberos-encrypted)?