Re: NFS, intermittent 'RPC struct is bad' errors

From: J David <j.david.lists_at_gmail.com>
Date: Tue, 20 Aug 2024 15:28:45 UTC
On Wed, Jun 19, 2024 at 10:05 AM Rick Macklem <rick.macklem@gmail.com> wrote:
> On Tue, Jun 18, 2024 at 11:32 PM Lexi Winter <lexi@le-fay.org> wrote:
> > i have a few systems running NFSv4 on FreeBSD, using Kerberos (MIT
> > Kerberos KDC), with the server exporting ZFS filesystems.
> >
> > recently i've noticed intermittent errors of 'RPC struct is bad' when
> > writing to the NFS server, which usually resolves itself after retrying.
> > for example:
> [...]
> No one else has reported anything like this recently,

We are also seeing intermittent "RPC struct is bad" from FreeBSD
NFSv4.2 clients accessing ZFS filesystems.

There are a few differences between our situation and that reported by
Lexi Winter:
- NFS servers are Debian 12.
- We do not use Kerberos, noncontigwr, or delegations.
- It does not resolve itself after retrying.

In our case, it seems to "infect" a directory, as viewed from one
particular FreeBSD client machine, after that client attempts to modify
(usually delete) the directory. Once a directory is affected, it can no
longer be listed or removed from that client.

Other clients (FreeBSD or otherwise) are not affected. It doesn't seem
to be a caching effect, either: it doesn't go away if you leave that
directory alone for a while (a few minutes, a few hours, or a full
day). Once the directory is removed from another (identical) client,
everything is fine.

It seems to happen randomly, on the order of once every few billion
NFS calls, so whatever it is, it's a very rare edge case. ZFS+NFS
seems to have a few of those, and I'm sure cross-platform mounts
aren't helping.

Thanks!