From: Rick Macklem
Date: Wed, 19 Jun 2024 07:04:48 -0700
Subject: Re: NFS, intermittent 'RPC struct is bad' errors
To: fs@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

On Tue, Jun 18, 2024 at 11:32 PM Lexi Winter wrote:
>
> hi,
>
> i have a few systems running NFSv4 on FreeBSD, using Kerberos (MIT
> Kerberos KDC), with the server exporting ZFS filesystems.
>
> recently i've noticed intermittent errors of 'RPC struct is bad' when
> writing to the NFS server, which usually resolves itself after retrying.
> for example:
>
> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f++++++++++ Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>          32,768   0%    0.00kB/s    0:00:00  rsync: [receiver] write failed on "/data/public/TV/Star Trek Prodigy/Season 01/Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv": RPC struct is bad (72)
> rsync error: error in file IO (code 11) at receiver.c(380) [receiver=3.3.0]
>
> rsync: [sender] write error: Broken pipe (32)

The "RPC struct is bad" just refers to the RPC message that cannot be
decoded because it is trashed for some reason.

> % rsync -iavP /scratch/Star.Trek.Prodigy.S01E* .
> sending incremental file list
> >f.st....... Star.Trek.Prodigy.S01E01E02.1080p.WEBRip.x265-KONTRAST.mkv
>     912,704,431 100%   96.51MB/s    0:00:09 (xfr#1, to-chk=18/19)
> >f++++++++++ Star.Trek.Prodigy.S01E03.1080p.WEBRip.x265-KONTRAST.mkv
>     477,408,567 100%  100.06MB/s    0:00:04 (xfr#2, to-chk=17/19)
> [...]
>
> the client is running FreeBSD 15.0-CURRENT from around May 24, and the
> server is running a slightly older 15.0-CURRENT from around May 23.

There was an issue fixed in main/current by commits on Apr. 25
(client 8efba70, server 54c3aa0). If you somehow ended up with the
client having the patch and the server not having the patch, this could
possibly explain it?
As for the breakage itself: I was tricked by wireshark into believing
the code was wrong. It actually turned out that wireshark was broken.
On Apr. 25, I put things back to where the RFCs said they should be.
Also, this breakage should only occur if delegations are enabled, which
will only happen if you set "vfs.nfsd.issue_delegations=1" on the server
(not on by default). I doubt this is what you are seeing.

>
> /etc/exports on the server is pretty standard:
>
> /data/public                -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
> /data/public/Books          -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
> /data/public/CalibreLibrary -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
> /data/public/Comics         -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
> /data/public/Films          -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
> /data/public/Miscellaneous  -sec=krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
> V4: /data                   -sec=sys:krb5:krb5i:krb5p -network 2001:8b0:aab5::/48
>
> client mount options:
>
> hemlock.eden.le-fay.org:/public /data/public nfs rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576,noncontigwr 0 0

You might try getting rid of the "noncontigwr" option to see if it
helps, since I do not test it often.

>
> is there anything more i can do investigate this? would a tcpdump
> capture of the error be useful (considering all the RPC traffic is
> Kerberos-encrypted)?

The only thing that a tcpdump (pulled into wireshark after capture)
might show you is TCP layer issues.
Unless getting rid of "noncontigwr" fixes the problem, it sounds like
some sort of corruption occurring in the network fabric. This might be
caught by wireshark as TCP timeouts or ???

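To try the mount without "noncontigwr", the fstab entry quoted above would become something like the following. This is only the original line with that one option dropped; nothing else is changed, and none of the remaining options is being suggested as a cause.

```
hemlock.eden.le-fay.org:/public /data/public nfs rw,nfsv4,minorversion=2,sec=krb5p,gssname=host,bgnow,proto=tcp6,nconnect=4,rsize=1048576,wsize=1048576 0 0
```

The file system needs to be unmounted and mounted again for the changed options to take effect.
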
No one else has reported anything like this recently.

rick