From nobody Mon Jan 15 15:13:49 2024
From: Peter Blok <pblok@bsd4all.org>
To: Rick Macklem <rick.macklem@gmail.com>
Cc: FreeBSD User <freebsd@walstatt-de.de>, Cy Schubert <Cy.Schubert@cschubert.com>, Ronald Klop <ronald-lists@klop.ws>, FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: NFSv4 crash of CURRENT
Date: Mon, 15 Jan 2024 16:13:49 +0100
References: <20240113193324.3fd54295@thor.intern.walstatt.dynvpn.de>
 <1369645989.13766.1705178331205@localhost>
 <20240115043412.B6998C8@slippy.cwsent.com>
 <20240115064704.611fe0c4@thor.intern.walstatt.dynvpn.de>
 <683EF50F-6665-4664-A7CE-1EFE50076FB0@bsd4all.org>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Sender: owner-freebsd-current@freebsd.org
X-Mailer: Apple Mail (2.3696.120.41.1.4)
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.4\))
Content-Type: multipart/alternative; boundary="Apple-Mail=_1A0F591D-D063-48DB-B399-46B93F76615D"

--Apple-Mail=_1A0F591D-D063-48DB-B399-46B93F76615D
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8

I can give it a shot on one of my clients.

> On 15 Jan 2024, at 16:04, Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Mon, Jan 15, 2024 at 2:53 AM Peter Blok <pblok@bsd4all.org> wrote:
>>
>> Hi,
>>
>> Forgot to mention I’m on 13-stable. The fix that is causing the crash with automounted NFS is:
>>
>> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
>> Author: Konstantin Belousov <kib@FreeBSD.org>
>> Date:   Tue Jan 2 00:22:44 2024 +0200
>>
>>     nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>>
>>     (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
>>
>> When I remove the fix, the problem is gone. Add it back and the crash happens.
> Kostik has already come up with a probable fix. If you want it right
> away, here it is,
> but he'll probably commit it soon anyhow:
> diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
> index c027d7d7c3fd..1cf45bb0c924 100644
> --- a/sys/fs/nfsclient/nfs_clbio.c
> +++ b/sys/fs/nfsclient/nfs_clbio.c
> @@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct thread *td, struct ucred *cred)
>         return (error);
>  }
>
> +static bool
> +ncl_bioread_dora(struct vnode *vp)
> +{
> +       vm_object_t obj;
> +
> +       obj = vp->v_object;
> +       if (obj == NULL)
> +               return (true);
> +       return (!vm_object_mightbedirty(vp->v_object) &&
> +           vp->v_object->un_pager.vnp.writemappings == 0);
> +}
> +
>  /*
>   * Vnode op for read using bio
>   */
> @@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
>                  * unlocked read by nfsiod could obliterate changes
>                  * done by userspace.
>                  */
> -               if (nmp->nm_readahead > 0 &&
> -                   !vm_object_mightbedirty(vp->v_object) &&
> -                   vp->v_object->un_pager.vnp.writemappings == 0) {
> +               if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
>                         for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
>                             (off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
>                                 rabn = lbn + 1 + nra;
> @@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
>                  *  directory offset cookie of the next block.)
>                  */
>                 NFSLOCKNODE(np);
> -               if (nmp->nm_readahead > 0 &&
> -                   !vm_object_mightbedirty(vp->v_object) &&
> -                   vp->v_object->un_pager.vnp.writemappings == 0 &&
> +               if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
>                     (bp->b_flags & B_INVAL) == 0 &&
>                     (np->n_direofoffset == 0 ||
>                     (lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&
>
> rick
> ps: It appears that autofs causes the directory to be read before it
> is open'd for some reason. I've never looked at autofs.
>
>>
>> Peter
>>
>> On 15 Jan 2024, at 09:31, Peter Blok <pblok@bsd4all.org> wrote:
>>
>> Hi,
>>
>> I do have a crash on an NFS client with stable of today (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe it is the same problem.
>>
>> I have ports automounted on /am/ports. When I do cd /am/ports/sys and type tab to autocomplete, it crashes with the stack trace below. If I plainly mount ports on /usr/ports and do the same, everything works. I am using NFSv3.
>>
>> Peter
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 04
>> fault virtual address   = 0x89
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff809645d4
>> stack pointer           = 0x28:0xfffffe00acadb830
>> frame pointer           = 0x28:0xfffffe00acadb830
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 6869 (csh)
>> trap number             = 12
>> panic: page fault
>> cpuid = 2
>> time = 1705306940
>> KDB: stack backtrace:
>> #0 0xffffffff806232f5 at kdb_backtrace+0x65
>> #1 0xffffffff805d7a02 at vpanic+0x152
>> #2 0xffffffff805d78a3 at panic+0x43
>> #3 0xffffffff809d58ad at trap_fatal+0x38d
>> #4 0xffffffff809d58ff at trap_pfault+0x4f
>> #5 0xffffffff809af048 at calltrap+0x8
>> #6 0xffffffff804c7a7e at ncl_bioread+0xb7e
>> #7 0xffffffff804b9d90 at nfs_readdir+0x1f0
>> #8 0xffffffff8069c61a at vop_sigdefer+0x2a
>> #9 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
>> #10 0xffffffff81ce75de at autofs_readdir+0x2ce
>> #11 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
>> #12 0xffffffff806c3002 at kern_getdirentries+0x222
>> #13 0xffffffff806c33a9 at sys_getdirentries+0x29
>> #14 0xffffffff809d6180 at amd64_syscall+0x110
>> #15 0xffffffff809af95b at fast_syscall_common+0xf8
>>
>> On 15 Jan 2024, at 06:46, FreeBSD User <freebsd@walstatt-de.de> wrote:
>>
>> On Sun, 14 Jan 2024 20:34:12 -0800, Cy Schubert <Cy.Schubert@cschubert.com> wrote:
>>
>> In message <CAM5tNy5aat8vUn2fsX9jV=D9yGZdnO20Q0Ea7qtszx+zSES2bw@mail.gmail.com>, Rick Macklem writes:
>>
>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop <ronald-lists@klop.ws> wrote:
>>
>> From: FreeBSD User <freebsd@walstatt-de.de>
>> Date: 13 January 2024 19:34
>> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
>> Subject: NFSv4 crash of CURRENT
>>
>> Hello,
>>
>> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a: Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer NFSv4 filesystems, non-kerberized.
>>
>> I can crash the client reproducibly by accessing the one or other NFSv4 FS (a simple ls -la).
>>
>> The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access to the client host, luckily the box recovers.
>>
>> Did you rebuild both the nfscommon and nfscl modules from the same sources?
>> I did a commit to main that changes the interface between these two
>> modules and did bump the
>> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
>> (If you have "options NFSCL" in your kernel config, both should have
>> been rebuilt as a part of
>> the kernel build.)
>>
>> Is anyone by chance seeing autofs in the backtrace too?
>>
>> Hello Cy Schubert,
>>
>> I forgot to mention that those crashes occur with autofs mounted filesystems. Good question,
>> by the way, I will check whether crashes also happen when mounting the traditional way.
>>
>> Kind regards,
>>
>> oh
>>
>> --
>> O. Hartmann

--Apple-Mail=_1A0F591D-D063-48DB-B399-46B93F76615D--
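
Note on the quoted patch: the fault virtual address of 0x89 in the trace looks consistent with reading a field at a small offset from a NULL pointer, which is what the original read-ahead condition would do if vp->v_object were NULL (Rick's ps suggests autofs can read the directory before it is opened). The sketch below is a user-space model of the guard the patch adds, not kernel code. The mock_* types, fields, and function names are hypothetical stand-ins invented for illustration; only the control flow mirrors ncl_bioread_dora() from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's VM object (assumption). */
struct mock_object {
	bool	might_be_dirty;	/* models vm_object_mightbedirty() */
	int	writemappings;	/* models un_pager.vnp.writemappings */
};

/* Hypothetical stand-in for struct vnode (assumption). */
struct mock_vnode {
	struct mock_object *v_object;	/* may be NULL, e.g. before open */
};

/*
 * Same shape as ncl_bioread_dora() in the quoted diff: treat a missing
 * object as "read-ahead allowed" instead of dereferencing NULL.
 */
bool
mock_bioread_dora(const struct mock_vnode *vp)
{
	const struct mock_object *obj;

	obj = vp->v_object;
	if (obj == NULL)
		return (true);
	return (!obj->might_be_dirty && obj->writemappings == 0);
}

/* Small usage example covering the three interesting cases. */
void
mock_selfcheck(void)
{
	struct mock_object clean = { false, 0 };
	struct mock_object mapped = { false, 1 };
	struct mock_vnode no_obj = { NULL };
	struct mock_vnode v_clean = { &clean };
	struct mock_vnode v_mapped = { &mapped };

	assert(mock_bioread_dora(&no_obj));	/* no object: no crash */
	assert(mock_bioread_dora(&v_clean));	/* clean, unmapped */
	assert(!mock_bioread_dora(&v_mapped));	/* writable mapping */
}
```

Pulling the test into one helper also means both call sites in ncl_bioread() share the same NULL check, which is how the diff uses it.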