Date: Mon, 15 Jan 2024 17:36:24 +0100
From: FreeBSD User
To: Peter Blok
Cc: Rick Macklem, Cy Schubert, Ronald Klop, FreeBSD CURRENT
Subject: Re: NFSv4 crash of CURRENT
Message-ID: <20240115173651.5b1572c0@thor.intern.walstatt.dynvpn.de>
Organization: walstatt-de.de

On Mon, 15 Jan 2024 16:59:07 +0100
Peter Blok wrote:

> Rick,
>
> I can confirm Kostik’s fix works on 13-stable.
>
> Peter

Me, too. The patch fixed the reported problem. Thank you very much.
oh

>
> > On 15 Jan 2024, at 16:13, Peter Blok wrote:
> >
> > I can give it a shot on one of my clients.
> >
> >> On 15 Jan 2024, at 16:04, Rick Macklem wrote:
> >>
> >> On Mon, Jan 15, 2024 at 2:53 AM Peter Blok wrote:
> >>>
> >>> Hi,
> >>>
> >>> Forgot to mention I’m on 13-stable. The fix that is causing the crash with automounted
> >>> NFS is:
> >>>
> >>> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> >>> Author: Konstantin Belousov
> >>> Date:   Tue Jan 2 00:22:44 2024 +0200
> >>>
> >>>     nfsclient: limit situations when we do unlocked read-ahead by nfsiod
> >>>
> >>>     (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
> >>>
> >>> When I remove the fix, the problem is gone. Add it back and the crash happens.
> >> Kostik has already come up with a probable fix. If you want it right
> >> away, here it is, but he'll probably commit it soon anyhow:
> >> diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
> >> index c027d7d7c3fd..1cf45bb0c924 100644
> >> --- a/sys/fs/nfsclient/nfs_clbio.c
> >> +++ b/sys/fs/nfsclient/nfs_clbio.c
> >> @@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct thread *td, struct ucred *cred)
> >>  	return (error);
> >>  }
> >>  
> >> +static bool
> >> +ncl_bioread_dora(struct vnode *vp)
> >> +{
> >> +	vm_object_t obj;
> >> +
> >> +	obj = vp->v_object;
> >> +	if (obj == NULL)
> >> +		return (true);
> >> +	return (!vm_object_mightbedirty(vp->v_object) &&
> >> +	    vp->v_object->un_pager.vnp.writemappings == 0);
> >> +}
> >> +
> >>  /*
> >>   * Vnode op for read using bio
> >>   */
> >> @@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
> >>  		 * unlocked read by nfsiod could obliterate changes
> >>  		 * done by userspace.
> >>  		 */
> >> -		if (nmp->nm_readahead > 0 &&
> >> -		    !vm_object_mightbedirty(vp->v_object) &&
> >> -		    vp->v_object->un_pager.vnp.writemappings == 0) {
> >> +		if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
> >>  		    for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
> >>  			(off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
> >>  			rabn = lbn + 1 + nra;
> >> @@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
> >>  		 * directory offset cookie of the next block.)
> >>  		 */
> >>  		NFSLOCKNODE(np);
> >> -		if (nmp->nm_readahead > 0 &&
> >> -		    !vm_object_mightbedirty(vp->v_object) &&
> >> -		    vp->v_object->un_pager.vnp.writemappings == 0 &&
> >> +		if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
> >>  		    (bp->b_flags & B_INVAL) == 0 &&
> >>  		    (np->n_direofoffset == 0 ||
> >>  		    (lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&
> >>
> >> rick
> >> ps: It appears that autofs causes the directory to be read before it
> >> is opened, for some reason. I've never looked at autofs.
> >>
> >>>
> >>> Peter
> >>>
> >>> On 15 Jan 2024, at 09:31, Peter Blok wrote:
> >>>
> >>> Hi,
> >>>
> >>> I do have a crash on an NFS client with stable of today
> >>> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe it is the
> >>> same problem.
> >>>
> >>> I have ports automounted on /am/ports. When I cd to /am/ports/sys and press Tab to
> >>> autocomplete, it crashes with the stack trace below. If I plainly mount ports on
> >>> /usr/ports and do the same, everything works.
> >>> I am using NFSv3.
> >>>
> >>> Peter
> >>>
> >>>
> >>> Fatal trap 12: page fault while in kernel mode
> >>> cpuid = 2; apic id = 04
> >>> fault virtual address   = 0x89
> >>> fault code              = supervisor read data, page not present
> >>> instruction pointer     = 0x20:0xffffffff809645d4
> >>> stack pointer           = 0x28:0xfffffe00acadb830
> >>> frame pointer           = 0x28:0xfffffe00acadb830
> >>> code segment            = base 0x0, limit 0xfffff, type 0x1b
> >>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> >>> processor eflags        = interrupt enabled, resume, IOPL = 0
> >>> current process         = 6869 (csh)
> >>> trap number             = 12
> >>> panic: page fault
> >>> cpuid = 2
> >>> time = 1705306940
> >>> KDB: stack backtrace:
> >>> #0 0xffffffff806232f5 at kdb_backtrace+0x65
> >>> #1 0xffffffff805d7a02 at vpanic+0x152
> >>> #2 0xffffffff805d78a3 at panic+0x43
> >>> #3 0xffffffff809d58ad at trap_fatal+0x38d
> >>> #4 0xffffffff809d58ff at trap_pfault+0x4f
> >>> #5 0xffffffff809af048 at calltrap+0x8
> >>> #6 0xffffffff804c7a7e at ncl_bioread+0xb7e
> >>> #7 0xffffffff804b9d90 at nfs_readdir+0x1f0
> >>> #8 0xffffffff8069c61a at vop_sigdefer+0x2a
> >>> #9 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
> >>> #10 0xffffffff81ce75de at autofs_readdir+0x2ce
> >>> #11 0xffffffff809f8ae0 at VOP_READDIR_APV+0x20
> >>> #12 0xffffffff806c3002 at kern_getdirentries+0x222
> >>> #13 0xffffffff806c33a9 at sys_getdirentries+0x29
> >>> #14 0xffffffff809d6180 at amd64_syscall+0x110
> >>> #15 0xffffffff809af95b at fast_syscall_common+0xf8
> >>>
> >>>
> >>> On 15 Jan 2024, at 06:46, FreeBSD User wrote:
> >>>
> >>> On Sun, 14 Jan 2024 20:34:12 -0800
> >>> Cy Schubert wrote:
> >>>
> >>> In message <...>, Rick Macklem writes:
> >>>
> >>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop wrote:
> >>>
> >>> From: FreeBSD User
> >>> Date: 13 January 2024 19:34
> >>> To: FreeBSD CURRENT
> >>> Subject: NFSv4 crash of CURRENT
> >>>
> >>> Hello,
> >>>
> >>> I am running a CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> >>> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server runs the same OS revision as
> >>> the client, the other runs FreeBSD 13.2-RELEASE-p8. Both offer NFSv4 filesystems,
> >>> non-Kerberized.
> >>>
> >>> I can reproducibly crash the client by accessing one or the other NFSv4 FS (a simple ls -la).
> >>>
> >>> The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access to the
> >>> client host; luckily the box recovers.
> >>>
> >>> Did you rebuild both the nfscommon and nfscl modules from the same sources?
> >>> I did a commit to main that changes the interface between these two
> >>> modules and did bump __FreeBSD_version to 1500010, which should cause
> >>> both to be rebuilt. (If you have "options NFSCL" in your kernel config,
> >>> both should have been rebuilt as part of the kernel build.)
> >>>
> >>>
> >>> Is anyone by chance seeing autofs in the backtrace too?
> >>>
> >>>
> >>>
> >>> Hello Cy Schubert,
> >>>
> >>> I forgot to mention that those crashes occur with autofs-mounted filesystems. Good
> >>> question, by the way; I will check whether crashes also happen when mounting the
> >>> traditional way.
> >>>
> >>> Kind regards,
> >>>
> >>> oh
> >>>
> >>> --
> >>> O. Hartmann
> >
>

--
O. Hartmann