From: Ulrich Spörlein
Date: Fri, 19 Jan 2024 16:42:53 +0100
Subject: Re: Repeatable nfs_readdir kernel panic after upgrade to stable/14
To: Konstantin Belousov
Cc: stable@freebsd.org, Rick Macklem
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

Indeed, seems to work now. Thanks for the speedy fix.

On Wed, 17 Jan 2024, 11:07 Konstantin Belousov, <kib@freebsd.org> wrote:

> On Wed, Jan 17, 2024 at 10:28:01AM +0100, Ulrich Spörlein wrote:
> > Hey there,
> > upgraded my NFS server and laptop (NFS client) to stable/14 over the
> > weekend, and now anything "intensive" that reads from NFS seems to kernel
> > panic.
> >
> > I think this started when I upgraded the server first. I shrugged it off as
> > some overload on the laptop, finished the laptop upgrade to 14, and now
> > every time I open easytag on the NFS automounted directory, or browse
> > photos with geeqie, it locks up hard.
> >
> > Mounts on the client currently look like so:
> >
> > map /etc/auto_tank on /tank (autofs)
> > map -media on /media (autofs)
> > 192.168.0.151:/tank/music on /tank/music (nfs, automounted)
> >
> > I'm not even sure if I'm using NFS3 or 4, or whether I'm using the
> > ZFS-based one; I set this up ages ago.
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 1; apic id = 02
> > fault virtual address   = 0x89
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff80eee094
> > stack pointer           = 0x28:0xfffffe01268c0830
> > frame pointer           = 0x28:0xfffffe01268c0830
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 74673 (easytag)
> > rdi: 0000000000000000 rsi: ffffffff819bff08 rdx: 0000000000000000
> > rcx: 0000000000000000  r8: fffffe003781e0f0  r9: fffff8001ab51740
> > rax: 0000000000000000 rbx: fffff8001ab51740 rbp: fffffe01268c0830
> > r10: ffffffff00000000 r11: fffffe01268c07b0 r12: fffffe003781e0f0
> > r13: fffff8047ac47700 r14: fffffe012ac1ba38 r15: fffff80437cac000
> > trap number             = 12
> > panic: page fault
> > cpuid = 1
> > time = 1705480771
> > KDB: stack backtrace:
> > #0 0xffffffff80b9d68d at kdb_backtrace+0x5d
> > #1 0xffffffff80b4f95f at vpanic+0x12f
> > #2 0xffffffff80b4f823 at panic+0x43
> > #3 0xffffffff8102902f at trap_fatal+0x40f
> > #4 0xffffffff8102907f at trap_pfault+0x4f
> > #5 0xffffffff80ffef48 at calltrap+0x8
> > #6 0xffffffff80a3a3fe at ncl_bioread+0xb7e
> > #7 0xffffffff80a2c0a0 at nfs_readdir+0x1f0
> > #8 0xffffffff80c217aa at vop_sigdefer+0x2a
> > #9 0xffffffff81100280 at VOP_READDIR_APV+0x20
> > #10 0xffffffff846af5ae at autofs_readdir+0x2ce
> > #11 0xffffffff81100280 at VOP_READDIR_APV+0x20
> > #12 0xffffffff80c48501 at kern_getdirentries+0x221
> > #13 0xffffffff80c488a9 at sys_getdirentries+0x29
> > #14 0xffffffff810298d9 at amd64_syscall+0x109
> > #15 0xffffffff80fff85b at fast_syscall_common+0xf8
> > Uptime: 3m18s
> > Dumping 1242 out of 32368 MB:..2%..11%..21%..31%..42%..51%..61%..71%..82%..91%
> >
> > I can still access those NFS mounts just fine and can play music off them
> > with audacious or just mpv, but easytag will try to recursively read
> > everything and presumably puts a lot of stress on the system.
> >
> > I see there was chatter about this recently, and kib committed something to
> > nfsclient, which got merged to stable/14 on the 11th, but my build is from
> > the 14th, so presumably I already have this "fix", and it's not working?
> >
> > I'm on n266311-299e9fe9709a right now, which _is_ after kib's fixes, so
> > maybe they are not sufficient for stable/14?
>
> You need 7b49e60227f8 which I just pushed.
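The report infers from dates that the build already had the fix. A more direct check is to ask git whether the needed commit is an ancestor of the commit the kernel was built from, i.e. the hash after the dash in a version string like n266311-299e9fe9709a. The sketch below assumes a git checkout of the source tree; the helper name `build_contains` and the default path `/usr/src` are hypothetical choices, while `git merge-base --is-ancestor` is a real git subcommand that exits 0 when the first commit is an ancestor of the second, 1 when it is not, and with another non-zero status on error.

```python
import subprocess


def build_contains(commit: str, build_ref: str = "HEAD", repo: str = "/usr/src") -> bool:
    """Return True if `commit` is an ancestor of (or equal to) `build_ref` in `repo`.

    Hypothetical helper: `repo` defaults to /usr/src, a common location for a
    FreeBSD source checkout, which is an assumption about the reader's setup.
    """
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, build_ref],
        capture_output=True,
        text=True,
    )
    # Exit status 0: ancestor; 1: not an ancestor; any other status is an error
    # (for example an unknown commit hash or a path that is not a git repository).
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(result.stderr.strip() or f"git exited with status {result.returncode}")
```

Called as `build_contains("7b49e60227f8", "299e9fe9709a")` from a checkout that contains both commits, it answers "is the fix in my build?" from commit ancestry rather than from merge and build dates.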
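The panic was triggered by easytag recursively reading the whole automounted tree. A hypothetical way to generate the same kind of load without easytag is a plain recursive directory walk; this is not from the thread, and the target path (for example /tank/music from the mount list) is an assumption. On POSIX systems CPython's `os.scandir()` reads entries with readdir(3), which FreeBSD's libc implements on top of getdirentries(2), the system call at frame #13 of the backtrace.

```python
import os


def walk_tree(root: str) -> tuple[int, int]:
    """Recursively list every directory under `root`, like a tag editor's library scan.

    Returns (directories_visited, other_entries). Hypothetical reproducer helper,
    not part of any tool mentioned in the report.
    """
    dirs = 0
    others = 0
    stack = [root]  # explicit stack instead of recursion, so deep trees cannot hit the recursion limit
    while stack:
        path = stack.pop()
        dirs += 1
        with os.scandir(path) as it:  # each iteration reads directory entries from the kernel
            for entry in it:
                # follow_symlinks=False keeps a symlink loop from making the walk endless
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    others += 1
    return dirs, others
```

For example, `walk_tree("/tank/music")` lists every directory below the automounted share once and returns a (directories, other entries) pair; on an unpatched stable/14 client this is the kind of readdir-heavy workload described in the report.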