From: Rick Macklem
Date: Tue, 24 Sep 2024 17:30:06 -0700
Subject: Re: panic: nfsv4root ref cnt cpuid = 1
To: John F Carr
Cc: J David, FreeBSD FS
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

On Tue, Sep 24, 2024 at 10:54 AM John F Carr wrote:
>
>
>
> > On Sep 24, 2024, at 13:32, J David wrote:
> >
> > On Mon, Sep 23, 2024 at 6:38 PM Rick Macklem wrote:
> >> If you can easily get the source line# for nfsrpc_lookup+0x87f, that
> >> could be helpful.
> >
> > Sure.
> > I did it via lldb and got nfs_clrpcops.c 1697. Your method gives
> > the same result.
> >
> > According to the github version of 14.1-RELEASE, that's the "if (ndp
> > != NULL) {" after the call to nfscl_openrelease().
> >
> > Per lldb, the actual instruction at that address is:
> >
> >     testq %r13, %r13
> >
> > My knowledge of amd64 assembler is nearly nil, but I *think* this
> > corresponds to checking if ndp is null. And I think that %r13 is a
> > register, so I'm not sure that could cause a page fault. Maybe the
> > trace indicates that that's the line it would have come back to if
> > something in nfscl_openrelease() hadn't gone wrong?
> >
> > Thanks!
> >
>
> The stack dump on the console tends to omit the frame that faulted.
> The fault was probably in nfscl_openrelease or something it called.
> The faulting instruction address 0xffffffff809da260 should be accurate.
> The faulting data address 0x28 corresponds to the offset of field
> nfsow_rwlock in struct nfsclowner. Perhaps in nfscl_openrelease
> the expression op->nfso_own is NULL and the fault is in one of the
> two function calls in this code block around line 850 of nfs_clstate.c.
>
>     owp = op->nfso_own;
>     if (NFSHASONEOPENOWN(nmp))
>         nfsv4_relref(&owp->nfsow_rwlock);
>     else
>         nfscl_lockunlock(&owp->nfsow_rwlock);

Yes, that sounds reasonable and I think it is another symptom of the same
underlying problem. It is quite possible that this problem caused the
other (hang I think?) that you reported before.

A few years ago, I tried to avoid the extra RPC of doing an Open after a
Lookup, by combining them in the same compound RPC. It turned out this
did not work for multiple open owners, so it was only enabled for
"oneopenown". (Since few use this mount option, I have not seen issues
with it reported by others and it worked ok for my testing.)

However, I now realize that Open works because a vnode lock on the vnode
being opened guarantees that the Open does not go away during the
VOP_OPEN().
Unfortunately, for VOP_LOOKUP(), it is the directory that is vnode
locked and a vnode lock on the file being looked up is not acquired
until the Lookup reply is processed.
--> To fix this, I think I need to delay the open processing until after
    the vnode lock on the file has been acquired
or
--> Just back this Open in Lookup patch out, since it can only be made to
    work for "oneopenown" and it only reduces each Open by one RPC.

The trivial patch I posted just disables the Open in Lookup RPC
optimization, so it should have the same effect as backing the patch
(it is actually several git commits) out.

The good news is that it looks like I finally know why you were having
problems others were not reporting. (I have no idea why 14.1 is crashing
when previous versions would hang, etc, but crashes are much easier to
diagnose.)

rick

>
>
> John Carr
>
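
For readers following the quoted analysis, here is a minimal sketch of why
a NULL owner pointer would produce a faulting data address of 0x28. The
structs below are hypothetical stand-ins, not the kernel definitions of
struct nfsclowner or struct nfsv4lock: the 40 bytes of leading fields are
an assumption chosen only so that the lock member lands at offset 0x28,
and demo_owner, demo_lock and lock_address() are names invented for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock structure; not the kernel's layout. */
struct demo_lock {
	uint32_t lock_word;
	uint32_t use_count;
};

/*
 * Hypothetical stand-in for the owner structure. The leading fields are
 * assumed to occupy 5 * 8 = 40 (0x28) bytes, so the lock follows at 0x28.
 */
struct demo_owner {
	uint64_t leading_fields[5];
	struct demo_lock rwlock;
};

/* Compile-time check that the stand-in really places the lock at 0x28. */
static_assert(offsetof(struct demo_owner, rwlock) == 0x28,
    "rwlock is expected at offset 0x28 in this stand-in layout");

/*
 * Address that an expression of the form &owp->rwlock refers to when owp
 * holds owner_base. Computed arithmetically, so nothing is dereferenced.
 */
static inline uintptr_t
lock_address(uintptr_t owner_base)
{
	return (owner_base + offsetof(struct demo_owner, rwlock));
}
```

With this stand-in layout, lock_address(0) evaluates to 0x28, which is the
address a lock routine would touch if it were handed &owp->rwlock with owp
equal to NULL. That matches the shape of the fault described above, but it
does not by itself confirm which call in nfscl_openrelease() faulted.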