From: J David <jdavidlists@gmail.com>
Date: Fri, 27 Sep 2024 11:33:15 -0400
Subject: Re: panic: nfsv4root ref cnt cpuid = 1
To: Rick Macklem
Cc: FreeBSD FS <freebsd-fs@freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Circling back around to whether it's better to NFS mount once and
nullfs mount lots, or NFS mount lots, I've unfortunately gathered some
additional data.

We set up a version of our code that mounts the requisite NFS
filesystem directly for each job/jail root. That worked fine in
small-scale testing. In a wider deployment, however, disaster ensued.
With a few thousand mounts, we started to observe two separate forms
of bad behavior:

- Requests from established sessions would hang indefinitely, leading
  to processes backlogging and client machines going OOM and becoming
  unresponsive en masse.

- The NFS server appeared to be serving empty directories.

The first one is self-explanatory. The second one might bear further
explanation.

The server runs ZFS. There are several datasets that contain job
roots, e.g.:

tank
tank/roots
tank/roots/a
tank/roots/b
tank/roots/c
tank/roots/d

The /etc/exports looks like:

V4: /tank -sec=sys

For client machines using nullfs, there is an /etc/fstab line like:

fs:/roots /roots nfs ro,nfsv4,minorversion=2,tcp,nosuid,noatime,nolockd,noresvport,oneopenown 0 0

Under ordinary operation, NFSv4 exports the child datasets correctly.
E.g.:

$ ls /roots/a
bin    etc    lib      net    proc   sbin   usr
dev    home   libexec  root   tmp    var

Then a client does:

# for a "Type A" job
/sbin/mount_nullfs -o ro -o nosuid /roots/a /jobs/(job-uuid)

During the failure, I observed:

$ ls /roots
a b c d
$ ls /roots/a
$ ls /roots/b
$ ls /roots/c
$ ls /roots/d

I.e., the server appeared to have "forgotten" to descend into the
child datasets and behaved as NFSv3 would have in that situation.

The server in question is FreeBSD 14.1-RELEASE-p5. There were no
console diagnostics, nothing in dmesg, and negligible visible load
(load average below 1.0, nfsd using ~7% of one CPU).

The individual client mounts (the ones that were hanging) were a
little different, because they would go straight to the subdirectory
they want:

# for a "Type A" job
/sbin/mount_nfs -o tcp,nfsv4,minorversion=2,noatime -o ro -o nosuid -o noresvport fs:/roots/a /jobs/(job-uuid)

Once all the client machines were restarted in "nullfs mode," the
server returned to normal operation without further intervention, so
the server behavior does appear directly related to the number of
client NFS mounts. I couldn't measure it exactly at the time of the
incident, but I would ballpark it at about 5,000 +/- 2,000 NFS mounts
across 28 client machines.

FWIW, during the ~48-hour window when we were testing direct NFS
instead of nullfs on slowly increasing numbers of machines, no client
using direct NFS experienced the kernel panic we're discussing here.
(That's without the patch.) Contrast that with 2-3 total panics per
day among the machines using nullfs. So it's possible that indirection
through nullfs aggravates that particular bug.

Alas, based on the above, nullfs seems to be necessary for now.
Getting the patch tested and deployed is now at the top of my list.

Thanks!
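P.S. For anyone trying to reproduce or quantify the scaling, here is a
rough sketch of how one could tally mounts by filesystem type from
mount(8) output on each client (the sample data and /jobs paths below
are made up for illustration; on a real client you'd pipe the output
of mount itself):

```shell
# Tally mounts by filesystem type. The here-string below is
# fabricated sample data in mount(8)'s output format; on a real
# FreeBSD client, replace it with the actual output of `mount`.
mount_output='fs:/roots on /roots (nfs, read-only)
/roots/a on /jobs/1111-aaaa (nullfs, read-only)
/roots/b on /jobs/2222-bbbb (nullfs, read-only)'

# Split each line on parentheses, take the first comma-separated
# token inside them (the fs type), and count occurrences.
printf '%s\n' "$mount_output" |
  awk -F'[()]' '{ split($2, t, ","); count[t[1]]++ }
                END { for (fs in count) print fs, count[fs] }'
```

Summing those counts across the 28 clients is how I'd arrive at a
firmer number than my 5,000 +/- 2,000 ballpark next time.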