Major issues with nfsv4

J David j.david.lists at gmail.com
Thu Jan 14 13:50:48 UTC 2021


On Wed, Dec 16, 2020 at 11:25 PM Rick Macklem <rmacklem at uoguelph.ca> wrote:
> If you can do so when the "Opens" count has gone fairly high,
> please "sysctl vfs.deferred_inact" and let us know what that
> returns.

$ sysctl vfs.deferred_inact
sysctl: unknown oid 'vfs.deferred_inact'
$ sysctl -a vfs | fgrep defer
$

Sorry for the delay in responding to this.  I got my knuckles rapped
for allowing this to happen so much.

It happened just now because some of the "use NFSv4.1" config leaked
out to a production machine, but not all of it. As a result, only the
read-only "job binary" filesystems were mounted with nullfs+nfsv4.1.
So it is unlikely to be related to creating files. Hopefully, that
narrows things down.

$ sudo nfsstat -E -c
[...]
  OpenOwner    Opens  LockOwner    Locks   Delegs  LocalOwn
      37473   303469          0        0        1         0
[...]

"nfscl: never fnd open" continues to appear regularly on
console/dmesg, even at the end of the reboot:

Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-5' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-6' to stop... done
All buffers synced.
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
nfscl: never fnd open
Uptime: 4d13h59m27s
Rebooting...
cpu_reset: Stopping other CPUs
---<<BOOT>>---

It did not appear 300,000 times, though.  More like a few times a day.
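
For anyone wanting to put a number on "a few times a day", here is a
minimal sketch that tallies the message per day. It assumes the kernel
messages also land in a syslog file with the traditional
"Mon DD HH:MM:SS host ..." prefix (typically /var/log/messages, but that
depends on syslog.conf). The function name is made up for illustration.

```shell
# Tally "nfscl: never fnd open" lines per day from syslog-style text on stdin.
# Assumption: each line starts with "Mon DD HH:MM:SS", so fields 1 and 2
# together identify the day.
never_fnd_open_per_day() {
    awk '/nfscl: never fnd open/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }'
}
```

Usage would be something like
"never_fnd_open_per_day < /var/log/messages". The output order is
whatever awk picks, so pipe it through sort if that matters.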

Also, I set up an idle system with the NFSv4.1+nullfs config, as
requested. It has been up for 32 days and appears not to have leaked
anything. But it does also have a fistful of those "nfscl: never fnd
open" messages.

There is also a third system in a test environment with the
nullfs+nfsv4.1 config. That system is up 34 days, has exhibited no
problems, and shows this:

  OpenOwner    Opens  LockOwner    Locks   Delegs  LocalOwn
        342    15098          2        0        0         0

That machine shows one "nfscl: never fnd open" in the dmesg.

A fourth system has the NFSv4.1-no-nullfs config in production with
net.inet.ip.portrange.lowlast tweaked and a limit on simultaneous
jobs.  That system had issues requiring a restart 18 days ago. It also
occasionally gets "nfscl: never fnd open" in the dmesg and has
relatively large Open numbers:

As of right now:
  OpenOwner    Opens  LockOwner    Locks   Delegs  LocalOwn
      23214    46304          0        0        0         0

The "OpenOwner" value on that system seems to swing dramatically,
ranging between 10,000 and 45,000 within just a few minutes. It appears
to correlate well with simultaneous jobs. The "Opens" value goes up and
down a bit, but trends upward over time. However, when I found and
killed one long-running job and unmounted its filesystems, "Opens"
dropped 90% to around 4600. Note there are *no* nullfs mounts on that
system.  So nullfs may not be a necessary component of the problem.
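
To watch the swing in "OpenOwner" and the upward trend in "Opens" over
time rather than by eyeballing it, something like the following could
log the two counters periodically. It is only a sketch: it assumes the
"nfsstat -E -c" layout quoted above (a header line whose first field is
"OpenOwner", followed by one line of values), and the function names
are hypothetical. nfsstat may need to be run via sudo, as above.

```shell
# Print "OpenOwner Opens" from `nfsstat -E -c` text supplied on stdin.
# Assumption: the values are on the line right after the header line
# whose first field is "OpenOwner".
open_counts() {
    awk '$1 == "OpenOwner" { getline; print $1, $2; exit }'
}

# Print a UTC-timestamped sample every INTERVAL seconds (default 60)
# until interrupted.
sample_opens() {
    interval=${1:-60}
    while :; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            "$(nfsstat -E -c | open_counts)"
        sleep "$interval"
    done
}
```

Running "sample_opens 60 >> opens.log" alongside the jobs should make it
easier to line the counters up against job start and stop times.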

As a next step, I will try to create a fake job that opens a ton of
files.  Then I'll test it on the binary read-only nullfs+nfsv4.1
mounts and on the system that runs nfsv4.1 directly.
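
One possible shape for such a fake job, as a rough sketch only: open
every regular file in a directory read-only and hold it open for a
while. The function name and arguments are made up, and using one
background sleep per file is just a convenient way to hold a descriptor
from plain sh; it is not what the real jobs do.

```shell
# Hold every regular file directly under DIR open read-only for HOLD
# seconds (default 60). Each file is kept open by a background `sleep`
# whose stdin is redirected from that file.
hold_files_open() {
    dir=$1
    hold=${2:-60}
    count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip directories and other non-files
        sleep "$hold" < "$f" &
        count=$((count + 1))
    done
    echo "$count files held open"
    wait                             # return once all the sleeps have exited
}
```

Since this spawns one process per file, the per-user process limit caps
how many files a single run can hold. Comparing the "Opens" count
before, during, and after a run would show whether they get released.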

Thanks!


More information about the freebsd-fs mailing list