Low-vnode deadlock
Garrett Wollman
wollman at csail.mit.edu
Fri Mar 20 00:16:25 UTC 2015
As I've previously posted, I've been doing some testing with the SPEC
SFS 2014 benchmark. One of the workloads, SWBUILD, is intended to be
"metadata intensive". While watching it in operation the other day, I
noticed that the vnlru kthread ends up taking a large amount of CPU,
indicating that the system is recycling vnodes at a very high rate.
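A rough way to put a number on that vnode pressure is to compare the allocated vnode count against the configured limit. The sketch below parses `name: value` lines as printed by sysctl(8). The sysctl names vfs.numvnodes and vfs.freevnodes, and the helper names themselves, are my assumptions and not something taken from the benchmark runs, so check them with `sysctl -a | grep vnode` on your own system first.

```python
import subprocess

# Assumed FreeBSD sysctl names; confirm them on the machine being tested.
VNODE_SYSCTLS = ("kern.maxvnodes", "vfs.numvnodes", "vfs.freevnodes")


def parse_sysctl(text):
    """Parse 'name: value' lines from sysctl(8) output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip anything that is not a plain non-negative integer counter.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def vnode_usage(stats):
    """Return the fraction of the kern.maxvnodes limit currently allocated."""
    return stats["vfs.numvnodes"] / stats["kern.maxvnodes"]


def read_live():
    """Run sysctl(8) and parse its output (hypothetical helper, FreeBSD only)."""
    result = subprocess.run(
        ["sysctl", *VNODE_SYSCTLS],
        capture_output=True, text=True, check=True,
    )
    return parse_sysctl(result.stdout)
```

A usage value that sits near 1.0 while vnlru is busy would be consistent with the heavy recycling described above, though it does not by itself explain the deadlock.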
In previous benchmark runs, I've also found that this workload tends
to deadlock the machine, although I haven't identified exactly how.
Usually this deadlock occurs around a load value ("business metric")
of 40 to 50 in the benchmark, and even when there is no deadlock, the
benchmark run is counted as a failure because the system can't maintain
the required op rate.
As a test, I increased kern.maxvnodes to 20 million. While vnlru
still gets substantial CPU, and the system is thrashing like crazy,
it's still able to successfully complete benchmark runs without either
deadlock or missing the IOPS target, at least up to a load value of
65. I'm still trying to find the point at which it falls over under
this configuration. The system's 5-minute load average peaks over 100
while the benchmark is running. (There are 5 benchmark processes for
each unit of load, but they sleep to maintain the desired operation
rate.)
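For scale, the process counts implied by the load values mentioned in this message can be worked out directly. This is just arithmetic on the figure of 5 processes per unit of load; the function name is made up for illustration.

```python
PROCS_PER_LOAD_UNIT = 5  # figure quoted in this message


def benchmark_processes(load_value):
    """Number of benchmark processes started for a given load value."""
    return PROCS_PER_LOAD_UNIT * load_value


for load in (40, 50, 65):
    print(load, benchmark_processes(load))
# prints:
# 40 200
# 50 250
# 65 325
```

So at a load value of 65 there are 325 benchmark processes, and a load average a little over 100 fits with most of them being asleep at any given moment.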
I will be interested to see how much of an effect this has when I move
from benchmarking the server itself to running benchmarks over NFS,
at which point the benchmark processes will no longer be competing
with the rest of the system for main memory.
Ultimately these results will be published in some forum, but I
haven't figured out exactly where yet.
-GAWollman
More information about the freebsd-fs mailing list