[PATCH] Convert the VFS cache lock to an rmlock
Alan Cox
alan.l.cox at gmail.com
Thu Mar 12 23:13:01 UTC 2015
Below are partial results from a profile of a parallel (-j7) "buildworld" on
a 6-core machine that I did after the introduction of pmap_advise, so this
is not a new profile. The results are sorted by total waiting time and
only the top 20 entries are listed.
    max wait_max       total wait_total     count avg wait_avg cnt_hold  cnt_lock  name
   1027   208500    16292932 1658585700   5297163   3      313        0   3313855  kern/vfs_cache.c:629 (rw:Name Cache)
 208564   186514 19080891106 1129189627 355575930  53        3        0   1323051  kern/vfs_subr.c:2099 (lockmgr:ufs)
 169241   148057   193721142  419075449  13819553  14       30        0    110089  kern/vfs_subr.c:2210 (lockmgr:ufs)
 187092   191775  1923061952  257319238 328416784   5        0        0   5106537  kern/vfs_cache.c:488 (rw:Name Cache)
     23      114   134681925  220476269  40747213   3        5        0  25679721  kern/kern_clocksource.c:233 (spin mutex:et_hw_mtx)
  39069   101543  1931226072  208764524 482193429   4        0        0  22375691  kern/vfs_subr.c:2177 (sleep mutex:vnode interlock)
 187131   187056  2509403648  140794697 298324050   8        0        0  14386756  kern/vfs_cache.c:669 (sleep mutex:vnode interlock)
   1421   257059   260943147  139520512 104936165   2        1        0  12997640  vm/vm_page.c:1225 (sleep mutex:vm page free queue)
  39612   145747   371125327  121005252 136149528   2        0        0   8280782  kern/vfs_subr.c:2134 (sleep mutex:vnode interlock)
   1720   249735   226621512   91906907  93436933   2        0        0   7092634  vm/vm_page.c:1770 (sleep mutex:vm active pagequeue)
 394155   394200   330090749   86368442  48766123   6        1        0   1169061  kern/vfs_hash.c:78 (sleep mutex:vfs hash)
    892    93103     3446633   75923096   1482518   2       51        0    236865  kern/vfs_cache.c:799 (rw:Name Cache)
   4030   394151   395521192   63355061  47860319   8        1        0   6439221  kern/vfs_hash.c:86 (sleep mutex:vnode interlock)
   4554   147798   247338596   56263926 104192514   2        0        0   9455460  vm/vm_page.c:1948 (sleep mutex:vm page free queue)
   2587   230069   219652081   48271335  94011085   2        0        0   9011261  vm/vm_page.c:1729 (sleep mutex:vm active pagequeue)
  16420    50195   920083075   38568487 347596869   2        0        0   3035672  kern/vfs_subr.c:2107 (sleep mutex:vnode interlock)
  57348    93913    65957615   31956482   2487620  26       12        0     39048  vm/vm_fault.c:672 (rw:vm object)
   1798    93694   127847964   28490515  46510308   2        0        0   1897724  kern/vfs_subr.c:419 (sleep mutex:struct mount mtx)
 249739   207227   775356648   25501046  95007901   8        0        0    211559  vm/vm_fault.c:918 (sleep mutex:vm page)
 452130   157222    70439287   18564724   5429942  12        3        0     10813  vm/vm_map.c:2738 (rw:vm object)
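
A report in this format is what a LOCK_PROFILING(9) kernel produces, so the
profile was presumably gathered that way (an assumption about methodology, not
something stated above).  As a rough sketch, the collector can be driven from
userland with sysctlbyname(3), equivalent to running sysctl(8) by hand, on a
kernel built with "options LOCK_PROFILING":

/*
 * Sketch only: flip the lock-profiling collector on/off via the
 * debug.lock.prof.* sysctls documented in LOCK_PROFILING(9).
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdlib.h>

static void
lockprof_set(const char *node, int val)
{
	/* Write-only access: old-value pointers are NULL. */
	if (sysctlbyname(node, NULL, NULL, &val, sizeof(val)) == -1)
		err(1, "%s", node);
}

int
main(int argc, char *argv[])
{
	int enable = (argc > 1) ? atoi(argv[1]) : 1;

	if (enable)
		lockprof_set("debug.lock.prof.reset", 1);  /* discard old counts */
	lockprof_set("debug.lock.prof.enable", enable);
	/* Run the workload; "sysctl debug.lock.prof.stats" then dumps the table. */
	return (0);
}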
On Thu, Mar 12, 2015 at 12:36 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:
> On Thu, Mar 12, 2015 at 11:14:42AM -0400, Ryan Stone wrote:
> > I've just submitted a patch to Differential[1] for review that
> > converts the VFS cache to use an rmlock in place of the current
> > rwlock. My main motivation for the change is to fix a priority
> > inversion problem that I saw recently. A real-time priority thread
> > attempted to acquire a write lock on the VFS cache lock, but there was
> > already a reader holding it. The reader was preempted by a normal
> > priority thread, and my real-time thread was starved.
> >
> > [1] https://reviews.freebsd.org/D2051
> >
> >
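For context, rmlock(9) is the kernel's "read-mostly" lock: read acquisitions
are cheap and are tracked per thread through an rm_priotracker, while writers
do the expensive work of synchronizing with any active readers.  Below is a
minimal sketch of the pattern such a conversion implies; the names are made up
for illustration and are not the actual vfs_cache.c code from D2051.

/*
 * Illustrative rmlock(9) usage: rwlock-style calls swapped for their
 * rmlock counterparts (the "was:" comments note the rwlock equivalents).
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>
#include <sys/queue.h>

struct demo_entry {
	LIST_ENTRY(demo_entry)	de_link;
	int			de_key;
};

static LIST_HEAD(, demo_entry) demo_head = LIST_HEAD_INITIALIZER(demo_head);
static struct rmlock demo_lock;

static void
demo_init(void)
{
	rm_init(&demo_lock, "demo cache");	/* was: rw_init() */
}

/* Hot path: shared lookups.  Read acquisitions are cheap and per-CPU. */
static struct demo_entry *
demo_lookup(int key)
{
	struct rm_priotracker tracker;
	struct demo_entry *de;

	rm_rlock(&demo_lock, &tracker);		/* was: rw_rlock() */
	LIST_FOREACH(de, &demo_head, de_link)
		if (de->de_key == key)
			break;
	rm_runlock(&demo_lock, &tracker);
	/* A real cache would take a reference on "de" before unlocking. */
	return (de);
}

/* Cold path: exclusive updates pay the cost of synchronizing with readers. */
static void
demo_insert(struct demo_entry *de)
{
	rm_wlock(&demo_lock);			/* was: rw_wlock() */
	LIST_INSERT_HEAD(&demo_head, de, de_link);
	rm_wunlock(&demo_lock);
}
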
> > I was worried about the performance implications of the change, as I
> > wasn't sure how common write operations on the VFS cache would be. I
> > did a -j12 buildworld/buildkernel test on a 12-core Haswell Xeon
> > system, as I figured that would be a reasonable stress test that
> > simultaneously creates lots of small files and reads a lot of files as
> > well. This actually wound up being about a 10% performance *increase*
> > (the units below are seconds of elapsed time as measured by
> > /usr/bin/time, so smaller is better):
> >
> > $ ministat -C 1 orig.log rmlock.log
> > x orig.log
> > + rmlock.log
> >
> >   (ministat dot plot omitted)
> >     N           Min           Max        Median           Avg        Stddev
> > x   6       2710.31       2821.35       2816.75     2798.0617     43.324817
> > +   5       2488.25       2500.25       2498.04      2495.756     5.0494782
> > Difference at 95.0% confidence
> >         -302.306 +/- 44.4709
> >         -10.8041% +/- 1.58935%
> >         (Student's t, pooled s = 32.4674)
> >
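For reference, the +/- 44.47 above is just ministat's pooled two-sample
Student's t interval; plugging its own summary numbers back in (n_1 = 6,
n_2 = 5, 9 degrees of freedom):

  \bar{x}_{\mathrm{rmlock}} - \bar{x}_{\mathrm{orig}} = 2495.756 - 2798.0617 \approx -302.31

  t_{0.975,\,9}\; s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}} = 2.262 \cdot 32.4674 \cdot \sqrt{\tfrac{1}{6} + \tfrac{1}{5}} \approx 44.47

so the roughly 300-second (10.8%) improvement is comfortably outside the noise.
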
> > The one outlier in the rwlock case does confuse me a bit. What I did
> > was boot a freshly-built image with the rmlock patch applied, do a git
> > checkout of head, and then do 5 builds in a row. The git checkout
> > should have had the effect of priming the disk cache with the source
> > files. Then I installed the stock head kernel, rebooted, and ran 5
> > more builds (and then 1 more when I noticed the outlier). The fast
> > outlier was the *first* run, which should have been running with a
> > cold disk cache, so I really don't know why it would be 90 seconds
> > faster. I do see that this run also had about 500-600 fewer seconds
> > spent in system time:
> >
> > x orig.log
> >
> >   (ministat dot plot omitted)
> >     N           Min           Max        Median           Avg        Stddev
> > x   6       3515.23       4121.84       4105.57       4001.71     239.61362
> >
> > I'm not sure how much I care, given that the rmlock is universally
> > faster (but maybe I should try the "cold boot" case anyway).
> >
> > If anybody has any comments or further testing that they would like to
> > see, please let me know.
>
> Workloads like buildworld (i.e. lots of forks + execs) run into very
> severe contention in the VM subsystem, which is orders of magnitude
> bigger than anything else.
>
> As such, your result seems quite suspicious.
>
> Can you describe in more detail how you were testing?
>
> Did you have a separate fs for the obj tree which was mounted+unmounted
> before each run?
>
> I suggest you grab a machine from zoo[1] and run some tests on "bigger"
> hardware.
>
> A perf improvement, even slight, is definitely welcome.
>
> [1] https://wiki.freebsd.org/TestClusterOneReservations
>
> --
> Mateusz Guzik <mjguzik gmail.com>