l2arc_feed_thread CPU utilization

Sat Feb 15 11:59:43 UTC 2014

on 14/02/2014 22:23 Brendan Gregg said the following:
> G'Day Andriy,
> 
> Thanks for the patch. If most of the data is in one list (anyone have statistics
> to confirm such a likelihood? I know this happened a lot pre-list-split), then I
> think this means we only scan that at 1/32nd of the previous rate. It should
> solve the CPU issue, but could make warmup very slow.

Brendan,

I do not have any stats, but I think that the data should be spread more or less
evenly between the lists.  I mean the 16 sub-lists for data and 16 sub-lists for
metadata.  First, a list is picked based on a hash and that _should_ produce a
more or less even distribution.  Second, if the hash function is not good enough
then the whole list splitting is pointless.
In either case this was just a quick hack on my part.
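
To make the 1/32 figure concrete, here is a rough sketch of the arithmetic.
It assumes a total scan budget split evenly across the 16 data and 16
metadata sub-lists; the function names and the even split are hypothetical
and are not taken from the actual patch or macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBLISTS_PER_TYPE 16                    /* per the thread: 16 data + 16 metadata */
#define NUM_SUBLISTS      (2 * SUBLISTS_PER_TYPE)

/* Hypothetical: divide a total scan budget evenly between all sub-lists. */
static uint64_t
per_list_headroom(uint64_t total_headroom)
{
	return (total_headroom / NUM_SUBLISTS);
}

/*
 * Bytes scanned in one pass when each sub-list is scanned up to
 * per_list bytes.  list_bytes[i] is the amount of data on sub-list i.
 */
static uint64_t
scanned_bytes(const uint64_t *list_bytes, uint64_t per_list)
{
	uint64_t total = 0;

	assert(list_bytes != NULL);
	for (size_t i = 0; i < NUM_SUBLISTS; i++)
		total += (list_bytes[i] < per_list) ? list_bytes[i] : per_list;
	return (total);
}
```

With an even spread every sub-list uses its full share and the total scanned
stays at the whole budget.  If all the data sits in a single sub-list, only
that one share is scanned, i.e. 1/32 of the budget, which is the slow warmup
case Brendan describes.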

> I think the feed algorithm needs to be rethought, although that can be done as
> future work. I'm trying to think of what simple thing can be done right now to
> solve CPU usage and warmup rate.

I completely agree with you.  I do not particularly like the fact that the
threshold is per sub-list in FreeBSD.  I would prefer a more "holistic" threshold.

> Let's say we keep this change, but in l2arc_write_buffers we maintain an extra
> copy of write_sz, say, list_write_sz, that is reset to zero for each list. Then,
> when we reach headroom and choose to abort, we can check list_write_sz and
> determine how fruitful the scanning has been so far. If that's greater than a
> threshold, then keep scanning, up to the full L2ARC_WRITE_SIZE for that list.
> That way, we've scanned only 1/32nd of the previous length as a test, and only
> if that is fruitful enough do we keep scanning.
> 
> Again, it probably needs to be rethought, but something like that may work fine
> in the meantime.

This sounds interesting.  I will think more about this.
Thanks!
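
A minimal sketch of how the suggestion above might look for a single
sub-list.  This is not the real l2arc_write_buffers code: struct buf,
scan_list and the fruitful_min parameter are made up for illustration, and
only list_write_sz is a name taken from the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ARC buffer header. */
struct buf {
	uint64_t size;      /* bytes in this buffer */
	bool     eligible;  /* would it be written to the cache device? */
};

/*
 * Scan one sub-list and return the bytes selected for writing.
 *   headroom     - reduced scan budget (bytes seen) used as a test
 *   fruitful_min - bytes that must have been selected for the scan
 *                  to continue past the headroom
 *   write_max    - cap on bytes selected from this list
 */
static uint64_t
scan_list(const struct buf *list, size_t n, uint64_t headroom,
    uint64_t fruitful_min, uint64_t write_max)
{
	uint64_t passed_sz = 0;      /* bytes seen so far */
	uint64_t list_write_sz = 0;  /* reset to zero for each list */

	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		passed_sz += list[i].size;
		/* Past the headroom: go on only if the scan was fruitful. */
		if (passed_sz > headroom && list_write_sz < fruitful_min)
			break;
		if (!list[i].eligible)
			continue;
		if (list_write_sz + list[i].size > write_max)
			break;
		list_write_sz += list[i].size;
	}
	return (list_write_sz);
}
```

A list full of eligible buffers is scanned until write_max is reached, while
a list with few or no eligible buffers is abandoned as soon as the reduced
headroom is used up, so the extra CPU is spent only where it pays off.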

> 
> On Fri, Feb 14, 2014 at 3:52 AM, Andriy Gapon <avg at freebsd.org
> <mailto:avg at freebsd.org>> wrote:
> 
>     on 19/12/2013 13:30 Andriy Gapon said the following:
>     >
>     > This is just a heads up, no patch yet.
>     >
>     > l2arc_feed_thread periodically wakes up, scans a certain amount of ARC
>     > buffers, and writes eligible buffers to a cache device.
>     > The number of scanned buffers is limited by a threshold on the amount of
>     > data in the buffers seen.  The threshold is applied on a per buffer list
>     > basis.  In upstream there are 4 relevant lists: (data, metadata) X (MFU,
>     > MRU).  In FreeBSD each of the lists was subdivided into 16 lists.  This was
>     > done to reduce contention on the locks that protect the lists.  But as a
>     > side effect l2arc_feed_thread can scan 16 times more data (~ buffers).
>     >
>     > So, if you have a rather large ARC and L2ARC and your buffers tend to be
>     > sufficiently small, then you could observe l2arc_feed_thread burning a
>     > noticeable amount of CPU.  On some of our systems I observed it using up to
>     > 40% of a single core.  Scaling back the threshold by a factor of 16 makes
>     > CPU utilization go down by the same factor.
>     >
>     > I plan to commit this change to FreeBSD ZFS code.
>     > Any comments are welcome.
> 
>     Here is what I have in mind:
>     https://github.com/avg-I/freebsd/compare/wip;hc;l2arc_feed_thread_scan_rate
> 
>     The calculations in the macro look somewhat ugly, but they should be correct :-)
> 
>     --
>     Andriy Gapon
> 
> -- 
> Brendan Gregg, Joyent                      http://dtrace.org/blogs/brendan

-- 
Andriy Gapon