[Bug 187594] [zfs] [patch] ZFS ARC behavior problem and fix

bugzilla-noreply at freebsd.org
Fri Jun 20 21:18:34 UTC 2014


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

karl at denninger.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |karl at denninger.net

--- Comment #20 from karl at denninger.net ---
No, because memory in "cache" is subject to being either reallocated or freed.
When I was developing this patch, that was my first impression as well, and it
is how I originally coded it; it turned out to be wrong.

The issue here is that you have two parts of the system contending for RAM --
the VM system generally, and the ARC.  If the ARC frees space before the VM
system activates and starts pruning, then after some period of time you wind up
with the ARC pinned at its minimum, because it releases "early."

The original ZFS code releases ARC only when the VM system goes into
"desperation" mode.  That's too late, and it results in pathological behavior,
including long freezes where nothing appears to happen at all.  What actually
appears to be happening is that the ARC is essentially dumped while paging is
occurring, and the system reacts very badly to that.

The test as it sits now activates the ARC pare-down at the point the VM system
wakes up.  The two go into and out of contention at roughly the same time,
which produces a balanced result: the ARC stabilizes at a value that allows
some cached pages to remain, but cached pages do not grow without bound, nor
does the system run into page starvation and the "freeze" condition that comes
from trying to free huge chunks of ARC at once.
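
As a rough illustration of the difference between the two triggers described
above, here is a minimal Python sketch.  It is not the patch itself and not
kernel code: the threshold names (pageout_wakeup, severe), the policy labels,
and the page counts are all hypothetical, chosen only to show how testing
against the VM wake-up point fires earlier than testing against the
"desperation" point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmThresholds:
    """Hypothetical free-page thresholds, in pages (not real kernel values)."""
    pageout_wakeup: int  # free pages below which the VM page daemon wakes up
    severe: int          # free pages below which the VM is in "desperation"


def arc_should_shrink(free_pages: int, t: VmThresholds, policy: str) -> bool:
    """Decide whether the ARC should start paring down.

    policy "desperation": shrink only once the VM is severely short of pages
                          (a stand-in for the original behavior).
    policy "vm_wakeup":   shrink as soon as the VM page daemon would wake up
                          (a stand-in for the behavior described above).
    """
    if policy == "desperation":
        return free_pages < t.severe
    if policy == "vm_wakeup":
        return free_pages < t.pageout_wakeup
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    # Assumed example numbers, for illustration only.
    thresholds = VmThresholds(pageout_wakeup=1000, severe=200)
    for free in (1500, 800, 100):
        print(free,
              arc_should_shrink(free, thresholds, "vm_wakeup"),
              arc_should_shrink(free, thresholds, "desperation"))
```

With these assumed numbers, a system with 800 free pages would start paring
the ARC under the "vm_wakeup" policy but not under "desperation", which waits
until free memory is almost gone and then has to release a lot at once.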

If you need to bias the ARC pare-down more aggressively, you can do so through
the tunables.  The existing code, however, is what much experimentation across
multiple workloads and RAM sizes found to give both a stable ARC and a stable
cache page population over long periods of time (weeks of uptime across
varying loads).

As currently implemented, this has now been running untouched for several
months on an extremely busy web, database (PostgreSQL) and internal Samba
server without incident.


