Strange ARC/Swap/CPU on yesterday's -CURRENT
Don Lewis
truckman at FreeBSD.org
Fri Apr 6 17:33:38 UTC 2018
On 4 Apr, Mark Johnston wrote:
> On Tue, Apr 03, 2018 at 09:42:48PM -0700, Don Lewis wrote:
>> On 3 Apr, Don Lewis wrote:
>> > I reconfigured my Ryzen box to be more similar to my default package
>> > builder by disabling SMT and half of the RAM, limiting it to 8 cores
>> > and 32 GB, and then started bisecting to try to track down the problem.
>> > For each test, I first filled ARC by tarring /usr/ports/distfiles to
>> > /dev/null. The commit range that I was searching was r329844 to
>> > r331716. I narrowed the range to r329844 to r329904. With r329904
>> > and newer, ARC is totally unresponsive to memory pressure and the
>> > machine pages heavily. I see ARC sizes of 28-29 GB and 30 GB of
>> > wired RAM, so there is not much left over for getting useful work
>> > done. Active and free memory each hover under 1 GB. Looking at the
>> > commit logs over this range, the most likely culprit is:
>> >
>> > r329882 | jeff | 2018-02-23 14:51:51 -0800 (Fri, 23 Feb 2018) | 13 lines
>> >
>> > Add a generic Proportional Integral Derivative (PID) controller algorithm and
>> > use it to regulate page daemon output.
>> >
>> > This provides much smoother and more responsive page daemon output, anticipating
>> > demand and avoiding pageout stalls by increasing the number of pages to match
>> > the workload. This is a reimplementation of work done by myself and mlaier at
>> > Isilon.
>> >
>> >
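For reference, one discrete step of a PID controller of the sort the
commit describes looks roughly like the sketch below. The names are
illustrative, not the pidctrl API the commit actually adds; divisor-style
gains keep the arithmetic integer-only, which suits kernel code.

	/* Hypothetical sketch of one PID controller step. */
	struct pid_state {
		int setpoint;		/* target, e.g. the free page target */
		int gain_p, gain_i, gain_d;	/* inverse gains (divisors) */
		int integral;		/* accumulated error */
		int prev_error;		/* error from the previous interval */
	};

	static int
	pid_step(struct pid_state *ps, int measured)
	{
		int error, derivative, output;

		error = ps->setpoint - measured;	/* positive == shortage */
		ps->integral += error;
		derivative = error - ps->prev_error;
		ps->prev_error = error;
		output = error / ps->gain_p + ps->integral / ps->gain_i +
		    derivative / ps->gain_d;
		return (output > 0 ? output : 0);	/* pages to free */
	}

Run once per page daemon interval against the measured free page count, a
step like this yields the number of pages to target in that interval;
anticipating demand rather than reacting only after a shortage is what
smooths the page daemon output the commit message refers to.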
>> > It is quite possible that the recent fixes to the PID controller will
>> > fix the problem. Not that r329844 was trouble free ... I left tar
>> > running over lunchtime to fill ARC and the OOM killer nuked top, tar,
>> > ntpd, both of my ssh sessions into the machine, and multiple instances
>> > of getty while I was away. I was able to log in again and successfully
>> > run poudriere, and ARC did respond to the memory pressure and cranked
>> > itself down to about 5 GB by the end of the run. I did not see the
>> > same tar problem when I repeated the test with r329904.
>>
>> I just tried r331966 and see no improvement. No OOM process kills
>> during the tar run to fill ARC, but with ARC filled, the machine is
>> thrashing itself at the start of the poudriere run while trying to build
>> ports-mgmt/pkg (39 minutes so far). ARC appears to be unresponsive to
>> memory demand. I've seen no decrease in ARC size or wired memory since
>> starting poudriere.
>
> Re-reading the ARC reclaim code, I see a couple of issues which might be
> at the root of the behaviour you're seeing.
>
> 1. zfs_arc_free_target is too low now. It is initialized to the page
> daemon wakeup threshold, which is slightly above v_free_min. With the
> PID controller, the page daemon uses a setpoint of v_free_target.
> Moreover, it now wakes up regularly rather than having wakeups be
> synchronized by a mutex, so it will respond quickly if the free page
> count dips below v_free_target. The free page count will dip below
> zfs_arc_free_target only in the face of sudden and extreme memory
> pressure now, so the FMR_LOTSFREE case probably isn't getting
> exercised. Try initializing zfs_arc_free_target to v_free_target.
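A minimal sketch of that suggestion, assuming the SYSINIT hook arc.c
already uses to seed this tunable, and assuming a global free target is
still reachable as vm_cnt.v_free_target (the VM counters have been in flux
in this window of -CURRENT; with multiple NUMA domains this may instead
need to sum the per-domain vmd_free_target values):

	/* Sketch: seed the ARC free-page threshold from the page
	 * daemon's setpoint rather than its wakeup threshold.
	 * vm_cnt.v_free_target is an assumption, per the caveat above. */
	static void
	arc_free_target_init(void *unused __unused)
	{
		zfs_arc_free_target = vm_cnt.v_free_target;
	}
	SYSINIT(arc_free_target_init, SI_SUB_KTHREAD_PAGE, SI_ORDER_ANY,
	    arc_free_target_init, NULL);

The same value can also be tried on a running system, without a rebuild,
by setting the vfs.zfs.arc_free_target sysctl to the value reported by
vm.stats.vm.v_free_target.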
>
> 2. In the inactive queue scan, we used to compute the shortage after
> running uma_reclaim() and the lowmem handlers (which includes a
> synchronous call to arc_lowmem()). Now it's computed before, so we're
> not taking into account the pages that get freed by the ARC and UMA.
> The following rather hacky patch may help. I note that the lowmem
> logic is now somewhat broken when multiple NUMA domains are
> configured, however, since it fires only when domain 0 has a free
> page shortage.
>
> Index: sys/vm/vm_pageout.c
> ===================================================================
> --- sys/vm/vm_pageout.c	(revision 331933)
> +++ sys/vm/vm_pageout.c	(working copy)
> @@ -1114,25 +1114,6 @@
>  	boolean_t queue_locked;
>
>  	/*
> -	 * If we need to reclaim memory ask kernel caches to return
> -	 * some. We rate limit to avoid thrashing.
> -	 */
> -	if (vmd == VM_DOMAIN(0) && pass > 0 &&
> -	    (time_uptime - lowmem_uptime) >= lowmem_period) {
> -		/*
> -		 * Decrease registered cache sizes.
> -		 */
> -		SDT_PROBE0(vm, , , vm__lowmem_scan);
> -		EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
> -		/*
> -		 * We do this explicitly after the caches have been
> -		 * drained above.
> -		 */
> -		uma_reclaim();
> -		lowmem_uptime = time_uptime;
> -	}
> -
> -	/*
>  	 * The addl_page_shortage is the number of temporarily
>  	 * stuck pages in the inactive queue. In other words, the
>  	 * number of pages from the inactive count that should be
> @@ -1824,6 +1805,26 @@
>  	atomic_store_int(&vmd->vmd_pageout_wanted, 1);
>
>  	/*
> +	 * If we need to reclaim memory ask kernel caches to return
> +	 * some. We rate limit to avoid thrashing.
> +	 */
> +	if (vmd == VM_DOMAIN(0) &&
> +	    vmd->vmd_free_count < vmd->vmd_free_target &&
> +	    (time_uptime - lowmem_uptime) >= lowmem_period) {
> +		/*
> +		 * Decrease registered cache sizes.
> +		 */
> +		SDT_PROBE0(vm, , , vm__lowmem_scan);
> +		EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_PAGES);
> +		/*
> +		 * We do this explicitly after the caches have been
> +		 * drained above.
> +		 */
> +		uma_reclaim();
> +		lowmem_uptime = time_uptime;
> +	}
> +
> +	/*
>  	 * Use the controller to calculate how many pages to free in
>  	 * this interval.
>  	 */
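For context on what the relocated EVENTHANDLER_INVOKE() reaches: consumers
hook the vm_lowmem event when they initialize, which is how the ARC ends up
getting the synchronous arc_lowmem() call mentioned above. A sketch of the
shape of such a registration, with details abbreviated and the helper name
hypothetical:

	static eventhandler_tag arc_event_lowmem = NULL;

	/* Called synchronously from the page daemon's lowmem scan. */
	static void
	arc_lowmem(void *arg __unused, int howto __unused)
	{
		/* Evict from the ARC; the real handler also waits for
		 * the ARC reclaim thread to make progress. */
	}

	static void
	arc_lowmem_register(void)	/* hypothetical init-path helper */
	{
		arc_event_lowmem = EVENTHANDLER_REGISTER(vm_lowmem,
		    arc_lowmem, NULL, EVENTHANDLER_PRI_FIRST);
	}

The point of moving the invocation is that pages freed here, and by
uma_reclaim(), are now counted before the controller calculates how many
pages to free in the interval.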