Re: swap_pager: cannot allocate bio
- In reply to: Chris Ross : "Re: swap_pager: cannot allocate bio"
Date: Sat, 20 Nov 2021 18:23:06 UTC
On Fri, Nov 19, 2021 at 10:35:52PM -0500, Chris Ross wrote:
> (Sorry that the subject on this thread may not be relevant any more, but I don’t want to disconnect the thread.)
> 
> > On Nov 15, 2021, at 13:17, Chris Ross <cross+freebsd@distal.com> wrote:
> >> On Nov 15, 2021, at 10:08, Andriy Gapon <avg@freebsd.org> wrote:
> > 
> >> Yes, I propose to remove the wait for ARC evictions from arc_lowmem().
> >> 
> >> Another thing that may help a bit is having a greater "slack" between a threshold where the page daemon starts paging out and a threshold where memory allocations start to wait (via vm_wait_domain).
> >> 
> >> Also, I think that for a long time we had a problem (but not sure if it's still present) where allocations succeeded without waiting until the free memory went below a certain threshold M, but once a thread started waiting in vm_wait it would not be woken up until the free memory went above another threshold N. And the problem was that N >> M. In other words, a lot of memory had to be freed (and not grabbed by other threads) before the waiting thread would be woken up.
> > 
> > Thank you both for your inputs. Let me know if you’d like me to try anything, and I’ll kick (reboot) the system and can build a new kernel when you’d like. I did get another procstat -kka out of it this morning, and the system has since gone less responsive, but I assume that new procstat won’t show anything last night’s didn’t.
> 
> I’m still having this issue. I rebooted the machine, fsck’d the disks, and got it running again. Again, it ran for ~50 hours before getting stuck. I got another procstat -kka off of it; let me know if you’d like a copy of it. But it looks like the active processes are all in arc_wait_for_eviction. A pagedaemon is in an arc_wait_for_eviction under an arc_lowmem, but the python processes that were doing the real work don’t have arc_lowmem in their stacks, just the arc_wait_for_eviction.
> 
> Please let me know if there’s anything I can do to assist in finding a remedy for this. Thank you.

Here is a patch which tries to address the proximate cause of the problem. It would be helpful to know whether it addresses the deadlocks you're seeing. I tested it lightly by putting a NUMA system under memory pressure using postgres.

diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h b/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
index dc3b4f5d7877..4792a0b29ecf 100644
--- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
+++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
@@ -45,7 +45,7 @@ MALLOC_DECLARE(M_SOLARIS);
 #define	POINTER_INVALIDATE(pp)	(*(pp) = (void *)((uintptr_t)(*(pp)) | 0x1))
 
 #define	KM_SLEEP	M_WAITOK
-#define	KM_PUSHPAGE	M_WAITOK
+#define	KM_PUSHPAGE	(M_WAITOK | M_USE_RESERVE)	/* XXXMJ */
 #define	KM_NOSLEEP	M_NOWAIT
 #define	KM_NORMALPRI	0
 #define	KMC_NODEBUG	UMA_ZONE_NODUMP
diff --git a/sys/contrib/openzfs/module/zfs/arc.c b/sys/contrib/openzfs/module/zfs/arc.c
index 79e2d4381830..50cd45d76c52 100644
--- a/sys/contrib/openzfs/module/zfs/arc.c
+++ b/sys/contrib/openzfs/module/zfs/arc.c
@@ -4188,11 +4188,13 @@ arc_evict_state(arc_state_t *state, uint64_t spa, uint64_t bytes,
	 * pick up where we left off for each individual sublist, rather
	 * than starting from the tail each time.
	 */
-	markers = kmem_zalloc(sizeof (*markers) * num_sublists, KM_SLEEP);
+	markers = kmem_zalloc(sizeof (*markers) * num_sublists,
+	    KM_SLEEP | KM_PUSHPAGE);
 	for (int i = 0; i < num_sublists; i++) {
 		multilist_sublist_t *mls;
 
-		markers[i] = kmem_cache_alloc(hdr_full_cache, KM_SLEEP);
+		markers[i] = kmem_cache_alloc(hdr_full_cache,
+		    KM_SLEEP | KM_PUSHPAGE);
 
 		/*
 		 * A b_spa of 0 is used to indicate that this header is
diff --git a/sys/vm/uma_core.c b/sys/vm/uma_core.c
index 7b83d81a423d..3fc7859387e0 100644
--- a/sys/vm/uma_core.c
+++ b/sys/vm/uma_core.c
@@ -3932,7 +3932,8 @@ keg_fetch_slab(uma_keg_t keg, uma_zone_t zone, int rdomain, const int flags)
 		vm_domainset_iter_policy_ref_init(&di, &keg->uk_dr, &domain,
 		    &aflags);
 	} else {
-		aflags = flags;
+		aflags = (flags & M_USE_RESERVE) != 0 ?
+		    (flags & ~M_WAITOK) | M_NOWAIT : flags;
 		domain = rdomain;
 	}
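
[Editor's note: the hunks above work together. KM_PUSHPAGE allocations now carry M_USE_RESERVE, the ARC eviction path passes KM_PUSHPAGE for its marker allocations, and keg_fetch_slab() converts a reserve-using request into a non-sleeping one, so the eviction path can dip into the reserve rather than block waiting for the page daemon, which in the reported stacks is itself waiting on ARC eviction. The sketch below merely restates the flag transformation from the uma_core.c hunk as a standalone helper; the function name is invented for illustration and does not exist in the tree.]

#include <sys/param.h>
#include <sys/malloc.h>

/*
 * Hypothetical helper restating the expression added to keg_fetch_slab():
 * if the caller asked to use the reserve (as KM_PUSHPAGE now does), strip
 * M_WAITOK and substitute M_NOWAIT so the allocation does not sleep.
 */
static inline int
keg_alloc_flags(int flags)
{
	if ((flags & M_USE_RESERVE) != 0)
		return ((flags & ~M_WAITOK) | M_NOWAIT);
	return (flags);
}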
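
[Editor's note: Andriy's point about the vm_wait hysteresis, a threshold M below which allocations block and a much larger threshold N above which waiters are woken, can be pictured with the toy model below. The names and numbers are invented purely for illustration and do not correspond to the actual VM code or its tunables.]

#include <stdbool.h>

/* Toy thresholds, in pages; purely illustrative, not the kernel's values. */
#define FREE_MIN_PAGES		1024	/* M: below this, allocations block  */
#define FREE_TARGET_PAGES	16384	/* N: sleepers wake only above this  */

/* An allocating thread starts waiting once free memory drops below M. */
static bool
must_wait(unsigned long free_pages)
{
	return (free_pages < FREE_MIN_PAGES);
}

/*
 * A waiter is only woken once free memory climbs back above N.  Because
 * N >> M, a lot of memory must be freed (and not grabbed by other threads)
 * before the blocked thread runs again, which is the long stall described
 * in the quoted discussion.
 */
static bool
may_wake(unsigned long free_pages)
{
	return (free_pages >= FREE_TARGET_PAGES);
}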