Date: Sat, 20 Nov 2021 13:23:06 -0500
From: Mark Johnston <markjdb@gmail.com>
To: Chris Ross
Cc: Andriy Gapon, freebsd-fs
Subject: Re: swap_pager: cannot allocate bio
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
On Fri, Nov 19, 2021 at 10:35:52PM -0500, Chris Ross wrote:
> (Sorry that the subject on this thread may not be relevant any more, but
> I don’t want to disconnect the thread.)
>
> > On Nov 15, 2021, at 13:17, Chris Ross wrote:
> >> On Nov 15, 2021, at 10:08, Andriy Gapon wrote:
> >
> >> Yes, I propose to remove the wait for ARC evictions from arc_lowmem().
> >>
> >> Another thing that may help a bit is having a greater "slack" between a
> >> threshold where the page daemon starts paging out and a threshold where
> >> memory allocations start to wait (via vm_wait_domain).
> >>
> >> Also, I think that for a long time we had a problem (but I'm not sure
> >> whether it's still present) where allocations succeeded without waiting
> >> until the free memory went below a certain threshold M, but once a
> >> thread started waiting in vm_wait it would not be woken up until the
> >> free memory went above another threshold N.  And the problem was that
> >> N >> M.  In other words, a lot of memory had to be freed (and not
> >> grabbed by other threads) before the waiting thread would be woken up.
> >
> > Thank you both for your input.  Let me know if you’d like me to try
> > anything; I’ll kick (reboot) the system and can build a new kernel when
> > you’d like.  I did get another procstat -kka out of it this morning, and
> > the system has since gone less responsive, but I assume that new
> > procstat won’t show anything last night’s didn’t.
>
> I’m still having this issue.  I rebooted the machine, fsck’d the disks,
> and got it running again.  Again, it ran for ~50 hours before getting
> stuck.  I got another procstat -kka off of it; let me know if you’d like
> a copy.  It looks like the active processes are all in
> arc_wait_for_eviction.  The pagedaemon is in arc_wait_for_eviction under
> arc_lowmem, but the python processes that were doing the real work don’t
> have arc_lowmem in their stacks, just arc_wait_for_eviction.
>
> Please let me know if there’s anything I can do to assist in finding a
> remedy for this.  Thank you.

Here is a patch which tries to address the proximate cause of the problem.
It would be helpful to know whether it addresses the deadlocks you're
seeing.  I tested it lightly by putting a NUMA system under memory
pressure using postgres.

diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h b/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
index dc3b4f5d7877..4792a0b29ecf 100644
--- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
+++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/kmem.h
@@ -45,7 +45,7 @@ MALLOC_DECLARE(M_SOLARIS);
 #define	POINTER_INVALIDATE(pp)	(*(pp) = (void *)((uintptr_t)(*(pp)) | 0x1))
 
 #define	KM_SLEEP	M_WAITOK
-#define	KM_PUSHPAGE	M_WAITOK
+#define	KM_PUSHPAGE	(M_WAITOK | M_USE_RESERVE)	/* XXXMJ */
 #define	KM_NOSLEEP	M_NOWAIT
 #define	KM_NORMALPRI	0
 #define	KMC_NODEBUG	UMA_ZONE_NODUMP
diff --git a/sys/contrib/openzfs/module/zfs/arc.c b/sys/contrib/openzfs/module/zfs/arc.c
index 79e2d4381830..50cd45d76c52 100644
--- a/sys/contrib/openzfs/module/zfs/arc.c
+++ b/sys/contrib/openzfs/module/zfs/arc.c
@@ -4188,11 +4188,13 @@ arc_evict_state(arc_state_t *state, uint64_t spa, uint64_t bytes,
 	 * pick up where we left off for each individual sublist, rather
	 * than starting from the tail each time.
 	 */
-	markers = kmem_zalloc(sizeof (*markers) * num_sublists, KM_SLEEP);
+	markers = kmem_zalloc(sizeof (*markers) * num_sublists,
+	    KM_SLEEP | KM_PUSHPAGE);
 	for (int i = 0; i < num_sublists; i++) {
 		multilist_sublist_t *mls;
 
-		markers[i] = kmem_cache_alloc(hdr_full_cache, KM_SLEEP);
+		markers[i] = kmem_cache_alloc(hdr_full_cache,
+		    KM_SLEEP | KM_PUSHPAGE);
 
 		/*
 		 * A b_spa of 0 is used to indicate that this header is
diff --git a/sys/vm/uma_core.c b/sys/vm/uma_core.c
index 7b83d81a423d..3fc7859387e0 100644
--- a/sys/vm/uma_core.c
+++ b/sys/vm/uma_core.c
@@ -3932,7 +3932,8 @@ keg_fetch_slab(uma_keg_t keg, uma_zone_t zone, int rdomain, const int flags)
 		vm_domainset_iter_policy_ref_init(&di, &keg->uk_dr, &domain,
 		    &aflags);
 	} else {
-		aflags = flags;
+		aflags = (flags & M_USE_RESERVE) != 0 ?
+		    (flags & ~M_WAITOK) | M_NOWAIT : flags;
 		domain = rdomain;
 	}
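
For anyone reading along, here is a minimal stand-alone sketch of the flag
rewrite in the uma_core.c hunk above.  It is not kernel code: the flag
values below are made-up placeholders, not the real sys/malloc.h bits, and
it only exercises the transformation the patch applies.  As I understand
the intent, an M_WAITOK | M_USE_RESERVE request (what KM_SLEEP | KM_PUSHPAGE
maps to after the kmem.h change) is rewritten into a non-sleeping M_NOWAIT
request that may use the reserve, so the ARC eviction path does not block
in vm_wait while the page daemon is itself waiting on eviction.

/*
 * Minimal sketch, NOT kernel code: the flag values are hypothetical and
 * exist only to demonstrate the rewrite performed in the patched
 * keg_fetch_slab(): when M_USE_RESERVE is set, drop M_WAITOK and
 * substitute M_NOWAIT.
 */
#include <stdio.h>

#define	M_NOWAIT	0x0001	/* hypothetical value */
#define	M_WAITOK	0x0002	/* hypothetical value */
#define	M_USE_RESERVE	0x0004	/* hypothetical value */

static int
rewrite_flags(int flags)
{
	/* Same expression as in the uma_core.c hunk above. */
	return ((flags & M_USE_RESERVE) != 0 ?
	    (flags & ~M_WAITOK) | M_NOWAIT : flags);
}

int
main(void)
{
	/* A KM_SLEEP | KM_PUSHPAGE allocation after the kmem.h change. */
	int aflags = rewrite_flags(M_WAITOK | M_USE_RESERVE);

	printf("waitok=%d nowait=%d reserve=%d\n",
	    (aflags & M_WAITOK) != 0, (aflags & M_NOWAIT) != 0,
	    (aflags & M_USE_RESERVE) != 0);
	return (0);
}

Compiled and run, this prints "waitok=0 nowait=1 reserve=1": the sleeping
allocation becomes a non-sleeping one that is still allowed to draw from
the reserve.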