git: 831e6fb0baf6 - stable/14 - LinuxKPI: make linux_alloc_pages() honor __GFP_NORETRY

From: Olivier Certner <olce_at_FreeBSD.org>
Date: Tue, 08 Apr 2025 13:41:15 UTC
The branch stable/14 has been updated by olce:

URL: https://cgit.FreeBSD.org/src/commit/?id=831e6fb0baf67c2421abb50b6a14da9e71c183bb

commit 831e6fb0baf67c2421abb50b6a14da9e71c183bb
Author:     Mathieu <sigsys@gmail.com>
AuthorDate: 2024-11-14 00:24:02 +0000
Commit:     Olivier Certner <olce@FreeBSD.org>
CommitDate: 2025-04-08 13:38:29 +0000

    LinuxKPI: make linux_alloc_pages() honor __GFP_NORETRY
    
    This is to fix slowdowns with drm-kmod that get worse over time as
    physical memory becomes more fragmented (and probably also depending on
    other factors).
    
    Based on information posted in this bug report:
    https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476
    
    By default, linux_alloc_pages() retries failed allocations by calling
    vm_page_reclaim_contig() to attempt to free contiguous physical memory
    pages. vm_page_reclaim_contig() does not always succeed and calling it
    can be very slow even when it fails. When physical memory is very
    fragmented, vm_page_reclaim_contig() can end up being called (and
    failing) after every allocation attempt. This could cause very
    noticeable graphical desktop hangs (which could last seconds).
    
    The drm-kmod code in question attempts to allocate multiple contiguous
    pages at once but does not actually require them to be contiguous. It
    can fall back to doing multiple smaller allocations when larger
    allocations fail. It passes alloc_pages() the __GFP_NORETRY flag in this
    case.
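
    The fallback pattern described above can be sketched in userland C as
    follows. This is an illustration only, not the drm-kmod code:
    demo_alloc_pages(), demo_gather_pages() and largest_free_order are
    hypothetical names, and the stub allocator merely pretends that no
    free run larger than 2^largest_free_order pages exists.

```c
#include <stddef.h>

/* Same bit the patch assigns to __GFP_NORETRY; renamed for a userland demo. */
#define DEMO_GFP_NORETRY	(1U << 25)

/*
 * Hypothetical stand-in for fragmentation: the largest order for which a
 * contiguous run is still available (-1 means nothing can be allocated).
 */
static int largest_free_order = 2;

/* Hypothetical stub allocator: returns 1 on success, 0 on failure. */
static int
demo_alloc_pages(unsigned int flags, unsigned int order)
{
	(void)flags;	/* A real allocator would fail fast on NORETRY. */
	return ((int)order <= largest_free_order);
}

/*
 * Gather npages pages, preferring large contiguous chunks and dropping to
 * smaller orders when a request fails.  Returns the number of chunks used,
 * or -1 if even a single-page request fails.
 */
static int
demo_gather_pages(size_t npages, unsigned int max_order)
{
	unsigned int order = max_order;
	int chunks = 0;

	while (npages > 0) {
		/* Never request more pages than are still needed. */
		while (order > 0 && ((size_t)1 << order) > npages)
			order--;
		if (demo_alloc_pages(DEMO_GFP_NORETRY, order)) {
			npages -= (size_t)1 << order;
			chunks++;
		} else if (order > 0) {
			order--;	/* Fall back to a smaller request. */
		} else {
			return (-1);
		}
	}
	return (chunks);
}
```

    Because the caller already copes with failure by lowering the order, an
    expensive reclaim attempt on each failed request buys it nothing.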
    
    This patch makes linux_alloc_pages() fail early (without retrying) when
    this flag is passed.
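
    The flag test this patch introduces can be exercised outside the kernel
    with the sketch below. The DEMO_ constants are stand-ins chosen for the
    demo (the kernel headers define the real M_WAITOK and __GFP_NORETRY),
    and should_reclaim() is a hypothetical helper that only mirrors the
    new condition. Note that while __GFP_NORETRY was defined as 0, OR-ing
    it into the flags changed nothing, so the allocator could not see it.

```c
#include <stdbool.h>

/* Stand-in values for a userland demo, not taken from kernel headers. */
#define DEMO_M_WAITOK		0x0002U
#define DEMO_GFP_NORETRY	(1U << 25)

/*
 * Mirrors the patched condition: reclaim and retry only when the caller
 * may sleep (M_WAITOK set) and has not asked to fail fast (NORETRY clear).
 */
static bool
should_reclaim(unsigned int flags)
{
	return ((flags & (DEMO_M_WAITOK | DEMO_GFP_NORETRY)) ==
	    DEMO_M_WAITOK);
}
```

    With these definitions, should_reclaim(DEMO_M_WAITOK) is true, while
    adding DEMO_GFP_NORETRY or omitting DEMO_M_WAITOK makes it false.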
    
    [olce: The problem this patch fixes is longer and longer GUI freezes as
    a machine's memory gets filled and becomes fragmented, when using amdgpu
    from DRM kmod 5.15 and DRM kmod 6.1 (DRM kmod 5.10 is unaffected; newer
    Linux kernels introduced an "optimization" by which a pool of pages is
    filled preferentially with contiguous pages, which triggered the problem
    for us).  The original commit message above mentions freezes lasting
    seconds, but I occasionally witnessed some lasting tens of minutes,
    rendering a machine completely useless.
    
    The patch has been reviewed for its potential impact on other LinuxKPI
    parts and our existing DRM kmods' code.  In particular, there is no
    other user of __GFP_NORETRY/GFP_NORETRY with Linux's alloc_pages*()
    functions in our tree or DRM kmod ports.
    
    It has also been tested extensively, by me for months against 14-STABLE
    and sporadically on -CURRENT on a RX580, and by several others as
    reported below and as is visible in more detail in the quoted bugzilla
    PR and in the initial drm-kmod issue at
    https://github.com/freebsd/drm-kmod/issues/302, on a variety of other
    AMD GPUs (several RX580, RX570, Radeon Pro WX5100, Green Sardine 5600G,
    Ryzen 9 4900H with embedded Renoir).]
    
    PR:             277476
    Reported by:    Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
    Reviewed by:    olce
    Tested by:      many (olce, Pierre Pronchery, Evgenii Khramtsov, chaplina, rk)
    MFC after:      2 weeks
    Relnotes:       yes
    Sponsored by:   The FreeBSD Foundation (review and part of testing)
    
    (cherry picked from commit 718d1928f8748fe4429c011296f94f194d63c695)
---
 sys/compat/linuxkpi/common/include/linux/gfp.h | 4 ++--
 sys/compat/linuxkpi/common/src/linux_page.c    | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
index e285f8591a3c..a9455a028640 100644
--- a/sys/compat/linuxkpi/common/include/linux/gfp.h
+++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
@@ -44,7 +44,6 @@
 #define	__GFP_NOWARN	0
 #define	__GFP_HIGHMEM	0
 #define	__GFP_ZERO	M_ZERO
-#define	__GFP_NORETRY	0
 #define	__GFP_NOMEMALLOC 0
 #define	__GFP_RECLAIM   0
 #define	__GFP_RECLAIMABLE   0
@@ -58,7 +57,8 @@
 #define	__GFP_KSWAPD_RECLAIM	0
 #define	__GFP_WAIT	M_WAITOK
 #define	__GFP_DMA32	(1U << 24) /* LinuxKPI only */
-#define	__GFP_BITS_SHIFT 25
+#define	__GFP_NORETRY	(1U << 25) /* LinuxKPI only */
+#define	__GFP_BITS_SHIFT 26
 #define	__GFP_BITS_MASK	((1 << __GFP_BITS_SHIFT) - 1)
 #define	__GFP_NOFAIL	M_WAITOK
 
diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index ead2f24cf5df..cc7683e3b572 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -118,7 +118,8 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
 			page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
 			    PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
 			if (page == NULL) {
-				if (flags & M_WAITOK) {
+				if ((flags & (M_WAITOK | __GFP_NORETRY)) ==
+				    M_WAITOK) {
 					if (!vm_page_reclaim_contig(req,
 					    npages, 0, pmax, PAGE_SIZE, 0)) {
 						vm_wait(NULL);