Re: git: 718d1928f874 - main - LinuxKPI: make linux_alloc_pages() honor __GFP_NORETRY
Date: Tue, 25 Mar 2025 16:57:16 UTC
On Tue, 25 Mar 2025, Olivier Certner wrote:

> The branch main has been updated by olce:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=718d1928f8748fe4429c011296f94f194d63c695
>
> commit 718d1928f8748fe4429c011296f94f194d63c695
> Author: Mathieu <sigsys@gmail.com>
> AuthorDate: 2024-11-14 00:24:02 +0000
> Commit: Olivier Certner <olce@FreeBSD.org>
> CommitDate: 2025-03-25 08:41:44 +0000
>
> LinuxKPI: make linux_alloc_pages() honor __GFP_NORETRY
>
> This is to fix slowdowns with drm-kmod that get worse over time as
> physical memory become more fragmented (and probably also depending on
> other factors).
>
> Based on information posted in this bug report:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476
>
> By default, linux_alloc_pages() retries failed allocations by calling
> vm_page_reclaim_contig() to attempt to free contiguous physical memory
> pages. vm_page_reclaim_contig() does not always succeed and calling it
> can be very slow even when it fails. When physical memory is very
> fragmented, vm_page_reclaim_contig() can end up being called (and
> failing) after every allocation attempt. This could cause very
> noticeable graphical desktop hangs (which could last seconds).
>
> The drm-kmod code in question attempts to allocate multiple contiguous
> pages at once but does not actually require them to be contiguous. It
> can fallback to doing multiple smaller allocations when larger
> allocations fail. It passes alloc_pages() the __GFP_NORETRY flag in this
> case.

Which drm code is this?  ttm_pool_alloc -> ttm_pool_alloc_page()?
I ask because all other uses of __GFP_NORETRY in 6.1 (ignoring
drm_printf.c) seem to be in i915.

> This patch makes linux_alloc_pages() fail early (without retrying) when
> this flag is passed.
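
For illustration only, here is a minimal standalone sketch of the gating
described in the quoted text: reclaim is retried only when the caller may
sleep and has not asked for "no retry".  The SKETCH_* names and their
values are stand-ins invented for this example, not the kernel's
definitions, and sketch_should_reclaim() is a hypothetical helper, not a
LinuxKPI function.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits for this sketch only; not the kernel definitions. */
#define SKETCH_M_WAITOK     0x0002u
#define SKETCH_GFP_NORETRY  (1u << 25)

/*
 * Hypothetical helper: true only when the "may sleep" bit is set and the
 * "no retry" bit is clear, i.e. when a slow reclaim pass would be attempted.
 */
bool
sketch_should_reclaim(unsigned int flags)
{
	return ((flags & (SKETCH_M_WAITOK | SKETCH_GFP_NORETRY)) ==
	    SKETCH_M_WAITOK);
}

/* Walks the four flag combinations and checks the expected outcome. */
void
sketch_demo(void)
{
	assert(sketch_should_reclaim(SKETCH_M_WAITOK));
	assert(!sketch_should_reclaim(SKETCH_M_WAITOK | SKETCH_GFP_NORETRY));
	assert(!sketch_should_reclaim(SKETCH_GFP_NORETRY));
	assert(!sketch_should_reclaim(0));
}
```

Masking with both bits and comparing against the "may sleep" bit alone
tests the two conditions in a single expression.
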
>
> [olce: The problem this patch fixes is longer and longer GUI freezes as
> a machine's memory gets filled and becomes fragmented, when using amdgpu
> from DRM kmod 5.15 and DRM kmod 6.1 (DRM kmod 5.10 is unaffected; newer
> Linux kernel introduced an "optimization" by which a pool of pages is
> filled preferentially with contiguous pages, which triggered the problem
> for us). The original commit message above evokes freezes lasting
> seconds, but I occasionally witnessed some lasting tens of minutes,
> rendering a machine completely useless.
>
> The patch has been reviewed for its potential impacts to other LinuxKPI
> parts and our existing DRM kmods' code. In particular, there is no
> other user of __GFP_NORETRY/GFP_NORETRY with Linux's alloc_pages*()
> functions in our tree or DRM kmod ports.

Are you sure?  i915_gem_object_get_pages_internal() in drm-6.1 at least
seems to conditionally pass it down:

 17 #define QUIET (__GFP_NORETRY | __GFP_NOWARN)
 ...
 74 page = alloc_pages(gfp | (order ? QUIET : MAYFAIL),

It seems to be able to deal with allocation failures, either by lowering
the order or by returning -ENOMEM from the function, so it should
hopefully be fine.

> It has also been tested extensively, by me for months against 14-STABLE
> and sporadically on -CURRENT on a RX580, and by several others as
> reported below and as is visible in more details in the quoted bugzilla
> PR and in the initial drm-kmod issue at
> https://github.com/freebsd/drm-kmod/issues/302, on a variety of other
> AMD GPUs (several RX580, RX570, Radeon Pro WX5100, Green Sardine 5600G,
> Ryzen 9 4900H with embedded Renoir).]
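
To make the fallback pattern mentioned in the i915 remark above concrete,
here is a rough standalone sketch, loosely modelled on the "lower the
order or return -ENOMEM" behaviour.  It is not the i915 code: every
sketch_* name, the flag value, and the fake allocator (which simply
refuses orders above a configurable limit to mimic fragmentation) are
assumptions made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_GFP_NORETRY  (1u << 25)	/* stand-in value, sketch only */
#define SKETCH_PAGE_SIZE    4096u	/* assumed page size for the demo */

/* Highest order the fake allocator will satisfy; -1 makes it always fail. */
static int sketch_max_ok_order = 0;

/* Hypothetical allocator standing in for alloc_pages(). */
static void *
sketch_alloc_pages(unsigned int flags, unsigned int order)
{
	(void)flags;
	if ((int)order > sketch_max_ok_order)
		return (NULL);
	return (malloc((size_t)SKETCH_PAGE_SIZE << order));
}

/*
 * Try the requested order first, asking for "no retry" on anything larger
 * than a single page, and step down one order after each failure.
 * Returns 0 and fills *out and *got_order on success, -ENOMEM otherwise.
 */
int
sketch_alloc_with_fallback(unsigned int gfp, unsigned int order,
    void **out, unsigned int *got_order)
{
	assert(out != NULL && got_order != NULL);
	for (;;) {
		unsigned int flags = gfp | (order ? SKETCH_GFP_NORETRY : 0);
		void *p = sketch_alloc_pages(flags, order);

		if (p != NULL) {
			*out = p;
			*got_order = order;
			return (0);
		}
		if (order == 0)
			return (-ENOMEM);
		order--;
	}
}
```

With a caller shaped like this, a failed high-order attempt costs one
quick allocation failure rather than a reclaim pass, and the caller still
ends up with smaller chunks or a clean error.
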
>
> PR: 277476
> Reported by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
> Reviewed by: olce
> Tested by: many (olce, Pierre Pronchery, Evgenii Khramtsov, chaplina, rk)
> MFC after: 2 weeks
> Relnotes: yes
> Sponsored by: The FreeBSD Foundation (review and part of testing)
> ---
>  sys/compat/linuxkpi/common/include/linux/gfp.h | 4 ++--
>  sys/compat/linuxkpi/common/src/linux_page.c | 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
> index bd8fa1a18372..35dbe3e2a436 100644
> --- a/sys/compat/linuxkpi/common/include/linux/gfp.h
> +++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
> @@ -43,7 +43,6 @@
>  #define __GFP_NOWARN 0
>  #define __GFP_HIGHMEM 0
>  #define __GFP_ZERO M_ZERO
> -#define __GFP_NORETRY 0
>  #define __GFP_NOMEMALLOC 0
>  #define __GFP_RECLAIM 0
>  #define __GFP_RECLAIMABLE 0
> @@ -57,7 +56,8 @@
>  #define __GFP_KSWAPD_RECLAIM 0
>  #define __GFP_WAIT M_WAITOK
>  #define __GFP_DMA32 (1U << 24) /* LinuxKPI only */
> -#define __GFP_BITS_SHIFT 25
> +#define __GFP_NORETRY (1U << 25) /* LinuxKPI only */
> +#define __GFP_BITS_SHIFT 26
>  #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
>  #define __GFP_NOFAIL M_WAITOK
>
> diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
> index bece8c954bfd..b5a0d34a6ad7 100644
> --- a/sys/compat/linuxkpi/common/src/linux_page.c
> +++ b/sys/compat/linuxkpi/common/src/linux_page.c
> @@ -117,7 +117,8 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
>              page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
>                  PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
>              if (page == NULL) {
> -                if (flags & M_WAITOK) {
> +                if ((flags & (M_WAITOK | __GFP_NORETRY)) ==
> +                    M_WAITOK) {
>                      int err = vm_page_reclaim_contig(req,
>                          npages, 0, pmax, PAGE_SIZE, 0);
>                      if (err == ENOMEM)
>

-- 
Bjoern A. Zeeb r15:7