Re: git: 718d1928f874 - main - LinuxKPI: make linux_alloc_pages() honor __GFP_NORETRY

From: Olivier Certner <olce_at_freebsd.org>
Date: Tue, 25 Mar 2025 18:41:46 UTC
Hi Bjoern,

> What is the drm code in question?  ttm_pool_alloc -> ttm_pool_alloc_page()?
> As all other uses of __GFP_NORETRY in 6.1 (ignoring drm_printf.c) seem to be
> in i915.

Yes, this is indeed targeted at ttm_pool_alloc_page() which tries to allocate contiguous pages to fill up the pool (but not in 5.10).  TTM pools are used by the amdgpu driver to build its translation tables.

Calls to functions other than (linux_)alloc_pages*() are unaffected by the change, and if you dig through all the references to __GFP_NORETRY/GFP_RETRY (including those from files under 'selftests/'), you'll see the built GFP flags are never used with (linux_)alloc_pages*(), except for the only reference you mentioned.
 
> Are you sure?
> 
> i915_gem_object_get_pages_internal() in drm-6.1 at least seems to
> conditionally pass it down:
> 
>       17 #define QUIET (__GFP_NORETRY | __GFP_NOWARN)
>       ...
>       74                         page = alloc_pages(gfp | (order ? QUIET : MAYFAIL),
> 
> Seems it can deal with allocation failures, lowering order or returning
> -ENOMEM from the function so should be fine hopefully.

Yes, I was aware of this piece of code, but obviously it cannot cause any problem.

All calls to Linux's alloc_pages*() can fail *whatever* the passed GFP flags, except with __GFP_NOFAIL (and that's the only exception).  Callers always have to cope, and specifically when specifying __GFP_NORETRY it would be foolish not to (and that wouldn't be allowed in Linus' tree anyway).

If it wasn't for that, i915_gem_object_get_pages_internal() does the same lowering that ttm_pool_alloc_page() does anyway, as you noticed.
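
To make the caller-side pattern concrete, here is a minimal userland sketch of the order lowering.  It is not the i915 or TTM code: every sk_* name is hypothetical, the allocator is a mock that simply fails above a configurable order, and the flag values are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for this sketch; not the real Linux/LinuxKPI flags. */
#define SK_GFP_NORETRY	0x1u
#define SK_GFP_NOWARN	0x2u
#define SK_QUIET	(SK_GFP_NORETRY | SK_GFP_NOWARN)

/* Largest order the mock allocator can currently satisfy. */
static unsigned int sk_max_available_order = 2;

/* Mock allocator: fails for any order above sk_max_available_order. */
static void *
sk_alloc_pages(unsigned int gfp, unsigned int order)
{
	(void)gfp;	/* The mock ignores the flags. */
	if (order > sk_max_available_order)
		return (NULL);
	return (malloc((size_t)4096 << order));
}

/*
 * Try the requested order first, then fall back to smaller orders.
 * Returns NULL only if even an order-0 allocation fails; on success,
 * *got_order holds the order actually obtained.
 */
static void *
sk_alloc_with_fallback(unsigned int gfp, unsigned int order,
    unsigned int *got_order)
{
	void *p;

	assert(got_order != NULL);
	for (;;) {
		/* Only high-order attempts ask for quick, quiet failure. */
		p = sk_alloc_pages(gfp | (order != 0 ? SK_QUIET : 0), order);
		if (p != NULL) {
			*got_order = order;
			return (p);
		}
		if (order == 0)
			return (NULL);
		order--;
	}
}
```

With the mock capped at order 2, a request for order 5 ends up satisfied at order 2; only a failed order-0 attempt is reported to the caller, which is where the -ENOMEM path would come in.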

My sentence was indeed too strong, as I was still swapping in context for this work which was done months ago now.  I reviewed all callers not only for __GFP_NORETRY but also for most other GFP flags (I have tweaked grep files for all of them and over multiple Linux versions), as I started some work to document what Linux's guarantees/behaviors really are and then some other work to rationalize how we translate them in FreeBSD (there seem to be several possible improvements here).  Unfortunately, I have stalled that last work for weeks now, and probably will for a significant while.

Given Linux's contract on __GFP_NORETRY, it is arguably not reasonable to spend time compacting memory on such calls; that's a deviation from what drivers are supposed to expect.
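
As a rough illustration of that idea (a userland sketch under stated assumptions, not the committed LinuxKPI code): the allocator below only runs its expensive reclaim/compaction step, and retries, when the caller may sleep and did not ask for "no retry".  All sk_* names and flag values are hypothetical, and the mock is rigged so that the contiguous allocation succeeds only after one reclaim pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags for this sketch only. */
#define SK_WAITOK	0x1u	/* Caller may sleep. */
#define SK_NORETRY	0x2u	/* Caller prefers a quick failure. */

struct sk_stats {
	unsigned int attempts;	/* Allocation attempts made. */
	unsigned int reclaims;	/* Reclaim/compaction passes run. */
};

/* Mock: a contiguous run is available only after a reclaim pass. */
static void *
sk_try_contig(const struct sk_stats *st)
{
	static char page[1];

	return (st->reclaims > 0 ? page : NULL);
}

static void *
sk_alloc_contig(unsigned int flags, struct sk_stats *st)
{
	void *p;

	assert(st != NULL);
	for (;;) {
		st->attempts++;
		p = sk_try_contig(st);
		if (p != NULL)
			return (p);
		/*
		 * Reclaim and retry only if the caller may sleep and did
		 * not request SK_NORETRY; otherwise fail right away.
		 */
		if ((flags & (SK_WAITOK | SK_NORETRY)) != SK_WAITOK)
			return (NULL);
		st->reclaims++;		/* Stand-in for the expensive step. */
		flags &= ~SK_WAITOK;	/* Retry at most once. */
	}
}
```

In this sketch, a call with SK_WAITOK | SK_NORETRY fails after a single attempt without any reclaim pass, whereas a plain SK_WAITOK call pays for one reclaim pass and then succeeds on the second attempt.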

Oh, and the commit message doesn't mention that I also tested this change on machines using the i915 driver, without observing any problem or change in behavior.

Thanks and regards.

-- 
Olivier Certner