[Bug 277476] graphics/drm-515-kmod: amdgpu periodic hangs due to phys contig allocations
Date: Fri, 08 Nov 2024 09:04:51 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476

--- Comment #5 from sigsys@gmail.com ---

Yeah so this problem was super annoying. But thanks to the information already posted here, it seems like it wasn't too hard to fix.

IIUC the drm code (ttm_pool_alloc()) that asks for contiguous pages doesn't actually need contiguous pages. It's just an opportunistic optimization. When an allocation fails, it falls back to asking for fewer and fewer contiguous pages (eventually asking for only one page at a time).

When ttm_pool_alloc_page() asks for more than one page, it passes alloc_pages() some extra flags (__GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_KSWAPD_RECLAIM).

What's expensive is the vm_page_reclaim_contig() call in linux_alloc_pages(). The function tries too hard to find contiguous memory (which the drm code doesn't even require), and as physical memory gets more fragmented it becomes very slow.

So, very simple fix: make linux_alloc_pages() react to one of the flags passed by the drm code. __GFP_NORETRY used to be defined as 0, so it had no effect; the patch gives it a real bit and skips the reclaim when that bit is set:

diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
index 2fcc0dc05f29..58a021086c98 100644
--- a/sys/compat/linuxkpi/common/include/linux/gfp.h
+++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
@@ -44,7 +44,6 @@
 #define __GFP_NOWARN 0
 #define __GFP_HIGHMEM 0
 #define __GFP_ZERO M_ZERO
-#define __GFP_NORETRY 0
 #define __GFP_NOMEMALLOC 0
 #define __GFP_RECLAIM 0
 #define __GFP_RECLAIMABLE 0
@@ -58,7 +57,8 @@
 #define __GFP_KSWAPD_RECLAIM 0
 #define __GFP_WAIT M_WAITOK
 #define __GFP_DMA32 (1U << 24) /* LinuxKPI only */
-#define __GFP_BITS_SHIFT 25
+#define __GFP_NORETRY (1U << 25) /* LinuxKPI only */
+#define __GFP_BITS_SHIFT 26
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
 #define __GFP_NOFAIL M_WAITOK
 
diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index 18b90b5e3d73..71a6890a3795 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -118,7 +118,7 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
 			page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
 			    PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
 			if (page == NULL) {
-				if (flags & M_WAITOK) {
+				if ((flags & (M_WAITOK | __GFP_NORETRY)) == M_WAITOK) {
 					int err = vm_page_reclaim_contig(req,
 					    npages, 0, pmax, PAGE_SIZE, 0);
 					if (err == ENOMEM)

Been working fine here with amdgpu for about 3 weeks. (The drm modules need to be recompiled against the modified kernel header.)
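To make the effect of the changed condition easier to see, here is a small userland model of it. This is only a sketch, not the kernel code. The SK_ names, the helper functions and the "largest free order" parameter are all made up for illustration. SK_M_WAITOK is a stand-in for M_WAITOK, and the model assumes the caller's base flags include it. A counted reclaim pass is also assumed not to rescue the allocation, which is the worst case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values for this sketch only (hypothetical names). */
#define SK_M_WAITOK     0x0002u
#define SK_GFP_NORETRY  (1u << 25)	/* the bit the patch assigns */

/*
 * The patched test: reclaim only if the caller may sleep AND did not ask
 * for "no retry". Passing noretry_bit == 0 models the old header, where
 * __GFP_NORETRY was defined as 0 and so could never be seen.
 */
static bool
sk_should_reclaim(unsigned int flags, unsigned int noretry_bit)
{
	return ((flags & (SK_M_WAITOK | noretry_bit)) == SK_M_WAITOK);
}

/*
 * Model of the caller-side fallback: try start_order, then smaller orders.
 * An attempt succeeds when order <= largest_free (hypothetical knob).
 * Multi-page attempts carry the no-retry flag, single-page ones do not.
 * Returns the order that succeeded; *reclaims counts reclaim passes.
 */
static unsigned int
sk_alloc_fallback(unsigned int start_order, unsigned int largest_free,
    unsigned int noretry_bit, unsigned int *reclaims)
{
	unsigned int order, flags;

	assert(reclaims != NULL);
	*reclaims = 0;
	for (order = start_order; order > largest_free; order--) {
		flags = SK_M_WAITOK | (order > 0 ? noretry_bit : 0);
		if (sk_should_reclaim(flags, noretry_bit))
			(*reclaims)++;	/* the slow path the patch avoids */
	}
	return (order);
}
```

With the old header (noretry_bit of 0) every failed multi-page attempt goes through a reclaim pass before falling back. With the new bit, the same sequence of attempts falls straight through to the smaller orders.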