[Bug 279901] glibc-2.39-2 and above on the host segfault
- In reply to: bugzilla-noreply_a_freebsd.org: "[Bug 279901] glibc-2.39-2 and above on the host segfault"
Date: Wed, 18 Dec 2024 11:24:45 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279901

--- Comment #39 from Florian Weimer <fweimer@redhat.com> ---
(In reply to Konstantin Belousov from comment #37)
> Do you see which CPUID leaf causes the trouble?

Let me try based on attachment 255708.

The maximum leaf is 0x80000023 according to this:

x86.processor[0x0].cpuid.eax[0x80000000].eax=0x80000023

Ordinarily, handle_amd in sysdeps/x86/dl-cacheinfo.h would use the modern way for obtaining cache details, using leaf 0x8000001D:

x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].eax=0x121
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].ebx=0x3f
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].eax=0x143
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].ebx=0x3f
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].eax=0x163
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].ebx=0x3f
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].eax=0x3ffc100
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].ebx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].until_ecx=0x1ff

L3 cache data is in subleaf 3. We have a safety check that requires ECX != 0, in case hypervisors do not fill in this information, which is what is happening here. We fall back to the legacy way of obtaining cache size.
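
To make the fallback decision concrete, here is a minimal Python sketch of the check described above, fed with the register values from the dump. It is not the glibc code: the names l3_source and LEAF_8000001D are made up for illustration, and the comparison against the maximum extended leaf is an assumption about how the modern path is gated.

```python
# Register values for leaf 0x8000001D, copied from the dump above,
# keyed by subleaf number (the ECX input to CPUID).
LEAF_8000001D = {
    0: {"eax": 0x121, "ebx": 0x3F, "ecx": 0x0, "edx": 0x0},
    1: {"eax": 0x143, "ebx": 0x3F, "ecx": 0x0, "edx": 0x0},
    2: {"eax": 0x163, "ebx": 0x3F, "ecx": 0x0, "edx": 0x0},
    3: {"eax": 0x3FFC100, "ebx": 0x0, "ecx": 0x0, "edx": 0x0},
}

MAX_EXT_LEAF = 0x80000023  # from leaf 0x80000000, EAX
L3_SUBLEAF = 3             # L3 cache data is in subleaf 3


def l3_source(max_ext_leaf, subleaves):
    """Hypothetical helper: say which method would be used for L3 data.

    Assumption: the modern path needs leaf 0x8000001D to exist and the
    returned ECX of the L3 subleaf to be nonzero (the safety check).
    """
    regs = subleaves[L3_SUBLEAF]
    if max_ext_leaf >= 0x8000001D and regs["ecx"] != 0:
        return "leaf 0x8000001D"
    return "legacy leaf 0x80000006"


print(l3_source(MAX_EXT_LEAF, LEAF_8000001D))
```

With the values from this hypervisor, ECX is zero for subleaf 3, so the sketch selects the legacy leaf.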

The legacy method uses leaf 0x80000006 for L3 cache information:

x86.processor[0x0].cpuid.eax[0x80000006].eax=0x48002200
x86.processor[0x0].cpuid.eax[0x80000006].ebx=0x68004200
x86.processor[0x0].cpuid.eax[0x80000006].ecx=0x2006140
x86.processor[0x0].cpuid.eax[0x80000006].edx=0x8009140

The base L3 cache size is 2 * (EDX & 0x3ffc0000), so 256 MiB. This is not unreasonable for an EPYC system, and it's probably right.

However, that number could be a per-socket number, and the way we use this number for tuning, we need a per-thread amount. We adjust this per leaf 0x80000008. The thread count is in (ECX & 0xff) + 1:

x86.processor[0x0].cpuid.eax[0x80000008].eax=0x3030
x86.processor[0x0].cpuid.eax[0x80000008].ebx=0x7
x86.processor[0x0].cpuid.eax[0x80000008].ecx=0x0
x86.processor[0x0].cpuid.eax[0x80000008].edx=0x10007

So we get 1, and there is no per-thread scale-down. (I think the hypervisor should expose a more realistic count here?)

If the CPU family is at least 0x17, we assume that the number is measured per core complex. And that comes again from leaf 0x8000001d, subleaf 3, but this time register EAX. It's computed as ((EAX >> 14) & 0xfff) + 1. This evaluates to 4096 here, and I think this is the bug. This CCX count is just way too high.

Based on the available information, the glibc code assumes that there are 4096 instances of 256 MiB caches, which translates to 1 TiB of L3 cache (per thread, but the thread count is 1).
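
The arithmetic above can be replayed with a short Python sketch using the register values from the dumps. This only mirrors the steps as described in this comment, not the actual glibc source; the variable names are invented for illustration, and the order of the divide and multiply steps is an assumption.

```python
# Register values copied from the CPUID dump in attachment 255708.
EDX_80000006 = 0x8009140        # leaf 0x80000006, EDX (legacy L3 info)
ECX_80000008 = 0x0              # leaf 0x80000008, ECX (thread count - 1)
EAX_8000001D_SUB3 = 0x3FFC100   # leaf 0x8000001D, subleaf 3, EAX

MIB = 1 << 20
TIB = 1 << 40

# Base L3 size: 2 * (EDX & 0x3ffc0000).
base_l3 = 2 * (EDX_80000006 & 0x3FFC0000)

# Thread count: (ECX & 0xff) + 1.
threads = (ECX_80000008 & 0xFF) + 1

# Count taken from EAX bits 14..25, plus one (treated as the CCX count).
ccx_count = ((EAX_8000001D_SUB3 >> 14) & 0xFFF) + 1

# Assumed order: scale down per thread, then scale up by the CCX count.
assumed_l3 = (base_l3 // threads) * ccx_count

print(f"base L3:    {base_l3 // MIB} MiB")
print(f"threads:    {threads}")
print(f"CCX count:  {ccx_count}")
print(f"assumed L3: {assumed_l3 // TIB} TiB")
```

Because the thread count is 1, the division has no effect, and the 12-bit field being all ones pushes the result to the 1 TiB figure.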