[Bug 279901] glibc-2.39-2 and above on the host segfault

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 04 Mar 2025 16:46:57 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279901

--- Comment #69 from Mark Peek <mp@FreeBSD.org> ---
Having just received an AMD 7840U, I wanted to do a little more research into
this bug and the current patch. Given the cache values I am seeing, I believe
the patch needs a small change.

Looking at the cache output from ld.so --list-diagnostics without the patch,
i.e., the current code:
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x1000000000
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x100000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x1000000000
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4

As Florian states in #36, the bogus L3 cache size (0x1000000000 bytes, i.e.
64 GiB, in the output above) is what triggers the bug in glibc-2.40 (or a
patched 2.39).
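
To sanity-check the sizes in these listings, it helps to convert the hex byte
counts to KiB. A minimal sketch (the helper name is made up for this example
and is not part of glibc or bhyve):

```c
#include <stdint.h>

/*
 * Convert a byte count, as printed by ld.so --list-diagnostics,
 * to whole KiB.  Hypothetical helper for illustration only.
 */
static uint64_t
bytes_to_kib(uint64_t bytes)
{
	return (bytes >> 10);
}
```

With this, 0x8000 is 32 KiB, 0x100000 is 1024 KiB (1 MiB), and 0x1000000000
is 67108864 KiB (64 GiB), which is clearly not a plausible L3 size.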

Applying the patch from https://reviews.freebsd.org/D48187 gives these cache
values:
x86.cpu_features.data_cache_size=0x80
x86.cpu_features.shared_cache_size=0x2
x86.cpu_features.level1_icache_size=0x80
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x80
x86.cpu_features.level1_dcache_assoc=0x1
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x80
x86.cpu_features.level2_cache_assoc=0x1
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x2
x86.cpu_features.level3_cache_assoc=0x1
x86.cpu_features.level3_cache_linesize=0x1
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4

While the guest apps now work, the reported cache sizes are unrealistically
small. This is due to cpuid 0x8000001D not being fully implemented.

Looking at the glibc code:

https://github.com/bminor/glibc/blob/glibc-2.40/sysdeps/x86/dl-cacheinfo.h#L309

As Florian discusses in #39, the handle_amd() function first looks at cpuid
0x8000001D for the cache information, but that leaf is not providing all of
the parameters needed to compute the correct cache sizes.  If 0x8000001D is
not available or the returned ecx==0, it falls back to a legacy mechanism.
On the Zen architecture, though, the legacy path still reads eax from
0x8000001D to get NumSharingCache.
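
For context, a rough sketch of how a cache size and the sharing count fall out
of the 0x8000001D registers, using the field layout from AMD's CPUID
documentation. The function names are made up for this example, and this is
not the glibc code itself, whose exact expression may differ:

```c
#include <stdint.h>

/*
 * Cache size in bytes from the EBX/ECX values of CPUID 0x8000001D:
 *   EBX[11:0]  = line size - 1
 *   EBX[21:12] = physical line partitions - 1
 *   EBX[31:22] = ways of associativity - 1
 *   ECX        = number of sets - 1
 * Hypothetical helper, not taken from glibc.
 */
static uint64_t
leaf_1d_cache_size(uint32_t ebx, uint32_t ecx)
{
	uint64_t line_size = (ebx & 0xfff) + 1;
	uint64_t partitions = ((ebx >> 12) & 0x3ff) + 1;
	uint64_t ways = ((ebx >> 22) & 0x3ff) + 1;
	uint64_t sets = (uint64_t)ecx + 1;

	return (ways * partitions * line_size * sets);
}

/*
 * EAX[25:14] = NumSharingCache: number of logical processors
 * sharing this cache, minus 1.  Returns the actual count.
 */
static uint32_t
leaf_1d_num_sharing(uint32_t eax)
{
	return (((eax >> 14) & 0xfff) + 1);
}
```

A 32 KiB, 8-way cache with 64-byte lines has 64 sets, so ebx = (7 << 22) | 63
and ecx = 63 give 0x8000. If the emulated leaf leaves most of these fields at
zero, the product collapses to a handful of bytes.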

To get this fallback to work properly I reverted one of the changes in the
proposed patch from Konstantin <https://reviews.freebsd.org/D48187> and only
used:
--- a/sys/amd64/vmm/x86.c
+++ b/sys/amd64/vmm/x86.c
@@ -150,8 +150,6 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                                 * pkg_id_shift and other OSes may rely on it.
                                 */
                                width = MIN(0xF, log2(threads * cores));
-                               if (width < 0x4)
-                                       width = 0;
                                logical_cpus = MIN(0xFF, threads * cores - 1);
                                regs[2] = (width << AMDID_COREID_SIZE_SHIFT) | logical_cpus;
                        }
@@ -256,7 +254,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                                func = 3;       /* unified cache */
                                break;
                        default:
-                               logical_cpus = 0;
+                               logical_cpus = sockets * threads * cores;
                                level = 0;
                                func = 0;
                                break;
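
To illustrate the first hunk: with the "width < 0x4" clamp removed, a small
guest reports a non-zero core ID size. Below is a minimal standalone sketch of
that register computation, not the bhyve code. It assumes the hunk is the
CPUID 0x80000008 ecx path (as the AMDID_COREID_SIZE_SHIFT name suggests), that
the shift is 12, and that the log2() helper rounds up; all names here are made
up for the example:

```c
#include <stdint.h>

/* Assumed to match AMDID_COREID_SIZE_SHIFT (ecx bits 15:12). */
#define COREID_SIZE_SHIFT 12

/* Ceiling log2 for x >= 1 (assumption: the log2() in x86.c rounds up). */
static uint32_t
ceil_log2(uint32_t x)
{
	uint32_t bits = 0;

	while (bits < 31 && (UINT32_C(1) << bits) < x)
		bits++;
	return (bits);
}

/*
 * Build the ecx value without the "width < 0x4" clamp.
 * Assumes threads * cores >= 1.
 */
static uint32_t
leaf_8_ecx(uint32_t threads, uint32_t cores)
{
	uint32_t n = threads * cores;
	uint32_t width = ceil_log2(n);
	uint32_t logical_cpus = n - 1;

	if (width > 0xf)
		width = 0xf;
	if (logical_cpus > 0xff)
		logical_cpus = 0xff;
	return ((width << COREID_SIZE_SHIFT) | logical_cpus);
}
```

For a guest with 2 threads and 4 cores this yields 0x3007; with the clamp in
place the width field would have been forced to 0, giving 0x0007.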

With that change reverted, 0x8000001D still returns ecx==0, which keeps
handle_amd() off the 0x8000001D path, while NumSharingCache gets a better
value for use in the legacy code path. The reported cache sizes with this
change are:

x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x2000000
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x100000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x2000000
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4

These values look more reasonable and fix the guest issues on my system.

I would like to see whether this matches what other people are seeing for
cache sizes with this patch, and whether it resolves any outstanding issues.
