curious crashes when under memory pressure
- Reply: Peter 'PMc' Much: "Re: curious crashes when under memory pressure"
Date: Sat, 04 Jan 2025 13:35:41 UTC
I have my (amd64, -current) box set up to build a lot of ports in parallel, with a fairly high `make -j` value as well. This will sometimes try to build llvm versions 15, 16, and 17, and maybe a gcc or two and/or rustc and/or firefox etc., which pushes the load over 100 and runs me out of real memory (currently only 64 GB).

When this happens, it's not unusual to get:

pid <pid> (c++) ... exited on signal 4 (core dumped)

messages on the console, occasional uprintf() messages, and a build failure -- which goes away when retrying. It's not a parallel make jobs issue. It *appears* to have something to do with the copyout() calls for signal handlers failing, and it invariably coincides with increasing swap usage. I'm swapping to a ZFS mirror, so presumably there aren't any issues with data corruption here (and I've run memory tests on the box as well).

I can't really make heads or tails of the problem so far, but it seems to have occurred for other people in the past, with earlier FreeBSD versions, which suggests some kind of longstanding issue.

This could be a total red herring, but while I was staring at the assembly code, I noticed some ifdef SMP "lock" prefixes in {f,s}uword and the cmpxchg instructions. This prompts me to mention something I discovered the hard way back in the mid-2010s on Haswell processors: the LOCK CMPXCHG16B instruction fails to hold locks if an address splits across a page boundary. (The guy who wrote our memory allocator aligned to a 4 byte boundary instead of an 8 byte boundary, and we were using atomic ops on allocated data structures to build "lockless" queues. I spent a couple of weeks tracking down the crash to this particular problem.)

So I wonder whether these routines should first check that the addresses are properly aligned, and return EFAULT if not. Once I understand how the memmove macros work I'll think about this more. :-)

Chris
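
For illustration only, here is a minimal userland sketch of the kind of alignment check suggested above. It is not taken from the FreeBSD sources: the names sketch_crosses_page() and sketch_check_user_addr() are hypothetical, a 4 KiB page size is assumed, and the access size is assumed to be a power of two.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as is usual on amd64. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: nonzero if the byte range [addr, addr + size)
 * spans two pages.
 */
static int
sketch_crosses_page(uintptr_t addr, size_t size)
{
	return ((addr & (SKETCH_PAGE_SIZE - 1)) + size > SKETCH_PAGE_SIZE);
}

/*
 * Hypothetical pre-check: return 0 if addr is naturally aligned for an
 * access of `size` bytes (size must be a power of two), else EFAULT.
 * A naturally aligned access no larger than a page cannot cross a
 * page boundary, so this also rules out the split-page case.
 */
static int
sketch_check_user_addr(uintptr_t addr, size_t size)
{
	if ((addr & (size - 1)) != 0)
		return (EFAULT);
	return (0);
}
```

With these definitions, an 8-byte access at 0x1004 would be rejected with EFAULT, while the same access at 0x1008 would pass. Whether a check like this belongs in the real {f,s}uword paths is exactly the open question.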