curious crashes when under memory pressure

From: Chris Torek <chris.torek_at_gmail.com>
Date: Sat, 04 Jan 2025 13:35:41 UTC

I have my (amd64, -current) box set up to build a lot of ports in
parallel with a fairly high `make -j` value as well. The builds will
sometimes include llvm versions 15, 16, and 17 and maybe a gcc or
two and/or rustc and/or firefox etc., which pushes the load over 100
and runs me out of real memory (currently only 64 GB). When this
happens, it's not unusual to get:

    pid <pid> (c++) ... exited on signal 4 (core dumped)

messages on the console, occasional uprintf() messages, and a build
failure -- which goes away when retrying. It's not a parallel make
jobs issue. It *appears* to have something to do with the copyout()
calls for signal handlers failing, and it invariably coincides with
increasing swap usage. I'm swapping to a zfs mirror so presumably
there aren't any issues with data corruption here (and I've run memory
tests on the box as well).

I can't really make heads or tails of the problem so far but it seems
to have occurred for other people in the past, with earlier FreeBSD
versions, so it suggests some kind of longstanding issue.

This could be a total red herring, but while I was staring at the
assembly code, I noticed some ifdef SMP "lock" prefixes in {f,s}uword
and the cmpxchg instructions, and this prompts me to mention something
I discovered the hard way back in the mid-2010s on Haswell processors:
the LOCK CMPXCHG8B instruction fails to hold locks if its operand
splits across a page boundary. (The guy who wrote our memory allocator
aligned to a 4 byte boundary instead of an 8 byte boundary and we were
using atomic ops on allocated data structures to build "lockless"
queues. I spent a couple of weeks tracking down the crash to this
particular problem.) So I wonder whether these routines should first
check that the addresses are properly aligned, and return EFAULT if
not.
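
To make the suggestion concrete, here is a rough C sketch of the kind of
check I mean. The names (crosses_page, check_atomic_alignment,
SKETCH_PAGE_SIZE) are hypothetical and not from the FreeBSD source, where
the real routines are in assembly. It assumes 4 KiB pages and a
power-of-two access size, and it is only an illustration of the idea, not
a proposed patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on amd64 with base pages. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Return nonzero if the byte range [addr, addr + len) touches more
 * than one page.  The caller must pass len > 0.
 */
static int
crosses_page(uintptr_t addr, size_t len)
{
	uintptr_t first = addr / SKETCH_PAGE_SIZE;
	uintptr_t last = (addr + len - 1) / SKETCH_PAGE_SIZE;

	return (first != last);
}

/*
 * Hypothetical pre-check for an atomic access of len bytes at addr.
 * Returns 0 if addr is naturally aligned for len, EFAULT if it is not,
 * and EINVAL if len is not a power of two.  A naturally aligned access
 * no larger than a page can never straddle a page boundary.
 */
static int
check_atomic_alignment(uintptr_t addr, size_t len)
{
	if (len == 0 || (len & (len - 1)) != 0)
		return (EINVAL);
	if ((addr & (len - 1)) != 0)
		return (EFAULT);
	return (0);
}
```

An 8-byte object at offset 4092 is the allocator case described above: it
is 4-byte aligned but not 8-byte aligned, so it straddles two pages and
the check rejects it, while the same object at offset 4088 passes.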

Once I understand how the memmove macros work I'll think about this more. :-)

Chris