Re: curious crashes when under memory pressure
- In reply to: Chris Torek : "Re: curious crashes when under memory pressure"
Date: Sat, 04 Jan 2025 17:29:31 UTC
On Sat, Jan 04, 2025 at 08:27:06AM -0800, Chris Torek wrote:
! On Sat, Jan 4, 2025 at 7:01 AM Peter 'PMc' Much
! <pmc@citylink.dinoex.sub.org> wrote:
! >> I'm swapping to a zfs mirror
! >
! > Well, you shouldn't do that.
!
! Why not? Swapping to a *file* on zfs has obvious issues, but swapping
! to a mirrored swap partition seems like it should be entirely safe.

A "mirrored swap partition" - that would be a zfs volume inside a zfs
pool which runs on some vdevs that happen to be mirrored, right? I
don't know of zfs itself having any notion of "partitions". It supports
volumes, and these have almost all the same features as filesystems:
checksumming, compression, txg buffering, logging, snapshotting, etc.
(There is a sketch of such a swap zvol setup at the end of this mail.)

So I tend to doubt that this is safe. I can't give you a logical proof
(it has been more than ten years since I last looked deeper into the
zfs source), but my gut feeling says there are so many creepy things
going on in the zfs layer nowadays (and very likely a bunch of
undiscovered bugs, too) that one should avoid such a stack.

Also, the idea of paging into zfs became popular at about the same time
as it became popular not to use swap at all, because lots of memory had
become available. And while running a system with serious paging (into
tens of GB) is practical, that is probably not the use case where we
would page into zfs.

A zfs vdev is logically just a fixed-length file - aka a raw partition.
Above that sits the zfs logic, with lots of caches. There is not only
the ARC that data must go through; there is other dbuf handling, there
is more handling on the vdev layer, and all of that needs some memory.
(I looked into these various buffers when I patched things to make zfs
a bit more NUMA-friendly - many of them use the UMA allocator scheme,
which again has its own mechanics.)

Then, above all this memory-consuming stuff, finally comes the kernel
that wants to pageout - and it would expect the pageout to go directly
onto a fixed-length file, aka a raw partition. That doesn't look very
sane to me. So what I am saying is: before you spend time hunting this
bug, give it a try with direct raw-partition paging (commands sketched
at the end of this mail). At least then we know whether it happens
there as well, and that helps narrow the search.

! bit slow (double writes) but I spent $ on RAM rather than M.2 drives
! on the theory that I can add those later as needed.

It doesn't need a super-fast SSD, at least not for testing. Pageout
happens async, and while pagein stalls the affected process, that is a
read, and reads should be faster.

cheerio,
PMc
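
P.S. For reference, the kind of setup being discussed - swap on a zfs
volume - is usually created along these lines. This is only a sketch:
the pool name "tank", the volume name "swap" and the 16G size are
placeholders, and the property list follows the commonly given advice
of switching off the zfs features that just get in the way of paging:

    # create a zvol for swap; turn off checksumming, compression,
    # dedup and ARC caching so the paging path stays as thin as
    # possible
    zfs create -V 16G \
        -o org.freebsd:swap=on \
        -o checksum=off \
        -o compression=off \
        -o dedup=off \
        -o sync=disabled \
        -o primarycache=none \
        tank/swap

    # activate it (with org.freebsd:swap=on set, the rc.d zvol
    # script should also do this at boot)
    swapon /dev/zvol/tank/swap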
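
P.P.S. And the raw-partition test I am suggesting - again just a
sketch, where "ada0p3" stands for whatever freebsd-swap partition
exists on the machine (check with "gpart show"). If the mirroring
itself is the point, gmirror(8) can mirror a raw swap partition at
the GEOM level, keeping zfs out of the paging path entirely.

    # what is swap currently on?
    swapinfo

    # release the zvol-backed swap (needs enough free memory to
    # take back whatever is paged out at the moment)
    swapoff /dev/zvol/tank/swap

    # page directly onto the raw partition instead
    swapon /dev/ada0p3

    # to make it permanent, an /etc/fstab line like:
    # /dev/ada0p3   none    swap    sw      0       0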