Date: Sat, 4 Jan 2025 18:29:31 +0100
From: "Peter 'PMc' Much"
To: Chris Torek
Cc: freebsd-hackers@freebsd.org
Subject: Re: curious crashes when under memory pressure
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Sat, Jan 04, 2025 at 08:27:06AM -0800, Chris Torek wrote:
! On Sat, Jan 4, 2025 at 7:01 AM Peter 'PMc' Much
! wrote:
! >> I'm swapping to a zfs mirror
! >
! > Well, You shouldn't do that.
!
! Why not? Swapping to a *file* on zfs has obvious issues, but swapping
! to a mirrored swap partition seems like it should be entirely safe. A

A "mirrored swap partition" - that would be a zfs volume inside a zfs
pool that runs on some vdevs which happen to be mirrored, right?
I don't know of zfs itself having any notion of "partitions". It
supports volumes, and these have almost all the same features as
filesystems: checksumming, compression, txg buffering, logging,
snapshotting, etc.

So I tend to doubt that this is safe. I can't give You a logical proof
(it's more than ten years since I looked deeper into the zfs source),
but my gut feeling says there are so many creepy things going on in
the zfs layer nowadays (and very likely a bunch of undiscovered bugs
as well) that one should avoid such a stack.

Also, the idea of paging into zfs became popular at about the same
time it became popular not to use swap at all, as lots of memory
became available. And while running a system with serious paging
(into the tens of GB) is practical, that is probably not the use case
where we would page into zfs.

A zfs vdev is logically just a fixed-length file - aka a raw
partition. Above that sits the zfs logic, with lots of caches. There
is not only the ARC that data must go through; there is other dbuf
handling, there is more handling on the vdev layer, and all of that
needs some memory. (I looked into these various buffers when I patched
things so zfs gets a bit more NUMA-friendly - many of them use the UMA
allocator scheme, which again has its own mechanics.)

Then, above all this memory-consuming stuff, finally comes the kernel
that wants to page out, and would expect the pageout to go directly
onto a fixed-length file, aka a raw partition. So the path that is
supposed to free memory first has to pass through layers that need
memory themselves.

That doesn't look very sane to me, so what I am saying is: before you
spend time hunting this bug, give it a try with direct raw-partition
paging. At least then we know whether it happens there as well or
not - and that helps narrow the search.

! bit slow (double writes) but I spent $ on RAM rather than M.2 drives
! on the theory that I can add those later as needed.

It doesn't need a superfast SSD, at least not for testing. Pageout
happens asynchronously, and while pagein does stall the process
concerned, it is a read, and reads should be faster.

cheerio,
PMc