From: Pawel Jakub Dawidek
To: José Pérez
Cc: freebsd-current@freebsd.org
Date: Mon, 17 Apr 2023 12:59:14 -0700
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 4/17/23 21:28, José Pérez wrote:
> Hi Pawel,
> thank you for your reply and for the fixes.
>
> I think there is a 4th issue that needs to be addressed: how do we
> recover from the worst case scenario which is a machine with a
> kernel > 2a58b312b62f and ZFS root upgraded with block cloning enabled.
>
> In particular, is it safe to turn such a machine on in the first place,
> and what are the risks involved in doing so? Any potential data loss?
>
> Would such a machine be able to fix itself by compiling a kernel, or
> would compilation fail and might data be corrupted in the process?
>
> I have two poudriere builders powered off (I am not alone in this
> situation) and I need to recover them, ideally minimizing data loss. The
> builders are also hosting current and used to build kernels and worlds
> for 13 and current: as of now all my production machines are stuck on
> the 13 they run, I cannot update binaries nor packages and I would like
> to be back online.

José,

I can only speak about block cloning in detail, but I'll try to address
everything.

The easiest way to avoid block_cloning-related corruption on a kernel
built after the last OpenZFS merge but before e0bb199925 is to set the
compress property to 'off' and the sync property to something other than
'disabled'. This avoids both the block_cloning-related corruption and
the zil_replaying() panic.

As for the other corruption, unfortunately I don't know the details, but
my understanding is that it happens under higher load.
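
For illustration, the property workaround described above could be applied roughly as follows. This is a minimal sketch, not a tested recovery procedure: the function name apply_workaround and the pool name zroot are hypothetical, and sync=standard is just one value "other than 'disabled'" (it is the ZFS default). The compress property is spelled compression here; ZFS accepts either name.

```shell
#!/bin/sh
# Sketch only: set the two properties named in the workaround on one dataset.
# "zroot" in the usage line is a hypothetical pool name; substitute your own.
# Set DRY_RUN=1 to print the commands instead of running them.

apply_workaround() {
    dataset="$1"
    # sync=standard is an assumption: any value other than "disabled" fits.
    for setting in compression=off sync=standard; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "zfs set $setting $dataset"
        else
            zfs set "$setting" "$dataset" || return 1
        fi
    done
}

# Usage (left commented out so that sourcing this file changes nothing):
# apply_workaround zroot
```

Child datasets inherit these values only if they have no locally set value of their own, so datasets with local compression or sync settings would need the same treatment individually.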
I'm not sure I'd trust a kernel built on a machine with this bug
present. What I would do is compile the kernel as of 068913e4ba
somewhere else, boot the problematic machine in single-user mode, and
install the newly built kernel.

As far as I can tell, contrary to some initial reports, none of the
problems introduced by the recent OpenZFS merge corrupt the pool
metadata, only file data. You can locate the files modified under the
bogus kernel using find(1) with an appropriate modification time, but you
have to decide what to do with them (throw them away, restore them from
backup, or inspect them).

-- 
Pawel Jakub Dawidek
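
The find(1) search mentioned above might look something like the sketch below. The function name list_modified_between and the timestamps in the usage line are hypothetical; the real window is whenever the machine was actually running the affected kernel. It relies on the -newermt primary, which both FreeBSD and GNU find provide but which is not part of POSIX.

```shell
#!/bin/sh
# Sketch only: list regular files whose modification time falls inside the
# window during which the affected kernel was running.
# The timestamps in the usage line are hypothetical placeholders.

# list_modified_between ROOT START END
#   Prints regular files under ROOT with START < mtime <= END.
list_modified_between() {
    root="$1"
    start="$2"
    end="$3"
    find "$root" -type f -newermt "$start" ! -newermt "$end" -print
}

# Usage (left commented out so that sourcing this file runs nothing):
# list_modified_between / "2023-04-10 00:00:00" "2023-04-18 00:00:00"
```

The search descends into every file system mounted below ROOT, so it may be worth running it once per dataset and saving the output before deciding which files to discard, restore or inspect.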