Date: Tue, 18 Apr 2023 18:59:03 +0200
From: José Pérez
To: Pawel Jakub Dawidek
Cc: freebsd-current@freebsd.org
Subject: Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Message-ID: <4ab79579555b34317e9210d5e9f52832@mail.yourbox.net>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 2023-04-17 21:59, Pawel Jakub Dawidek wrote:
> José,
>
> I can only speak of block cloning in details, but I'll try to address
> everything.
>
> The easiest way to avoid block_cloning-related corruption on the
> kernel after the last OpenZFS merge, but before e0bb199925 is to set
> the compress property to 'off' and the sync property to something
> other than 'disabled'. This will avoid the block_cloning-related
> corruption and zil_replaying() panic.
>
> As for the other corruption, unfortunately I don't know the details,
> but my understanding is that it is happening under higher load. Not
> sure I'd trust a kernel built on a machine with this bug present.
> What I would do is to compile the kernel as of 068913e4ba somewhere
> else, boot the problematic machine in single-user mode and install the
> newly built kernel.
>
> As far as I can tell, contrary to some initial reports, none of the
> problems introduced by the recent OpenZFS merge corrupt the pool
> metadata, only file's data. You can locate the files modified with the
> bogus kernel using find(1) with a proper modification time, but you
> have to decide what to do with them (either throw them away, restore
> them from backup or inspect them).

Sharing my experience on how to get out of the worst-case scenario: a
build machine that is itself affected by the bug.

CAVEAT: this is my experience, take it at your own risk. It worked for
me; there is no guarantee that it will work for you. You may create
corrupted files and make your system harder to recover, or brick it for
good. Don't blame me, you have been warned. YMMV.

Boot in single-user mode and check whether your pool has block cloning
in use:

    # zpool get feature@block_cloning zroot
    NAME   PROPERTY               VALUE   SOURCE
    zroot  feature@block_cloning  active  local

In this case it does, because the value is "active". If it is "enabled"
you do not need to do anything.

1) While in single-user mode, set the compression property to "off" on
   every active ZFS dataset where it is not already "off", and set the
   sync property to something other than "disabled" on every dataset
   where it is "disabled".

2) Boot multiuser and update your current sources, e.g.

    git pull --rebase

3) Build and install a new kernel without putting too much load on the
   machine (e.g. with -j 1):

    make -j 1 kernel

4) Reboot with the new kernel.

5) Now reinstall the kernel with

    make installkernel

   This is because the new kernel files were written by the old kernel
   and need to be replaced.

6) Find out when the pool was upgraded (I used the command history) and
   create a file with that date, in my case:

    touch -t 2304161957 /tmp/from

7) Find out when you booted the new kernel (I used
   fgrep Copyright /var/log/messages | tail -n 1) and create a file
   with that date, in my case:

    touch -t 2304172142 /tmp/to

8) Find the files/dirs created between the two dates:

    find / -newerBm /tmp/from -and -not -newerBm /tmp/to > /tmp/filelist.txt

9) Inspect /tmp/filelist.txt and save any important items. If the
   important files are not corrupted you can do:

    cp important_file new; mv new important_file

   NOTA BENE: "touch important_file" would not work; you do need to
   re-create the file.

10) Delete the remaining files/dirs listed in /tmp/filelist.txt. If you
    did step 5, this removes the /boot/kernel.old files but not the
    /boot/kernel files, since the reinstalled kernel was written after
    the date in /tmp/to.

11) Restore your compression and sync properties where appropriate.

BR,
--
José Pérez
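
P.S. The re-creation in step 9 can be wrapped in a small helper if you have many files to keep. This is only a sketch of the cp/mv sequence above, not something I ran as part of the procedure: the function name recreate_file and the temporary file suffix are made up for illustration, and the -p flag is an addition so that mode, owner and timestamps survive the copy. It assumes you have already checked that the file is not corrupted.

```shell
#!/bin/sh
# Hypothetical helper (name and temp suffix are illustrative only):
# re-create a file by copying it and renaming the copy over the original,
# i.e. the "cp important_file new; mv new important_file" sequence of step 9.
recreate_file() {
    f=$1
    # Only handle regular files; report anything else and give up.
    if [ ! -f "$f" ]; then
        echo "skip (not a regular file): $f" >&2
        return 1
    fi
    tmp="$f.recreate.$$"
    # -p is an addition to the original command: keep mode, owner, timestamps.
    if ! cp -p "$f" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Same directory, so this is a rename that replaces the old file.
    mv -f "$tmp" "$f"
}
```

For example, after trimming /tmp/filelist.txt down to the files you want to keep, you could call recreate_file once per line of that list. Review the list by hand first; nothing here detects corruption.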