From: Lee Brown
Date: Sun, 19 Dec 2021 11:40:43 -0800
Subject: Re: Patches for GPT and geli recovery
To: Fabian Keil
Cc: FreeBSD hackers

On Sun, Dec 19, 2021 at 8:52 AM Fabian Keil <freebsd-listen@fabiankeil.de> wrote:
> [cut]
> BTW, I would also be interested to know if others have
> experienced similar data corruption and could figure
> out how it happened.

Sounds like bitrot. Bits flip on disks all the time, whether they are spinning rust or SSDs. Sometimes the flips are detected and corrected by the drive, in which case you won't know. Sometimes they are detected but uncorrectable, and you'll see that error propagated up to the driver. And sometimes they are not detected at all, so the OS sees no error and simply gets bad data back. The higher the bit density, the higher the probability of corruption. SMART is not a reliable predictor of any of this.

How does it happen? Cosmic rays and entropy. I've had lightly written SSDs fail after a few months.

I don't use ZFS, but I run GELI with data authentication under a GMIRROR, so whenever a sector with a bad checksum is read, the mirror breaks, which gets attention. (Last I looked, there wasn't a simple userland hook for bad GELI reads, but there was one for GMIRROR add/remove events.)

HTH - lee
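For readers who want to see why GELI authentication under a GMIRROR turns silent bitrot into a visible event, here is a toy Python model. It is a sketch under loud assumptions, not GELI's or gmirror's actual implementation: each 512-byte sector carries an HMAC-SHA256 tag bound to its sector number (GELI does offer HMAC-based data authentication via `geli init -a`, but its on-disk layout differs), a failed verification surfaces as a read error, and the `Mirror` class's policy of dropping a component on its first bad read is a simplification of gmirror. The device names `ada0.eli`/`ada1.eli`, the class names, and the `events` list (standing in for a userland notification hook) are all hypothetical.

```python
import hashlib
import hmac
import os

SECTOR_SIZE = 512  # assumption: toy sector size


class IntegrityError(Exception):
    """Raised when a sector's stored tag does not match its data."""


class AuthenticatedDisk:
    """Toy stand-in for a GELI provider with data authentication (hypothetical)."""

    def __init__(self, name, key, nsectors):
        self.name = name
        self._key = key
        self._data = [bytes(SECTOR_SIZE)] * nsectors
        self._tags = [self._tag(i, self._data[i]) for i in range(nsectors)]

    def _tag(self, lba, data):
        # Bind the tag to the sector number so sectors cannot be swapped unnoticed.
        msg = lba.to_bytes(8, "little") + data
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def write(self, lba, data):
        self._data[lba] = data
        self._tags[lba] = self._tag(lba, data)

    def read(self, lba):
        data = self._data[lba]
        if not hmac.compare_digest(self._tags[lba], self._tag(lba, data)):
            raise IntegrityError(f"bad checksum at sector {lba}")
        return data

    def flip_bit(self, lba, bit):
        """Simulate silent bitrot that the drive itself did not detect."""
        buf = bytearray(self._data[lba])
        buf[bit // 8] ^= 1 << (bit % 8)
        self._data[lba] = bytes(buf)


class Mirror:
    """Toy RAID-1 over authenticated components, loosely modeled on gmirror."""

    def __init__(self, components):
        self.active = list(components)
        self.events = []  # hypothetical stand-in for a userland hook

    def write(self, lba, data):
        for disk in self.active:
            disk.write(lba, data)

    def read(self, lba):
        for disk in list(self.active):
            try:
                return disk.read(lba)
            except IntegrityError as err:
                # Drop the failing component (the mirror "breaks") and try the next.
                self.active.remove(disk)
                self.events.append(f"disconnected {disk.name}: {err}")
        raise OSError(f"no healthy component left for sector {lba}")


if __name__ == "__main__":
    # Each component gets its own key, as separate GELI providers would.
    disks = [AuthenticatedDisk(n, os.urandom(32), 8) for n in ("ada0.eli", "ada1.eli")]
    mirror = Mirror(disks)
    payload = b"x" * SECTOR_SIZE
    mirror.write(3, payload)
    disks[0].flip_bit(3, 100)         # silent corruption on one side only
    assert mirror.read(3) == payload  # served from the healthy component
    print(mirror.events)  # ['disconnected ada0.eli: bad checksum at sector 3']
```

The point of the model is the layering: because the integrity check sits below the mirror, a corrupted sector becomes a hard read error instead of bad data, and the mirror reacts by disconnecting that component, which is the kind of event a userland hook can watch for.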