Re: EFI and zfs raid mirror partial fail (14.0 and RELENG_13)
Date: Sat, 02 Dec 2023 05:57:12 UTC
On Fri, Dec 1, 2023 at 10:34 PM Zaphod Beeblebrox <zbeeble@gmail.com> wrote:

> It can be more straightforward to update the gmirror, however. I've done
> this with UFS --- old boot, pair of UFS/GMIRROR usb sticks that then boot
> to a ZFS array that the BIOS couldn't see (so UFS only contained /boot and
> /rescue). It's easier to know that the boot is updated identically if
> gmirrored. Gmirror also has tools to verify, etc.

Yes. More straightforward, not as safe. The BIOS runs before FreeBSD and
doesn't use gmirror at all, so it can't know whether one copy is good or
not. It has to assume that the copies are always good.

If you are a single user, then the convenience is likely worth it. It's
going to be fine, and if you have a power failure while updating, then you
are going to be right there to cope with whatever fallout by choosing the
right device to boot from if the primary is corrupted. Once you reboot
FreeBSD, the gmirror will resilver (usually) and life will be good. But you
have to make absolutely sure that the gmirror never degrades (which happens
sometimes on crashes) so that it always will update when you write a new
loader. If the mirror is degraded, it will boot the old loader if the
degraded side is the primary boot device for the BIOS.

If you are deploying a redundant EFI booting system for lots of machines,
some of which are in the middle of nowhere without remote hands available,
then you can't rely on gmirror to always be right (because it can create
corrupted partitions while updating each copy, which can pose problems when
you lose power). And there's the broken mirror problem that has to be
constantly monitored. At work, we cope with this by having lots of monitor
scripts for gmirror-based systems that take corrective actions when bad
things happen to a gmirror element. But for our multiple, redundant ESPs,
we manually update them one at a time because we can't take a chance on the
gmirror being broken.
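
As a rough illustration of the kind of check such a monitor script might do
before a loader update, here is a minimal Python sketch. It is not the
script used at work. It assumes the usual `gmirror status` table layout
(a header row, then rows that start with `mirror/<name>` followed by a
status such as COMPLETE or DEGRADED); the function names are made up for
this example.

```python
import subprocess


def degraded_mirrors(status_text):
    """Return the names of mirrors whose status is anything but COMPLETE."""
    bad = []
    for line in status_text.splitlines():
        fields = line.split()
        # Mirror rows start with "mirror/<name>". The header row and the
        # continuation rows (extra components only) are skipped.
        if len(fields) >= 2 and fields[0].startswith("mirror/"):
            if fields[1] != "COMPLETE":
                bad.append(fields[0])
    return bad


def current_status():
    """Run `gmirror status` (FreeBSD only) and return its text output."""
    result = subprocess.run(["gmirror", "status"],
                            capture_output=True, text=True)
    return result.stdout
```

A wrapper would refuse to write a new loader whenever
`degraded_mirrors(current_status())` returns a non-empty list.
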
If the drive that's the primary boot device fails read-only and we can't
change the BIOS boot order, then we RMA the box (though that's rare: we can
usually move the primary and arrange for a different drive to be the backup
boot device). When you have tens of thousands of machines, even low failure
rates can cause big expenses... Though the "broken mirror, and the BIOS
boots the wrong disk, and that can't be fixed" problem is way more common
than having gmirror break due to a crash during an upgrade (but the latter
does happen).

So yeah, gmirror is convenient. But you have to watch it like a hawk to
make sure the mirror isn't broken before you do the update, and to make
sure that you can get hands on the system if an update breaks badly due to
an ill-timed power failure or system panic.

Warner

> On Fri, Dec 1, 2023 at 7:46 PM Warner Losh <imp@bsdimp.com> wrote:
>
>> On Fri, Dec 1, 2023, 4:57 PM Pete French <pete@twisted.org.uk> wrote:
>>
>>> On 01/12/2023 21:53, mike tancsa wrote:
>>> > Should have looked at open PRs. There is one from a while ago
>>> >
>>> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258987
>>>
>>> Was thinking about this, and I was wondering if it would be possible to
>>> make the EFI partition a gmirror. So it's across all discs, mounted only
>>> once, but would still boot from any of them. My understanding is geom
>>> has the label at the end, yes? So the firmware would see the filesystem
>>> on a single partition quite happily?
>>
>> I've done this. It works ok. But I don't run like this in production. If
>> I write a new file, that takes many writes to the different disks. If they
>> all go through then life is good (this is what gets us to OK).
>>
>> BUT, if there is a power failure or crash and only some of them make it
>> to disk, then you have a corrupt ESP and the BIOS may pick that ESP to boot
>> off of, booting corrupt data.
>>
>> Since this is infrequently updated, you can use a safe sequence to update
>> things one partition at a time. Then you might lose the file entirely, but
>> it will either be there and good, or it will be gone. You can't get into a
>> bad situation. Either you boot the old or the new loader, and you can just
>> quit from the boot loader if it's the old one and it can't boot. EFI will
>> try the next one on the list.
>>
>> Here manual mirroring, if scripted, can be more reliable than gmirror.
>>
>> Warner
>>
>>> -pete.
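
The "one partition at a time" idea from the quoted message could be
scripted roughly as below. This is only a sketch under stated assumptions,
not the exact sequence used in production: each ESP is assumed to be
mounted separately, the paths passed in are hypothetical, and the function
names are made up. It does not assume that a rename on msdosfs survives a
power loss atomically, which is why it finishes and verifies one copy
before touching the next.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Hash a file in chunks so large loaders need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def update_one(new_loader, target):
    """Install new_loader at target on one ESP; True if the copy verified."""
    tmp = target + ".new"
    with open(new_loader, "rb") as src, open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        os.fsync(dst.fileno())  # push the new copy to disk before renaming
    if sha256_of(tmp) != sha256_of(new_loader):
        os.remove(tmp)          # leave the old loader in place
        return False
    os.replace(tmp, target)
    return True


def update_all(new_loader, targets):
    """Update copies strictly in order; stop at the first failure."""
    done = []
    for target in targets:
        if not update_one(new_loader, target):
            break               # remaining ESPs keep the old, good loader
        done.append(target)
    return done
```

Stopping at the first failure means at most one ESP is ever in an
in-between state, so the firmware still has an untouched copy to fall back
to.
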