Broken ZFS boot on upgrade

Jonathan Anderson jonathan.anderson at mun.ca
Mon Nov 11 13:05:29 UTC 2019


Good morning,

On 11/11, Peter Jeremy wrote:
> Based on your symptoms, it sounds like you might have a corrupt zpool.cache.
> /boot/zfs/zpool.cache should be rewritten on every boot but I had one system
> where that wasn't occurring and a FreeBSD upgrade (I don't currently recall
> the actual versions) resulted in more thorough validation checks, which
> failed.

The /boot/zfs/zpool.cache file is present, but it looks like the bootloader
isn't able to see it (or anything else in /boot).
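
(If the machine at least reaches the loader prompt, I assume I can confirm this with something like the following; I haven't captured the real output yet, so this is just the check I have in mind rather than anything I've run.)

--
OK lsdev -v
OK ls /boot/zfs
--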


> Can you share your actual layout ("gpart show", "zpool status", details of
> the non-partitioned disks, etc) - that might help us identify a problem.

Certainly. The output from `geom disk list` is:

--
Geom name: ada0
Providers:
1. Name: ada0
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e3
   descr: ST91000640NS
   lunid: 5000c5007a4e82cc
   ident: 9XG82F3D
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16

Geom name: ada1
Providers:
1. Name: ada1
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e3
   descr: ST91000640NS
   lunid: 5000c5007a4edb75
   ident: 9XG82GVR
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16

Geom name: ada2
Providers:
1. Name: ada2
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e2
   descr: ST91000640NS
   lunid: 5000c50090aa52d1
   ident: 9XG9SQVM
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16

Geom name: ada3
Providers:
1. Name: ada3
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e2
   descr: ST91000640NS
   lunid: 5000c50090aa98d3
   ident: 9XG9SQ7H
   rotationrate: 7200
   fwsectors: 63
   fwheads: 16

Geom name: ada4
Providers:
1. Name: ada4
   Mediasize: 3000592982016 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   descr: ST3000LM016-1N217V
   lunid: 5000c5009049c1c3
   ident: W800RZ6A
   rotationrate: 5400
   fwsectors: 63
   fwheads: 16

Geom name: ada5
Providers:
1. Name: ada5
   Mediasize: 3000592982016 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   descr: ST3000LM016-1N217V
   lunid: 5000c5008a91fc59
   ident: W800SEP0
   rotationrate: 5400
   fwsectors: 63
   fwheads: 16
--

The output from `geom part list` is:

--
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 1953525134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada0p1
   Mediasize: 524288 (512K)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0
   efimedia: HD(1,GPT,33b1f1e3-dd2d-11e4-9108-ecf4bbd78d94,0x22,0x400)
   rawuuid: 33b1f1e3-dd2d-11e4-9108-ecf4bbd78d94
   rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
   label: gptboot0
   length: 524288
   offset: 17408
   type: freebsd-boot
   index: 1
   end: 1057
   start: 34
2. Name: ada0p2
   Mediasize: 1000204327424 (932G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 541696
   Mode: r1w1e2
   efimedia: HD(2,GPT,33da20c1-dd2d-11e4-9108-ecf4bbd78d94,0x422,0x7470696d)
   rawuuid: 33da20c1-dd2d-11e4-9108-ecf4bbd78d94
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: zfs0
   length: 1000204327424
   offset: 541696
   type: freebsd-zfs
   index: 2
   end: 1953525134
   start: 1058
Consumers:
1. Name: ada0
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e3

Geom name: diskid/DISK-9XG82GVR
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 1953525134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: diskid/DISK-9XG82GVRp1
   Mediasize: 524288 (512K)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 17408
   Mode: r0w0e0
   efimedia: HD(1,GPT,343639ad-dd2d-11e4-9108-ecf4bbd78d94,0x22,0x400)
   rawuuid: 343639ad-dd2d-11e4-9108-ecf4bbd78d94
   rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
   label: gptboot1
   length: 524288
   offset: 17408
   type: freebsd-boot
   index: 1
   end: 1057
   start: 34
2. Name: diskid/DISK-9XG82GVRp2
   Mediasize: 1000204327424 (932G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 541696
   Mode: r1w1e1
   efimedia: HD(2,GPT,345be840-dd2d-11e4-9108-ecf4bbd78d94,0x422,0x7470696d)
   rawuuid: 345be840-dd2d-11e4-9108-ecf4bbd78d94
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: zfs1
   length: 1000204327424
   offset: 541696
   type: freebsd-zfs
   index: 2
   end: 1953525134
   start: 1058
Consumers:
1. Name: diskid/DISK-9XG82GVR
   Mediasize: 1000204886016 (932G)
   Sectorsize: 512
   Mode: r1w1e2
--

(omitting `da0`, which is my recovery USB stick)


> If you boot from a rescue image and either delete (or rename) your
> existing zpool.cache or run
> # zpool set cachefile=/mnt/boot/zfs/zpool.cache zroot
> (where the cachefile path maps to /boot/zfs/zpool.cache at boot), does
> that help?

Ok, I'll add that to the list of things to try when I regain physical access to
the machine tomorrow. :)
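
For my own reference, I expect the sequence from the rescue environment to look roughly like this (assuming the pool is still importable as zroot and that I use /mnt as the altroot; the pool name and cachefile path come from your suggestion rather than anything I've verified yet):

--
# zpool import -f -o altroot=/mnt zroot
# zpool set cachefile=/mnt/boot/zfs/zpool.cache zroot
--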


> Whole disks are not recommended for anything other than building partitions
> in.  Ideally, you want all the disks to have GPT (or similar) partitions.
> If you don't need anything else on the disk, just create a single partition
> occupying the entire disk[1].

OK, sounds good. Is there a specific reason for this recommendation? I
understand that this advice is possible because FreeBSD handles caching
differently from Solaris (which recommended using whole disks), but is the GPT
recommendation mainly about making ZFS partitions more visible to the
bootloader?
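
For the disks that are currently whole-disk vdevs, I assume the idea is a layout along the lines of the existing partitioned disks, i.e. something like the following (sizes, alignment and labels are illustrative only, and this would obviously mean migrating the data off those vdevs first):

--
# gpart create -s gpt ada4
# gpart add -t freebsd-boot -s 512k -l gptboot4 ada4
# gpart add -t freebsd-zfs -a 1m -l zfs4 ada4
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada4
--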


> (I'd also recommend having a small root zpool
> that is a single, preferably mirrored, vdev, rather than a large root spread
> over multiple vdevs).

Indeed, if I'd kept my root pool separate from a larger data pool, I wouldn't be
experiencing this issue now. When I added the additional vdevs to the root pool,
I was thinking it would be convenient not to have to size the pools in advance,
but it seems that a little bit of that homework up front would have saved me a
lot of grief this weekend!
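
If I were starting over, I take it the suggestion is something along these lines, with a small mirrored root pool and the bulk storage in a separate pool (pool names and GPT labels purely illustrative):

--
# zpool create zroot mirror gpt/zfs0 gpt/zfs1
# zpool create tank raidz gpt/zfs2 gpt/zfs3 gpt/zfs4 gpt/zfs5
--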


> That might be possible though it's not clear why it wouldn't have caused a
> problem in the past.

The last time I did a wholesale upgrade of /boot, there were fewer vdevs in the
pool. So, if the issue is either a) the blocks living on a whole-disk vdev or
b) the blocks living on a 3 TB vdev, it could be that this is just the first
time the contents of /boot have happened to land in a "bad" place.


> Note that the bootloader is performing I/O via the
> BIOS so if the BIOS can't see the disks, the bootloader won't be able to
> read them.

The BIOS can see the disks, but I see that a later email asked about the size of
those disks... that could be the (an?) issue.


> You can have multiple partitions on a disk and put different partitions
> into different vdevs but there's no point in having different partitions
> on the same disk in the same vdev - that will reduce both performance
> and resilience.

Great: I had kind of suspected that.

Thank you,


Jon
-- 
Jonathan Anderson

jonathan at FreeBSD.org

