ZFS pool corrupted on upgrade of -current (probably sata renaming)
Richard Todd
rmtodd at ichotolot.servalan.com
Fri Jul 17 06:14:54 UTC 2009
Louis Mamakos <louie at transsys.com> writes:
> On Wed, Jul 15, 2009 at 03:19:30PM -0700, Freddie Cash wrote:
>>
>> Hrm, you might need to do this from single-user mode, without the ZFS
>> filesystems mounted, or the drives in use. Or from a LiveFS CD, if /usr is
>> a ZFS filesystem.
>>
>> On our ZFS hosts, / and /usr are on UFS (gmirror).
>
> I don't understand why you'd expect you could take an existing
> container on a disk, like a FreeBSD slice with some sort of live data
> within it, and just decide you're going to take away one or more
> blocks at the end to create a new container within it?
Well, technically, I don't think they were recommending taking the slice
with live data on it and labeling it, but instead detaching that slice from
the mirror, labeling it, and reattaching it, causing ZFS to rewrite all
the data onto that half of the mirror. It turns out that trying to reattach
a 1-sector-shorter chunk of disk will still usually work.
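For concreteness, the procedure being suggested would look roughly like
this (the pool and device names -- tank, ad4s1d, ad6s1d, disk1 -- are made
up for illustration, not taken from the original report):

  zpool detach tank ad6s1d                    # drop one side of the mirror
  glabel label -v disk1 /dev/ad6s1d           # glabel stores its metadata in the last sector
  zpool attach tank ad4s1d /dev/label/disk1   # reattach; ZFS resilvers onto the labeled device
  zpool status tank                           # watch until the resilver completes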
> If you look at page 7 of the ZFS on-disk format document that was
> recently mentioned, you'll see that ZFS stores 4 copies of its "Vdev
> label"; two at the front of the physical vdev and two at the end of
> the Vdev, each of them apparently 256KB in length. That's assuming
> that ZFS doesn't round down the size of the Vdev to some convenient
> boundary. Is it going to get upset that the Vdev just shrunk out from
> under it?
I've been investigating this a bit (testing the glabel procedure on
some mdconfig'ed disks to see that it does indeed work, and reading
the ZFS source). It turns out that ZFS *does* internally round down the
size of each device to a multiple of sizeof(vdev_label_t), at this
line of vdev.c:

        osize = P2ALIGN(osize, (uint64_t)sizeof (vdev_label_t));
vdev_label_t is 256K long. So as long as your partitions are not an
*exact* multiple of 256K, you should be able to freely detach,
label, and reattach them. If they *are* an exact multiple of 256K,
the procedure should fail on the "reattach" step, so you'll know you
won't be able to proceed and would have to un-label the disk chunk
and put things back as before. See below:
Script started on Thu Jul 16 23:03:11 2009
You have mail.
blo-rakane# diskinfo -v /dev/md2s1a /dev/md3s1a
/dev/md2s1a
        512             # sectorsize
        517996544       # mediasize in bytes (494M)
        1011712         # mediasize in sectors
        1003            # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
/dev/md3s1a
        512             # sectorsize
        517996544       # mediasize in bytes (494M)
        1011712         # mediasize in sectors
        1003            # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
blo-rakane# zpool create test mirror md2s1a md3s1a
blo-rakane# zpool status -v test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        test          ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            md2s1a    ONLINE       0     0     0
            md3s1a    ONLINE       0     0     0

errors: No known data errors
blo-rakane# zpool detach test md3s1a
blo-rakane# glabel label -v testd3 /dev/md3s1a
Metadata value stored on /dev/md3s1a.
Done.
blo-rakane# zpool attach test md2s1a /dev/label/testd3
cannot attach /dev/label/testd3 to md2s1a: device is too small
blo-rakane# exit
exit
Script done on Thu Jul 16 23:07:13 2009
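(For the record, the transcript above is the exact-multiple case: 517996544
bytes is exactly 1976 * 262144, i.e. a whole number of 256K labels. glabel
takes the last 512-byte sector for its metadata, leaving 517996032 bytes,
which P2ALIGN rounds down to 1975 * 262144 = 517734400 bytes -- one full
vdev_label_t smaller than the original vdev, hence the "device is too
small" error on the attach.)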