Please help: trying to determine how to resurrect a ZFS pool
Alan Gerber
unlateral at gmail.com
Wed Oct 3 05:20:17 UTC 2012
All,
Apologies if I've sent this to the wrong list.
I had a kernel panic take down a machine earlier this evening that has
been running a ZFS pool stably since the feature first became
available back in the 7.x days. Today, that system is running 8.3.
I'm hoping for a pointer that will help me recover this pool, or at
least some of the data from it. I'd certainly like to hear something
other than "your pool is hosed!" ;)
Anyway, once the system came back online after the panic, ZFS showed
that it had lost a number of devices:
hss01fs# zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   UNAVAIL      0     0     0
          raidz1-0                ONLINE       0     0     0
            ad18                  ONLINE       0     0     0
            ad14                  ONLINE       0     0     0
            ad16                  ONLINE       0     0     0
          raidz1-2                UNAVAIL      0     0     0
            13538029832220131655  UNAVAIL      0     0     0  was /dev/da4
            7801765878003193608   UNAVAIL      0     0     0  was /dev/da6
            8205912151490430094   UNAVAIL      0     0     0  was /dev/da5
          raidz1-3                DEGRADED     0     0     0
            da0                   ONLINE       0     0     0
            da1                   ONLINE       0     0     0
            9503593162443292907   UNAVAIL      0     0     0  was /dev/da2
As you can see, the big problem is the loss of the raidz1-2 vdev. The
catch is that all of the missing devices are in fact present on the
system and fully operational. I've tried moving the devices to
different physical drive slots, reseating the drives, running
'zpool import -F', and everything else I can think of to make the four
missing devices show up again (there's a rough sketch of the import
commands at the end of this message). Inspecting the labels on the
various devices shows what I would expect to see:
hss01fs# zdb -l /dev/da4
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 28
    name: 'storage'
    state: 0
    txg: 14350975
    pool_guid: 14645280560957485120
    hostid: 666199208
    hostname: 'hss01fs'
    top_guid: 11177432030203081903
    guid: 17379190273116326394
    hole_array[0]: 1
    vdev_children: 4
    vdev_tree:
        type: 'raidz'
        id: 2
        guid: 11177432030203081903
        nparity: 1
        metaslab_array: 4097
        metaslab_shift: 32
        ashift: 9
        asize: 750163329024
        is_log: 0
        create_txg: 11918593
        children[0]:
            type: 'disk'
            id: 0
            guid: 4427378272884026385
            path: '/dev/da7'
            phys_path: '/dev/da7'
            whole_disk: 1
            DTL: 4104
            create_txg: 11918593
        children[1]:
            type: 'disk'
            id: 1
            guid: 17379190273116326394
            path: '/dev/da6'
            phys_path: '/dev/da6'
            whole_disk: 1
            DTL: 4107
            create_txg: 11918593
        children[2]:
            type: 'disk'
            id: 2
            guid: 6091017181957750886
            path: '/dev/da3'
            phys_path: '/dev/da3'
            whole_disk: 1
            DTL: 4101
            create_txg: 11918593
[labels 1-3 with identical output values snipped]
If I look at one of the operational drives that ZFS recognizes, such
as /dev/ad18, I see the same transaction group value present.
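For reference, a minimal /bin/sh sketch of how the guid/txg/path
fields can be compared across every attached device in one pass (the
device list below is only a guess based on the output above, so adjust
it to whatever is actually in /dev):

for d in ad14 ad16 ad18 da0 da1 da2 da3 da4 da5 da6 da7; do
    echo "=== /dev/$d ==="
    # 'guid:' also catches pool_guid/top_guid; 'txg:' also catches create_txg
    zdb -l /dev/$d | grep -E 'guid:|txg:|path:'
done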
I've done enough digging to suspect that the problem is that the GUIDs
in each disk's label no longer match the GUIDs ZFS expects to find on
that device, but I'm not sure what to do about it. If one of you fine
folks could please point me in the direction of recovering this pool,
I'd greatly appreciate it!
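For completeness, here is a rough sketch of the import sequence
mentioned above. Treat the exact flags as illustrative rather than a
recipe; the -F rewind can discard recent transactions, which is why
the -n dry run comes first:

zpool export storage               # clear the current faulted import, if possible
zpool import -d /dev               # scan /dev and list what looks importable
zpool import -d /dev -f storage    # plain forced import
zpool import -d /dev -fFn storage  # dry run of the txg-rewind recovery import
zpool import -d /dev -fF storage   # actual rewind attempt (may lose recent writes)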
--
Alan Gerber