Possible ZFS bug? Insufficient sanity checks
Borja Marcos
borjam at sarenet.es
Wed Feb 19 14:47:47 UTC 2014
Hello,
Doing something stupid, I managed to corrupt a ZFS pool. I think it shouldn't have been possible. I hope to reproduce it next week, but it's better to share it now, just in case.
I know what I did was quite foolish, and no dolphins were hurt as it's just a test machine.
FreeBSD pruebassd 10.0-STABLE FreeBSD 10.0-STABLE #8: Wed Feb 12 09:32:29 UTC 2014 root at pruebassd:/usr/obj/usr/src/sys/PRUEBASSD2_10 amd64
The pool has one RAIDZ vdev, with 6 OCZ Vertex 4 SSDs.
The stupid manoeuvre was as follows:
1) Pick up one of the disks at random.
2) Extract it.
So far so good. zpool warns that the pool is in degraded state, but everything works.
3) Take the disk to a different system. Insert it and create a new single-disk pool on it (I was testing a data corruption issue with an "mfi" adapter).
4) Do some tests.
5) Probably (not sure) destroy the newly created pool.
6) Take the SSD back to the original machine and insert it.
And here the fun comes.
7) zpool online cashopul (the previously removed disk)
8) KABOOM! zpool warns of data corruption all over the place. -> most files corrupted.
My theory: when doing the "zpool online", ZFS checked only the disk serial number or identification and, finding it unchanged, mixed the disk back into the pool *without verifying the pool identity*, with disastrous consequences.
What I think should have happened instead:
- ZFS should verify the physical disk "identity" *and* verify that the ZFS metadata on the disk indeed belongs to the pool on which it's being "onlined".
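The check proposed above can be sketched roughly as follows. This is only an illustration of the idea, not how ZFS actually implements "zpool online". ZFS vdev labels do record a pool GUID and a per-vdev GUID, but the VdevLabel class, the label_matches_pool function and all the values below are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VdevLabel:
    """Hypothetical, simplified view of the identity fields in a vdev label."""
    pool_name: str
    pool_guid: int
    vdev_guid: int


def label_matches_pool(expected: VdevLabel, found: Optional[VdevLabel]) -> bool:
    """Return True only if the label found on the disk identifies it as the
    same vdev of the same pool that the pool configuration expects.

    The physical identity of the disk (serial number, device path) is
    deliberately not consulted here: the point of the sketch is that it is
    not sufficient on its own.
    """
    if found is None:
        # No readable label on the disk: it cannot be proven to be ours.
        return False
    if found.pool_guid != expected.pool_guid:
        # The disk carries metadata written by a different pool.
        return False
    if found.vdev_guid != expected.vdev_guid:
        # Same pool, but not the vdev we are trying to bring online.
        return False
    return True


# Made-up example values: what the original pool expects for the removed disk,
# and what the disk carries after a new pool was created on it elsewhere.
pool_member = VdevLabel(pool_name="tank", pool_guid=0x1111, vdev_guid=0xAAAA)
reused_disk = VdevLabel(pool_name="scratch", pool_guid=0x2222, vdev_guid=0xBBBB)

untouched_ok = label_matches_pool(pool_member, pool_member)
reused_ok = label_matches_pool(pool_member, reused_disk)
blank_ok = label_matches_pool(pool_member, None)
```

With a check of this kind, the disk that had been reused in another pool would be rejected (reused_ok is False) even though its serial number is unchanged, and it would have to be reintroduced with an explicit replace and a full resilver.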
Again, I do know that I did something very foolish (I behave in a foolish and careless way with that machine on purpose).
I'll try to reproduce this next week (I'm waiting to receive some SAS cables).
Cheers,
Borja.