gvinum raid5 cannot find file system superblock after removing a disk
Ulf Lilleengen
lulf at stud.ntnu.no
Wed Nov 21 02:09:03 PST 2007
On Fri, Nov 16, 2007 at 02:01:20 -0700, Edward Sutton wrote:
>
> Ulf Lilleengen wrote:
> >Your description was a bit hard to understand, so please post a more detailed
> >description of what you did, because all I got out of it was:
>
> >1) You have a RAID-5 volume, and ad8 fails.
> >2) You dd'ed the first 64 MB of ad8...
> >3) You tried to repartition ad8... and then what? Did you try rebuildparity?
>
> >Could you also please post your gvinum configuration (gvinum printconfig) and
> >listing (gvinum list)?
>
> Hopefully word wrap does not kill the output of the configuration (at the end of this email). On a Promise TX4 SATA150, I have 3 400GB drives which I partitioned (sliced, actually, if I recall the BSD terminology correctly) in half. That seemed appropriate because I do not know of any utilities that let me manipulate the layout the way I can with Windows and Linux tools. If I do not like my layout, I can always try again on the other half; space is available to migrate back and forth.
> On the first half, I set up a bootable system: swap striped across the 3 disks, root mirrored across the 3 disks (yes, 2 copies; I could not think of a better use for the leftover space on the 3rd disk than another copy), and usr, var, and tmp striped with parity (RAID-5) across the 3 drives.
> I have had trouble with system crashes. One appeared to be drive related, and the best I could track from more recent crashes pointed at swap space; once I used enough of it, I would get a crash. I have since been using swap space on an unrelated hard drive and have achieved uptimes of >1 month again. Only 2 of the 3 copies of the root mirror ever appear to work once something goes down, and I cannot start the third because it is in use after booting (which it sounds like will go away with future patches).
> I was trying to move away from my vinum experiments and work a bit more with gmirror (which I used to set up a server for my uncle). Instead of moving over to the ad*s2 halves of the disks, I figured I would just use all of one of the three disks. I had a disk that was currently out of sync (ad8), so I figured I would pull that one. I made a backup copy of its first 64MB with dd, since fdisk only touches the start of the disk and I could put back any of those changes. I then rebooted from a FreeBSD copy on another drive (which does not load geom_vinum by default but does have it available) and repartitioned the drive to use the entire 400GB as one slice. I did not format that space yet, and after finding I could not boot my gvinum-configured system, I restored the 64MB and found that I still could not boot it. The root partition was okay (on 2 of the 3 disks, as expected), but all the raid5 volumes errored out with the superblock message for both mount and fsck.
> I played with editing the on-disk configuration, which in my case can be read with `dd if=/dev/ad4s1h skip=9 count=5`. I would use my favorite text editor to change the 'sd' lines to contain the plex and plexoffset and change them from 'down' to 'up' (and reverse that on other lines when disabling disks), and found that I could get different responses on the partitions and even get to where 'some' data could be read. Changing skip to seek lets me write the new output back to disk, and I kept a dd of more than a count of 5 that I could put back (in the hope of always being able to undo the damage). It was strange that the changes did not always produce predictable results. Getting tmp to where I could partially read it after mounting it (read-only) took one set of changes, the other raid5 volumes required different changes, and combining those changes led to yet more variation in the errors I would get on bootup.
Just FYI, the on-disk config is exactly 265 sectors, so you should copy the
whole config, not just parts of it. Also, within these 265 sectors, gvinum
stores two copies of the configuration, so that if one goes bad, it uses the
second.
To force a subdisk into the up state, you should use
'gvinum setstate -f up <subdisk>'.
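For example, something like this (assuming the config area starts where your
dd read from, and using a made-up subdisk name; take the real names from
'gvinum list'):

  # back up the whole 265-sector config area, not just 5 sectors of it
  dd if=/dev/ad4s1h of=/root/ad4s1h.gvinumcfg bs=512 skip=9 count=265
  # force a subdisk up instead of editing the on-disk config by hand
  gvinum setstate -f up usr.p0.s1
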
> I found the needed combination to be: change all the subdisks to a good state, boot the computer (or load geom_vinum), mount the partitions, then unplug ad8. That gives me a disk set I can access as it was before I tampered with the 'down, but plugged in' ad8 in my attempts to move away from gvinum. A missing ad8 means I cannot mount at all, and an available and 'up' ad8 leads to heavy corruption on the file system. Mounting and then unplugging the disk keeps it happy, though: in my testing I saw no filesystem corruption, and I can play videos from disk (which can be gigabytes in size) without any sign of additional corruption.
> Now that I could read the data, I used `ccdconfig ccd0 64 none /dev/ad4s2 /dev/ad6s2` and a newfs across it to copy everything but the swap partition using dump/restore. Is it just me, or does everyone else always forget that restore restores to the current directory instead of to a directory passed as a parameter? `dump -0 -f - /dev/gvinum/usr | buffer -S 2058K -p 75 | restore -r -f -` was one command I used (it took about 13 hours to complete). By using ccd0 instead of ad8 as the destination, I was able to keep ad8 intact in case I needed additional attempts to access/copy the data.
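Yes, restore(8) always extracts into the current directory. Roughly, the
sequence you describe looks like this (the /mnt mount point, running newfs on
the raw ccd device, and dropping the buffer stage from the pipe are just how
I would sketch it):

  # concatenate the two unused halves and put a fresh fs on them
  ccdconfig ccd0 64 none /dev/ad4s2 /dev/ad6s2
  newfs /dev/ccd0
  mount /dev/ccd0 /mnt
  # restore extracts into the current directory, so cd there first
  cd /mnt
  dump -0 -f - /dev/gvinum/usr | restore -rf -
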
Not being able to mount when the plex is degraded is strange. I have had no
problems with this in gvinum on FreeBSD 6.x.
> Now that I had a copy of the data, I tested fsck_ffs on /dev/gvinum/tmp, and all I ended up with was one lost+found directory that took up 410K where the old one took up 2K; but there had been 747MB in the tmp directory, so a lot was missing. I expected bad things to happen, which is why I try to copy/back up whatever I can read 'before' I try to fix a broken file system.
> I should be able to newfs the /dev/gvinum/ partitions, dump/restore to them from my ad*s2 partitions, and follow that with a rebuildparity of ad8 to safely get things back to how they were for my previously semi-stable but booting system, right? Before I destroy/overwrite any more data, would any other information be useful to examine/debug? If ad8 had really failed, could I still have gotten back the data?
I will never give such a guarantee, other than that it should work. Looking
at the disk configuration that you attached, this should make the plex come
up again. Also, note that rebuilding takes a long time :) However, if the
filesystem you copied/took a backup of is corrupted, you would still have
problems mounting it.
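A rough sketch of that, with usr as the example (repeat it for var and tmp;
the plex name is a guess, so take the real one from 'gvinum list', and I am
assuming rebuildparity takes the plex name as its argument):

  # recreate the filesystem on the gvinum volume and copy the data back
  newfs /dev/gvinum/usr
  mount /dev/gvinum/usr /mnt
  cd /mnt
  dump -0 -f - /dev/ccd0 | restore -rf -
  # then rebuild the parity on the RAID-5 plex once ad8 is back in place
  gvinum rebuildparity usr.p0
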
If you want to experiment later, patches from the gvinum rework can be found
here (only for RELENG_7 and -CURRENT ATM):
http://people.freebsd.org/~lulf/patches/gvinum/
--
Ulf Lilleengen