Grepping through a disk

Mon Mar 4 09:17:00 UTC 2013

On 4 Mar 2013, at 01:36, Polytropon <freebsd at edvax.de> wrote:

> Due to a fsck file system repair I lost the content of a file
> I consider important, but it hasn't been backed up yet. The
> file name is still present, but no blocks are associated
> (file size is zero). I hope the data blocks (which are now
> probably marked "unused") are still intact, so I thought
> I'd search for them because I can remember specific text
> that should have been in that file.
> 
> As I don't need any fancy stuff like a progress bar, I
> decided to write a simple command, and I quickly got
> something up and running which I _assume_ will do what
> I need.
> 
> This is the command I've been running interactively in bash:
> 
>    $ N=0; while true; do echo "${N}"; dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} 2>/dev/null | grep "<PATTERN>"; if [ $? -eq 0 ]; then break; fi; N=`expr ${N} + 1`; done
> 
> To make it look a bit better and illustrate the simple
> logic behind my idea:
> 
>    N=0
>    while true; do
>        echo "${N}"
>        dd if=/dev/ad6 of=/dev/stdout bs=10240 count=1 skip=${N} \
>            2>/dev/null | grep "<PATTERN>"
>        if [ $? -eq 0 ]; then
>            break
>        fi
>        N=`expr ${N} + 1`
>    done
> 
> Here <PATTERN> refers to the text. It's only a small, but
> very distinctive portion. I'm searching in blocks of 10 kB
> so it's easier to continue in case something has been found.
> I plan to output the resulting "block" (it's not a real disk
> block, I know, it's simply a unit of 10 kB disk space) and
> maybe the previous and next one (in case the file, the _real_
> block containing the data, has been split across more than
> one of those units). I will then clean the "garbage" (maybe
> from other files) because I can easily determine the beginning
> and the end of the file.
> 
> Needless to say, it's a _text_ file.
> 
> I understand that grep operates on text files, but it will
> also happily return 0 if the text to search for appears
> in a binary file, and possibly return the whole file as a
> search result (in case there are no newlines in it).
> 
> My questions:
> 
> 1. Is this the proper way of stupidly searching a disk?
> 
> 2. Is the block size (bs= parameter to dd) good, or should
>   I use a different value for better performance?
> 
> 3. Is there a known program that already implements the
>   functionality I need in terms of data recovery?
> 
> Results so far:
> 
> The disk in question is a 1 TB SATA disk. The command has
> been running for more than 12 hours now and returned one
> false-positive result, so basically it seems to work, but
> maybe I can do better? I can always continue search by
> adding 1 to ${N}, set it as start value, and re-run the
> command.
> 
> Any suggestion is welcome!
> 
> 

Hey, that's actually a pretty creative way of doing things ;)

Just to make sure: you've stopped daemons and everything else that could potentially write to the drive and nuke your blocks, right?
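
For when you get a real hit: below is a rough sketch of the step you describe, writing out the matching 10 kB unit together with the one before and the one after it. The function name dump_units, the unit number 123456 and the output path are made up for illustration; the device and the default unit size are taken from your command. Treat it as a starting point, not a tested recovery tool.

```shell
#!/bin/sh
# dump_units DEV N [BS]: write unit N and its neighbours to stdout.
# DEV, N and BS correspond to if=, skip= and bs= in the search loop.
# The function name is hypothetical, chosen only for this sketch.
dump_units() {
    dev=$1
    n=$2
    bs=${3:-10240}            # same unit size as the search loop
    if [ "$n" -gt 0 ]; then
        first=$((n - 1))      # start one unit before the hit
        count=3               # previous, matching and next unit
    else
        first=0               # unit 0 has no predecessor
        count=2
    fi
    dd if="$dev" bs="$bs" skip="$first" count="$count" 2>/dev/null
}

# Example (hypothetical unit number and output path):
#   dump_units /dev/ad6 123456 > /some/other/disk/candidate.bin
```

Whatever you do, write the output to a different disk than the one you are searching, so that the dump itself cannot overwrite the blocks you are looking for.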