Extending di_nlink and its ilk

Kenneth Vestergaard Schmidt kvs at binarysolutions.dk
Mon Jan 3 13:25:46 GMT 2005


Hello.

I've run into a wee problem trying to build a nice backup machine. We made
it using rsync, hardlinks, and a modified link-by-hash patch for rsync.

link-by-hash computes an MD4 checksum of each file's contents. It then
stores the file under a path derived from that checksum, e.g.
/dana/hashes/abcdef/1234567890, and hardlinks it into the correct place in
the backup tree. This way, identical files are only stored once.
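The store layout can be sketched in C as below. This is a minimal
illustration, not code from the link-by-hash patch: the helper names
hash_store_path and link_from_store are hypothetical, the digest is assumed
to arrive as a lowercase hex string (computing the MD4 itself is left out),
and the point where the digest is split is a parameter. A 6-character prefix
reproduces the /dana/hashes/abcdef/1234567890 example; a 2-character prefix
gives 16 * 16 = 256 bucket directories.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the store path for a file whose content
 * digest is given as a hex string. The first `prefix_len` characters
 * name the bucket directory, the rest name the file, e.g. with
 * prefix_len 6: "abcdef1234567890" -> "<root>/abcdef/1234567890".
 * Returns 0 on success, -1 if the digest is too short or `out` too small.
 */
int hash_store_path(const char *root, const char *hexdigest,
                    size_t prefix_len, char *out, size_t outlen)
{
    size_t dlen = strlen(hexdigest);
    if (prefix_len == 0 || prefix_len >= dlen)
        return -1;

    int n = snprintf(out, outlen, "%s/%.*s/%s", root, (int)prefix_len,
                     hexdigest, hexdigest + prefix_len);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: hard-link the stored copy into the backup tree at
 * `dest`. Every call adds one link to the stored file, which is why
 * popular contents pile up links. Returns 0 on success, -1 on failure
 * (when link(2) itself fails, errno says why).
 */
int link_from_store(const char *root, const char *hexdigest,
                    size_t prefix_len, const char *dest)
{
    char stored[1024];
    if (hash_store_path(root, hexdigest, prefix_len,
                        stored, sizeof stored) == -1)
        return -1;
    return link(stored, dest);
}
```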

At this point, we ran into the problem of di_nlink and related fields
being only 16 bits wide, since we were creating more than 32765
subdirectories in a single directory.
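The 32765 figure follows from how a directory's link count is made up: one
link for its name in the parent, one for its own "." entry, and one for the
".." entry of each subdirectory. A short C sketch of that arithmetic,
assuming a signed 16-bit di_nlink (which matches the 32767 figure in this
post); the names NLINK16_MAX, MAX_SUBDIRS and dir_link_count are made up for
illustration:

```c
#include <stdint.h>

/* Largest count a signed 16-bit link field can hold: 32767. */
enum { NLINK16_MAX = INT16_MAX };

/* Two links are used by the directory itself, leaving 32765 for subdirs. */
enum { MAX_SUBDIRS = NLINK16_MAX - 2 };

/*
 * Link count of a directory containing `nsubdirs` subdirectories:
 * its name in the parent + its own "." + one ".." per subdirectory.
 */
long dir_link_count(long nsubdirs)
{
    return 2 + nsubdirs;
}
```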

I worked around this by creating only 256 directories, each containing a
lot of files. However, we soon ran into yet another limit: a single file
cannot have more than 32767 links, and when we link by contents, that limit
comes up very quickly.
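Rather than only finding out when link(2) fails with EMLINK, a tool can
compare a file's current link count with the limit its file system reports.
A minimal POSIX sketch using stat(2) and pathconf(2) with _PC_LINK_MAX;
can_add_link is a hypothetical helper name, and the limit reported varies by
file system:

```c
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if `path` can take at least one more
 * hard link under the file system's reported limit, 0 if it is already
 * at the limit, and -1 on error.
 */
int can_add_link(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;

    /* pathconf() returns -1 with errno unchanged when there is no limit. */
    errno = 0;
    long max = pathconf(path, _PC_LINK_MAX);
    if (max == -1)
        return errno == 0 ? 1 : -1;

    return (long)st.st_nlink < max ? 1 : 0;
}
```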

My initial idea was to patch the file system to use one of the spare
fields at the end of the various inode structs to hold a 32-bit or 64-bit
link count. Of course, some backward-compatible scheme would have to be
employed, where the original di_nlink is read first, but I wanted to hear
whether this is a totally hare-brained scheme before I start on it, or
whether it would actually be useful to others.
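One way the "read di_nlink first" idea could look is sketched below. This
is a hypothetical layout for illustration only, not FreeBSD's on-disk
inode: struct example_dinode, the field di_nlink_ext and the example_*
functions are made up. The 16-bit field holds the count while it fits;
once the count reaches 32767 the field is pinned there and the full value
lives in the wider spare field.

```c
#include <stdint.h>

/* Hypothetical inode layout; di_nlink_ext stands in for a spare field. */
struct example_dinode {
    int16_t  di_nlink;      /* original 16-bit field, always read first */
    /* ... other inode fields ... */
    uint32_t di_nlink_ext;  /* hypothetical: full count once saturated */
};

#define NLINK_SATURATED INT16_MAX   /* 32767 */

/* Read the link count: trust di_nlink unless it is pinned at the limit. */
uint32_t example_get_nlink(const struct example_dinode *dp)
{
    if (dp->di_nlink < NLINK_SATURATED)
        return (uint32_t)dp->di_nlink;
    return dp->di_nlink_ext;
}

/* Store the link count so that di_nlink stays meaningful on its own. */
void example_set_nlink(struct example_dinode *dp, uint32_t n)
{
    if (n < NLINK_SATURATED) {
        dp->di_nlink = (int16_t)n;
        dp->di_nlink_ext = 0;
    } else {
        /* Code reading only di_nlink sees the file at the 32767 limit. */
        dp->di_nlink = NLINK_SATURATED;
        dp->di_nlink_ext = n;
    }
}
```

Pinning di_nlink at 32767 means code that only knows the 16-bit field
treats the inode as full rather than seeing a wrapped-around count; the
sketch does not handle an unmodified kernel later removing links from such
an inode, which would leave the two fields out of step.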

The only other choice I have is a couple of extremely ugly hacks to rsync,
which I'd rather not do.


-- 
Best Regards

Kenneth Vestergaard Schmidt


More information about the freebsd-fs mailing list