[rfc] 64-bit inode numbers
Rick Macklem
rmacklem at uoguelph.ca
Fri Jun 24 22:09:05 UTC 2011
Garance A Drosehn wrote:
> On 6/23/11 6:26 PM, Kostik Belousov wrote:
> > On Thu, Jun 23, 2011 at 06:05:56PM -0400, Garance A Drosehn wrote:
> >
> >> Consider the thread "Increasing the size of dev_t and ino_t" from
> >> freebsd-arch in 2002:
> >>
> >> http://docs.freebsd.org/mail/archive/2002/freebsd-arch/20020317.freebsd-arch.html
> >>
> >> In particular, this message by Robert Watson:
> >>
> >> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=139853+0+archive/2002/freebsd-arch/20020317.freebsd-arch
> >>
> >> I just participated in an online conference for OpenAFS, and while it
> >> isn't exactly taking the world by storm, I keep thinking it would be
> >> useful if FreeBSD could map individual AFS volumes to unique dev_t
> >> identifiers. And given the way AFS is implemented (as a global FS
> >> with many cells all reachable at the same time), and given the way most
> >> sites deploy AFS (with thousands or tens-of-thousands of individual
> >> AFS volumes *per site*), that adds up to a lot of values for dev_t.
> >>
> >> The upcoming release of OpenAFS should include a working and pretty
> >> stable AFS client for FreeBSD, so having a larger dev_t would have
> >> a more immediate application than it did back in 2002.
> >>
> > Am I right that the issue is the uniqueness of the dev_t for each
> > AFS volume, as reported by stat(2) ?
> >
> > Shouldn't the AFS client synthesize the dev_t for each new volume
> > mounted ? It seems that the current 32bit dev_t would be enough,
> > since I do not expect to see hundreds of thousands of mounts
> > on a single system.
> >
> > Please note that we do not guarantee dev_t stability across reboots
> > even for real devices.
> >
> The AFS cell at RPI has approximately 40,000 AFS volumes, and each
> volume should have its own dev_t (IMO). That's just counting the
> collection of AFS volumes which are on RPI file servers, and any
> user sitting on one computer could access AFS volumes which are
> made available by other sites (aka "AFS cells"). Most RPI users
> would only have access to maybe 1/4 of those volumes which exist
> at RPI, but we do know that individual users have run 'find' over
> the entire RPI cell looking for whatever they're looking for. I
> once did a run of 'md5deep' on the entire RPI cell, thanks to a
> symlink which I didn't realize was in my home directory!
>
Note that it is the value in mnt_stat.f_fsid that needs to be unique w.r.t.
other mount points in the machine. If AFS appears to be one mount
point in the FreeBSD client, then the only issue I know of is how
the client is expected to handle changes in dev_t within the mount
point. fts(3) and friends will assume that it is a mount point
crossing when st_dev changes. They will then expect the funny
rule to apply, where the d_ino in dirent is not the same as st_ino.
What I do for NFSv4 is synthesize the mnt_stat.f_fsid value and
return that as st_dev for the mounted volume until I see the fsid
returned by the server change. Below that point, I return the fsid
from the server as st_dev so long as it isn't the same as the
synthesized one. That way, fts(3) and friends figure out the mount
point crossings within the server.
"ls -lR" will usually find problems if this is broken.
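
The st_dev selection described above can be sketched roughly as follows.
This is only an illustration, not the actual NFSv4 client code: the
struct mount_ids type, its field names, and pick_st_dev() are all
hypothetical, and fsids are flattened to a single 64-bit integer for
brevity. The message does not say what happens when a server fsid equals
the synthesized one, so the fallback in the last branch is purely an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mount state; not the real FreeBSD kernel structures. */
struct mount_ids {
	uint64_t synth_fsid;	/* value synthesized for mnt_stat.f_fsid */
	uint64_t root_srv_fsid;	/* fsid the server reports for the mounted volume */
};

/*
 * Choose the st_dev to report for a file, given the fsid the server
 * returned for that file.
 */
static uint64_t
pick_st_dev(const struct mount_ids *m, uint64_t file_srv_fsid)
{
	assert(m != NULL);

	/* Server fsid unchanged: still in the mounted volume. */
	if (file_srv_fsid == m->root_srv_fsid)
		return (m->synth_fsid);

	/*
	 * Server fsid changed: a mount point was crossed on the server.
	 * Report the server's fsid so that st_dev changes and fts(3)
	 * and friends notice the crossing.
	 */
	if (file_srv_fsid != m->synth_fsid)
		return (file_srv_fsid);

	/*
	 * ASSUMPTION: the server fsid collides with the synthesized one.
	 * The original text does not describe this case; any value that
	 * differs from synth_fsid would keep the crossing visible, so the
	 * bitwise complement is used here as a placeholder.
	 */
	return (~file_srv_fsid);
}
```

With synth_fsid = 0x1000 and root_srv_fsid = 0xAAAA, a file whose server
fsid is 0xAAAA is reported with st_dev 0x1000, while a file whose server
fsid is 0xBBBB is reported with st_dev 0xBBBB, which is the change that
tools walking the tree treat as a mount point crossing.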
> So one person can easily trigger the access of 10,000 AFS volumes
> on one computer using one command. That might sound terrifying if
> you imagine it as being 10,000 NFS mounts, but accessing AFS volumes
> isn't the same amount of work as auto-mounting NFS filesystems.
> So ignore whatever problems you might expect to see with 10,000
> filesystems mounted on one computer. Just realize that it is very
> easy for a single user to access tens of thousands of AFS volumes
> from one computer, and it would be "most correct" (programming wise)
> if all of those AFS volumes were to get a unique value for dev_t.
> And of course it's even easier for a remote-access system to access
> tens-of-thousands of AFS volumes, since it would have a few dozen
> users logged in at the same time.
>
> Obviously most computers never access even 30,000 AFS cells before
> they (as the AFS client) will reboot, but I'm wondering how much
> overhead is there in trying to make sure that many different volumes
> are mapped to unique dev_t numbers.
>
> Please realize that I do not mind if people felt that there was no
> need to increase the size of dev_t at this time, and that we should
> wait until we see more of a demand for increasing it. But given the
> project to increase the size of inode numbers, I thought this was a
> good time to also ask about dev_t. I ask about it every few years :-)
>
> --
> Garance Alistair Drosehn = gad at gilead.netel.rpi.edu
> Senior Systems Programmer or gad at freebsd.org
> Rensselaer Polytechnic Institute or drosih at rpi.edu