Extended Attributes and how to avoid them (was Re: O_XATTR support in FreeBSD?)

Sun Dec 1 23:19:54 UTC 2013

On Dec 1, 2013, at 2:05 PM, Lionel Cons <lionelcons1972 at gmail.com> wrote:

> But this discussion is *not* about extended attributes, this
> discussion is about Alternate Data Streams. Unfortunately the O_XATTR
> discussion somehow started to cover the Linux "extended attribute
> system", which is utterly useless in the intended use cases (as said,
> no access through normal POSIX read(), write(), mmap(), no unlimited
> size, no sparse data support (aka SEEK_HOLE, SEEK_DATA) etc etc).

I think this discussion doesn't really *know* what it's about, frankly, because there are so many possible avenues to choose from! :-)

As we saw earlier, there is apparently some interest in supporting ADS for Windows clients, but actually adding that support seems primarily the job of Samba (or whatever BSD-licensed equivalent someday emerges), so there's really not much for FreeBSD itself to discuss there.  If native CIFS support ever becomes a possibility, I'm sure it will come up again!

Then there's the whole topic of EAs (and I don't know who said Linux EAs represented some sort of gold standard - I certainly didn't) and what the intended use cases are.   Let's stick with the intended (and citable) use cases, if you please, because a lot of academic debate over the years about "how EAs should work" has been, to be perfectly honest, ultimately *pointless*.   Academically speaking, there's nothing you can do with an EA that you can't conceptually do just as well, if not better, with a detached attribute database because academics don't have to worry about their EAs working anywhere outside a laboratory setting!

It's the *pragmatic* discussions and clearly defined use cases that carry more weight (if not ALL the weight) - that's where you get into real-world concerns about EAs: how to keep them and their associated files from parting company, how to serialize and back them up, what clients are *actually* going to use them and what APIs they need, etc. etc.

Since you brought up POSIX APIs, let's talk about that for a second.  I've worked with EAs "in the field", as it were, a lot (a LOT) and no one during my long history with them has ever demanded the ability to call read() or write() on an EA, to mmap() one, or to store sparse data in one.  I would love to know which apps actually need to do that (and why), because other than "unlimited size", none of those demands has ever hit any bug database I've had access to.   I'm also generally not one to throw marketing numbers around in a technical conversation, but with 72 million seats and over 1 million applications (and by all means fact-check those numbers), if the ability to use EAs in that fashion were truly necessary, I suspect I would have heard that early and often.   If anything, the trend has been in the other direction - people want a simple API for getting and setting file properties that maybe uses EAs under the covers and maybe doesn't.  All they know is that they can hand the API a file handle (or path) and a dictionary and The Right Thing happens for storing the EAs, the converse also being true for getting them.   EAs just are not first-class filesystem citizens and, frankly, they don't really need to be in order to be "useful enough" for those situations where an application or bit of OS middleware really needs a way of storing some extended metadata for a file in a filesystem-neutral fashion (and we've already covered the network filesystem and archiver scenarios which make that important).
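
To make the "hand it a path and a dictionary" idea concrete, here is a minimal sketch in Python.  It is only an illustration of the shape such an API might take, not any existing FreeBSD or vendor interface.  The function names (set_properties, get_properties), the "user.props." attribute prefix and the hidden JSON sidecar file are all assumptions made up for this example.  It relies on Python's os.setxattr / os.getxattr / os.listxattr, which the standard library only provides on Linux; anywhere those calls are missing or the filesystem refuses them, the sketch quietly falls back to the sidecar file, so the caller never learns which store was used.

```python
import json
import os

PREFIX = "user.props."  # hypothetical xattr namespace, chosen for this sketch


def _sidecar(path):
    # Hypothetical fallback store: a hidden JSON file next to the target file.
    directory, name = os.path.split(os.path.abspath(path))
    return os.path.join(directory, "." + name + ".props.json")


def _load_sidecar(path):
    try:
        with open(_sidecar(path), "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def set_properties(path, props):
    """Store a dict of string properties for path, using EAs when possible."""
    leftover = {}
    for key, value in props.items():
        try:
            os.setxattr(path, PREFIX + key, value.encode("utf-8"))
        except (AttributeError, OSError):
            # No xattr support on this platform or filesystem: use the sidecar.
            leftover[key] = value
    if leftover:
        merged = _load_sidecar(path)
        merged.update(leftover)
        with open(_sidecar(path), "w", encoding="utf-8") as f:
            json.dump(merged, f)


def get_properties(path):
    """Return all properties for path as a dict, wherever they were stored."""
    props = _load_sidecar(path)
    try:
        names = os.listxattr(path)
    except (AttributeError, OSError):
        names = []
    for name in names:
        if name.startswith(PREFIX):
            value = os.getxattr(path, name)
            props[name[len(PREFIX):]] = value.decode("utf-8")
    return props
```

A caller would just write set_properties("report.txt", {"author": "jordan"}) and later read the same dictionary back with get_properties("report.txt").  The sketch deliberately ignores the hard parts mentioned above (keeping a sidecar attached to its file, backup and serialization, and what happens if one key ends up in both stores), which is rather the point: those are the pragmatic problems worth discussing.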

I'll opine that if FreeBSD really wants to support EAs in a "useful enough" way, then the best way of doing so is to stay focused on the pragmatic "these are our use cases, and we are not afraid to describe them in detail!" side of the street because, as I said, the academic discussions generally don't lead anywhere but in circles.   A pragmatic approach will, conversely, lead to doing just the basic minimum and not wasting time implementing anything that won't actually be needed in real-world scenarios.

Heck, if we really want to get all academic about something here, let's forget about EAs and ADS as comparatively uninteresting technologies from the '90s and start talking instead about file object stores that are far more flexible than what we have now!

I don't want to have my filesystem view be necessarily hierarchical (that should be a policy decision, not intrinsic to the filesystem itself).  I don't want any process to necessarily be able to see any part of the file object space save that which I explicitly grant it or its children.  I don't want to have to think about where a file object lives - I'd like it to be able to move around (memory, on-disk, "the cloud", etc) purely in response to how "hot" it is without me having to know or care about anything other than the object changing out from under me (which should also be an intrinsic part of the filesystem access APIs).   I want file objects to be able to have arbitrary properties of any type or size, and able to reference other file objects, such that I don't have to keep side-stores around everywhere to facilitate a lot of basic operations (like searching) that should be intrinsic to the object store, or at least handled by a first class OS service with the ability to be co-resident with it so things like indexing are actually *efficient*.

Can we have that discussion instead?  It would be more fun. :-)

- Jordan