Google SoC idea
Scott Long
scottl at samsco.org
Wed Jun 8 14:16:51 GMT 2005
Eric Anderson wrote:
> Scott Long wrote:
>
>> Richard Coleman wrote:
>>
>>> Scott Long wrote:
>>>
>>>> /me jumps up and down and waves his hands
>>>>
>>>> The problem with journalling at the block layer is that you pretty
>>>> much become forced to journal metadata and data, since the block
>>>> layer really doesn't know the distinction, and definitely not in a
>>>> filesystem-independent way (yes, UFS does evil things to the buffer
>>>> cache by representing metadata with negative block numbers, but that
>>>> is just UFS). Full journalling has many drawbacks from the
>>>> viewpoint of speed and complexity, of course. So you really want to
>>>> be able to do just metadata journalling.
>>>>
>>>> Another hard part of distinguishing between metadata and data is
>>>> that filesystems have a habit of migrating disk blocks from holding
>>>> metadata to holding data, and vice versa (think indirect pointer
>>>> blocks, not inode blocks). If you are only replaying metadata, you
>>>> want to make sure that you don't smash data blocks with old metadata.
>>>>
>>>> Coming up with a filesystem independent way to represent all of this
>>>> for the block layer is not easy. Filesystems would have to be able
>>>> to be modified to provide proper metadata vs. data hints to the
>>>> block layer. And if you're going to do that, then why not just make
>>>> it a library in VFS, like what Darwin does?
>>>>
>>>> The UFS Journalling work is already well underway, and I expect it
>>>> to follow the path of being a VFS library. Note that I'm saying
>>>> 'library' here, not 'layer'. There really is no way to make
>>>> journalling work with an arbitrary filesystem 'for free', whether as
>>>> a VFS layer or a GEOM transform, since journalling is 100% dependent
>>>> on the filesystem working with the buffer-cache to do sane
>>>> operations in a defined order.
>>>>
>>>> An alternate SoC project that would be very useful is block-level
>>>> snapshots. I'm not sure if I'll be able to retain the filesystem
>>>> snapshot functionality in UFS with journalling enabled, so moving to
>>>> doing the snapshots in the block layer would be a good way to make
>>>> up for this. Beware that while the GEOM transform would be pretty
>>>> straight-forward to write, the real trick comes from being able to
>>>> make the consumer of a block device (a filesystem, maybe) flush
>>>> itself to a consistent state while the snapshot is being taken. The
>>>> infrastructure for this is the part that is very interesting, but
>>>> also the most work.
>>>>
>>>> Scott
>>>
>>>
>>>
>>>
>>> Scott,
>>>
>>> Have you looked at the journaling layer that Matt has been adding to
>>> DragonflyBSD? What you are talking about appears very similar. Or
>>> am I misunderstanding something?
>>>
>>> Richard Coleman
>>> rcoleman at criticalmagic.com
>>
>>
>>
>> Ah, you might have misunderstood my use of the term 'VFS library'. This
>> is distinctly different from a 'VFS layer', which is what Matt did.
>> I've looked extensively at his work, but unfortunately it doesn't solve
>> the kinds of problems that I'm looking to solve. After discussing
>> journalling this evening with the author of BeFS and HFS+J, I'm pretty
>> happy that I'm taking the approach that I am.
>
>
> Maybe a good SoC project (but maybe too much work) would be getting the
> clustering UFS stuff going.. :)
>
> Eric
>
>
>
That is more along the lines of a good master's or PhD topic.
Scott
More information about the freebsd-hackers mailing list