Updated gjournal patches [20061024].
Ivan Voras
ivoras at fer.hr
Fri Oct 27 14:46:15 UTC 2006
Fluffles wrote:
> Please look at the screenshot I made of the panic message:
> http://dev.fluffles.net/images/gjournal-panic1.png
Hmm, a quick grep of the kernel sources for "Enough\." and of the
gjournal and graid5 sources for other strings from the panic screenshot
doesn't locate a likely point of failure. You'll probably need to at
least compile DDB & KDB into the kernel so that when the panic happens
you can create a backtrace (with the "bt" command) and post that
information.
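If it helps, the relevant lines in the kernel config file look roughly
like this on a 6.x kernel (the exact set of options may differ on your
version):

  options KDB          # kernel debugger framework
  options DDB          # interactive debugger backend
  options KDB_TRACE    # print a stack trace automatically on panic

With those in place, a panic should drop you to the "db>" prompt, where
"bt" prints the backtrace.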
> Also I have a question about its performance. You mentioned earlier
> that writing big files takes about twice as long with gjournal; I wonder
> if this is inherent to journaling itself or due to the current
> implementation. Windows' journaling NTFS, for example, isn't slower than
> FAT32 with big files, if I remember correctly. What major differences in
> the journaling process cause this?
Maybe MS is doing its tricks again? The way any journaling works is
this: data is not written where it's ultimately supposed to go (which is
all over the disk, because files and metadata are scattered across it),
but into a special on-disk area designated "the journal", which is
written sequentially. After some time (e.g. when the I/O load decreases)
the data is read from the journal and written to where it belongs. Thus
burst writes to the file system are very fast, and the slow operation
(relocating the data to where it belongs) is performed when the system
is under less load.
This journal area is finite in size, so when it gets full, no more
writes can happen until at least part of it is "freed" by relocating its
data to where it belongs, which is an operation that requires sequential
reading from the journal area and scattered writing to the on-disk data
area.
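Just to illustrate the mechanism (this is only a toy model, not
gjournal's actual code): writes are appended to the sequential journal
together with a note of where they finally belong, and a flush pass
later replays them to their scattered destinations.

/*
 * Toy model of the mechanism described above -- NOT gjournal's code,
 * just an illustration: writes land sequentially in the journal, a
 * flush pass later does the scattered relocation.
 */
#include <stdio.h>
#include <string.h>

#define BLKSZ    512                      /* size of one data block */
#define DISKBLKS 64                       /* "disk" size in blocks */
#define JBLKS    8                        /* journal capacity in records */

struct jrec {                             /* one journal record */
    int  dst;                             /* final block number on disk */
    char data[BLKSZ];                     /* the data itself */
};

static char        disk[DISKBLKS][BLKSZ]; /* the scattered data area */
static struct jrec journal[JBLKS];        /* the sequential journal area */
static int         jfill;                 /* records currently in use */

/* Replay the journal: sequential read, scattered writes, then empty it. */
static void
journal_flush(void)
{
    int i;

    for (i = 0; i < jfill; i++)
        memcpy(disk[journal[i].dst], journal[i].data, BLKSZ);
    jfill = 0;
}

/* A "file system" write: always appended at the journal tail. */
static void
fs_write(int dst, const char *data)
{
    if (jfill == JBLKS)          /* journal full: new writes stall   */
        journal_flush();         /* until part of it is relocated    */
    journal[jfill].dst = dst;
    memcpy(journal[jfill].data, data, BLKSZ);
    jfill++;
}

int
main(void)
{
    char blk[BLKSZ];
    int i;

    /* 20 writes to scattered destinations; only every 8th write pays
     * for the slow relocation pass. */
    for (i = 0; i < 20; i++) {
        memset(blk, 'a' + i, BLKSZ);
        fs_write((i * 7) % DISKBLKS, blk);
    }
    journal_flush();
    printf("disk block 7 now starts with '%c'\n", disk[7][0]);
    return (0);
}

The only point of the model is the shape of the I/O: fs_write() is
always sequential and cheap, journal_flush() is the scattered, expensive
part that has to run once the journal fills up.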
When large files are written to the file system "in bulk", the FS is
already smart enough to store them as sequentially as possible, but
there are two problems with this:
- the FS can't reliably detect whether the file that's about to be
written will be sequential or not, and neither can the journal driver
(so the FS simply does its best for every file, in the hope that if the
file grows large enough it won't get fragmented)
- all FS operations go through gjournal, so the sequence of operations
becomes: 1. the data is written to the journal, 2. the journal gets
full, so the data is read back from the journal and written to where it
belongs, and while that is going on new writes to the journal are at
best very slow. Hence the slowdown.
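To put very rough numbers on the second point (back-of-the-envelope
only, not a measurement of gjournal): with a disk that sustains, say,
60 MB/s, every block of a large file has to be written twice (once into
the journal, once to its final place) and read back once in between, so
the sustained rate an application sees for bulk writes is at best around
60/2 = 30 MB/s, and more realistically nearer 60/3 = 20 MB/s. That is
consistent with the "about twice as long" figure mentioned earlier.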
> Also, in your earlier post you explained the advantages of a journal
> with regard to significantly reduced fsck times at boot. But my major
> concern is data loss: on my test server I've had many kernel
> panics/freezes due to the experimental graid5 module being tested by
> Arne. This has resulted in the system not being able to boot because the
> ad0s2a (read: a!) partition has lost files. And it wouldn't be the first
> time a lockup or power failure caused data loss on my systems. That's
> why I want to use gjournal: to protect against data loss. Am I correct
> in my assumption that gjournal addresses my needs in this regard?
To guarantee (meta)data safety on the file system, the FS code must be
sure that the data it has placed on the disk will stay there. Soft
updates work by writing data in order and in batches, and the code
assumes each such "batch" arrives safely on the hardware. SU's
performance comes from delaying writes so the same data doesn't have to
be rewritten multiple times (consider deleting a huge number of files
from a directory: the same directory entry SHOULD be updated after each
delete, but SU delays it so that only the final version of the
directory, with the files removed, is written). Journaling, on the other
hand, lets each such write proceed to the disk, BUT instead of seeking
every time to wherever the data ultimately belongs, it writes everything
sequentially to the journal, which is much faster (60+ MB/s with today's
hardware). Smart journal engines (I don't know if gjournal has this
feature) will relocate only the last modified version of a data entry
(e.g. the last "state" of the directory entry from the SU example) to
its place.
Because all the intermediate data from the file system is placed in the
journal, a power drop in the middle of updating a directory entry will
result either in the directory entry being safely written to the journal
(from where it can be recovered), or in the change being completely lost
(in which case the old, un-updated directory entry is still valid). In
all of this, gjournal should do what you need.
The biggest problem today is not the software but the hardware. Most
disk drives (especially desktop-class ones) lie about having safely
written the data while it's still sitting in their buffers. This is why
the modern approach to building critical data storage is to force the
drives not to cache anything, and to employ a hardware RAID controller
with huge buffers and a battery that keeps those buffers "alive" when
the power goes down.
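For what it's worth, on FreeBSD the write cache of plain ATA disks can
be turned off with a loader tunable (at an obvious cost in write
performance); if I remember the knob correctly it's:

  # /boot/loader.conf
  hw.ata.wc="0"

SCSI drives have a similar setting in their caching mode page that can
be changed with camcontrol(8).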