skipping fsck with soft-updates enabled
Eric Anderson
anderson at centtech.com
Thu Jan 11 16:15:52 UTC 2007
On 01/10/07 10:18, Scott Oertel wrote:
> Eric Anderson wrote:
>> On 01/10/07 00:20, Scott Oertel wrote:
>>> Victor Loureiro Lima wrote:
>>>> From rc.conf man page:
>>>> ---
>>>> background_fsck_delay
>>>>     (int) The amount of time in seconds to sleep before starting
>>>>     a background fsck(8). It defaults to sixty seconds to allow
>>>>     large applications such as the X server to start before disk
>>>>     I/O bandwidth is monopolized by fsck(8).
>>>> ---
>>>>
>>>> You can set the delay as long as you want, so it won't have to start
>>>> right away; in fact it can start as late as a year later (if that's
>>>> really what you want ;))
>>>>
>>>> att,
>>>> victor loureiro lima
>>>>
>>>> 2007/1/10, Oliver Fromme <olli at lurza.secnetix.de>:
>>>>> Scott Oertel wrote:
>>>>> > I am wondering what kind of problems would occur, besides lost space, if
>>>>> > after a system crash a fsck is skipped. According to the documentation,
>>>>> > with soft-updates enabled, the file system would be consistent; there
>>>>> > would just be lost resources to be recovered, which I am assuming can be
>>>>> > safely done at a later time to avoid long periods of downtime during
>>>>> > peak hours.
>>>>>
>>>>> I think that's exactly what the background fsck feature
>>>>> does. If you enable it (which is even the default), the
>>>>> fsck process doesn't start right away, so the system comes
>>>>> up in multi-user mode immediately. Then a snapshot is
>>>>> created on the file system, and fsck runs on the
>>>>> snapshot, freeing the lost space in the file system.
>>>>>
>>>>> Of course, it only works reliably with soft-updates enabled,
>>>>> _and_ there must not be any unexpected inconsistencies.
>>>>> However, with some common setups (e.g. cheap disks lying
>>>>> about completed write operations) it is difficult to
>>>>> guarantee the consistency. Soft-updates is rather fragile
>>>>> when the hardware doesn't work exactly as it's supposed to.
>>>>> I've witnessed breakage in the past, and for that reason
>>>>> I always disable the background fsck feature. And it's the
>>>>> reason I'm looking forward to gjournal becoming stable,
>>>>> because it seems to be less fragile in the presence of
>>>>> imperfect hardware.
>>>>>
>>>>> Best regards
>>>>> Oliver
>>>>>
>>>>> --
>>>>> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
>>>>> Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
>>>>> Any opinions expressed in this message may be personal to the author
>>>>> and may not necessarily reflect the opinions of secnetix in any way.
>>>>>
>>>>> "C++ is to C as Lung Cancer is to Lung."
>>>>> -- Thomas Funke
>>> The problem with background fsck is that on my machines, it doesn't
>>> work well. These machines have 8x750GB SATA drives and they are under
>>> extreme stress all the time. When you run fsck in the background, each
>>> drive takes 10+ minutes to create the snapshot file, during which
>>> time the machine is completely unresponsive and unstable.
>> What version of FreeBSD are you running? You might try gjournal,
>> which I've had great luck with, and Pawel (pjd@) is incredibly
>> responsive to bug reports, etc.
>>
>>> That is why I am wondering whether it is OK to skip both the
>>> background and foreground fscks and reschedule them for a later
>>> time, during non-peak hours.
>> I think most people would be nervous to tell you 'sure, skip it until
>> later', but I can tell you from experience that I myself have delayed
>> fscking for weeks on end, to do exactly what you want.
>>
>> Eric
>>
>>
>>
> I'm running on 6.2-RC2. For fun I tried to create a snapshot on one of
> our newest machines, same drive config as the previous ones, it's just
> less active than the others. It's running 6.2-RC2 and it just completely
> locked up. Anyway, thanks for the suggestion about running gjournal, but
> I'm not sure running non-official patches on the file system code with
> production machines is such a great idea. Have you had any problems with
> gjournal? If so, of what nature were they?
>
Honestly, I haven't had many issues with snapshots since 6.1 or so. In
6.1 and earlier there were lots of deadlocks, livelocks, etc. I think Kris@
has done a bang-up job of finding bugs and getting them fixed. If you
still see snapshot issues like this, it would be great if you could
start sending some info, such as the output of ps -auxl, and if it's a
deadlock, drop to the debugger and get a crash dump.
As far as gjournal goes, I now have it running on several systems, all very
high-usage NFS servers (~1000 high-end machines pounding them very hard,
24x7). I've only seen a few minor issues on one of my systems that is
running an older 6-STABLE (it's a little difficult for me to update it
right now), but all my other systems have been very solid. PJD has done
a great job getting it stable and ready for production use. So far I have
had no data loss and no file system corruption using it. The worst that's
happened is a livelock, followed by a reboot. Since it is indeed journaled,
the reboot takes a few minutes, and the fsck takes a few *seconds* (on a
10TB volume). I would say that using gjournal is more reliable over time
than relying on background fscks. Gjournal is still in beta testing,
however, so you should do your own testing to evaluate it. You can always
disable it very easily, without losing your data.
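
On the earlier point about pushing the background fsck out past peak
hours: background_fsck_delay is given in seconds, so a long delay has to
be converted by hand. Here is a rough sketch of that arithmetic, nothing
more. The helper names (fsck_delay_seconds, rc_conf_lines) are made up
for illustration; only the two rc.conf variable names are real, and the
8-hour figure is just an example value, not a recommendation.

```python
from datetime import timedelta


def fsck_delay_seconds(**duration):
    """Convert a duration (days=, hours=, minutes=, ...) to whole seconds.

    Hypothetical helper: the keyword arguments are passed straight to
    datetime.timedelta.
    """
    return int(timedelta(**duration).total_seconds())


def rc_conf_lines(**duration):
    """Render rc.conf settings that keep background fsck enabled but delayed.

    Hypothetical helper: it only formats text, it does not touch
    /etc/rc.conf.
    """
    return [
        'background_fsck="YES"',
        'background_fsck_delay="%d"' % fsck_delay_seconds(**duration),
    ]


if __name__ == "__main__":
    # Example: hold the background fsck off for 8 hours after boot.
    print("\n".join(rc_conf_lines(hours=8)))
```

Note that the delay counts from boot, not from a wall-clock time, so a
fixed value cannot by itself guarantee that the fsck lands in an
off-peak window.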
Eric
--
------------------------------------------------------------------------
Eric Anderson Sr. Systems Administrator Centaur Technology
An undefined problem has an infinite number of solutions.
------------------------------------------------------------------------