SU+J systems do not fsck themselves
Scott Long
scottl at samsco.org
Wed Dec 28 07:14:18 UTC 2011
On Dec 27, 2011, at 10:14 PM, David Thiel wrote:
> On Tue, Dec 27, 2011 at 02:48:22PM -0800, Xin Li wrote:
>>>> - use journalled fsck; - use normal fsck to check if the
>>>> journalled fsck did the right thing.
>
> Ok, here is the log of fsck with and without journal.
>
> http://redundancy.redundancy.org/fscklog3
>
The first run of fsck, using the journal, gives results that I would expect. The second run seems to imply that the fixes made on the first run didn't actually get written to disk. This is definitely an oddity. I see that you're using geli; maybe there's some strange side-effect there, but I have no idea. Please report this as a bug, since it is definitely undesired behavior.
> That was done the very next boot, after a clean shutdown. The errors
> from the previous live fsck aren't there (oddly), but there are still
> apparently some corrections made. The next fsck still complains, but
> doesn't give any salvage prompts.
>
> Here is jsa@'s, done on a live FS with SU+J:
>
> http://redundancy.redundancy.org/fscklog4
>
For the love of all that is good and holy, don't ever run fsck on a live filesystem. It's going to report these kinds of problems! It's normal; filesystem metadata updates stay cached in memory, and fsck bypasses that cache by reading the disk directly, so it sees an on-disk state that lags behind what the kernel knows. Also, what you see in your log is a file that has been unlinked but held open. This is a common Unix idiom, and one that gets cleaned up by fsck on reboot, whether through the SUJ intent log processing or through a traditional fsck.
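For anyone unfamiliar with the unlinked-but-held-open idiom, here is a minimal Python sketch of it. It is only an illustration and not taken from the logs in this thread; the function name unlink_while_open is made up for the example, and it assumes a POSIX system where an open file may be unlinked.

```python
import os
import tempfile


def unlink_while_open():
    """Create a file, unlink it, and keep using it through the open descriptor.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)                     # remove the only directory entry
        name_exists = os.path.exists(path)  # the name is gone from the namespace
        os.write(fd, b"still here")         # the inode is still usable via fd
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
        # The link count is typically 0 at this point: the inode is allocated
        # but no directory references it.
        link_count = os.fstat(fd).st_nlink
    finally:
        os.close(fd)                        # last reference dropped; the kernel can free the inode
    return name_exists, data, link_count


if __name__ == "__main__":
    print(unlink_while_open())
```

While such a descriptor is open, a checker reading the raw disk can find an allocated inode that no directory references, which is the sort of unreferenced file a live fsck reports. If the machine crashes before the close, that cleanup falls to fsck or the journal on the next boot.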
> I'm not actually looking to solve my particular problem per se. The
> issue is that almost everyone I've checked with that's running SU+J gets
> unref'd file and other errors when they check their filesystem (with the
> fs live). Unless I'm missing something, a running FS should never have
> those kinds of errors unless you deliberately disabled fsck.
>
Nope, you are completely incorrect here.
> This leaves only a couple options:
>
> - SU+J and fsck do not work correctly together to fix corruption on
> boot, i.e. bgfsck isn't getting run when it should
The point of SUJ is to eliminate the need for bgfsck. Effectively, they are mutually exclusive ideas. It's possible that there are still problems with SUJ and how fsck processes and commits the journal entries. However, bgfsck has nothing to do with this, and I'd also like to know if your use of geli is complicating the problem.
> - Stuff is getting completely screwed up after boot
Possibly but unlikely
> - fsck is giving incorrect results
Very unlikely
> - I'm completely clueless about how SU+J is supposed to behave or be
> deployed
No comment =-)
>
> I'm pretty certain that the first is the issue here. It would be great
> if others could check their own SU+J filesystems so we could get a few
> more data points.
>
Indeed, more data is needed.
Scott
More information about the freebsd-current mailing list