format/newfs larger external consumer drives
John-Mark Gurney
jmg at funkthat.com
Wed Jul 15 22:06:22 UTC 2015
Dieter BSD wrote this message on Wed, Jul 15, 2015 at 10:37 -0700:
> [ freebsd-fs@ added ]
>
> >> If the average filesize will be large, use large block/frag sizes.
> >> I use 64 KiB / 8 KiB. And reduce the number of inodes. I reduce
> >> inodes as much as newfs allows and there are still way too many.
> >
> > Can you think of an algorithmic way to express this? I.e., you don't
> > want blocks to get *too* large as you risk greater losses in "partial
> > fragments", etc. Likewise, you don't want to run out of inodes.
>
> I look at df -i for existing filesystems with similar filesizes.
> My data filesystems usually get an entire disk (..., 2TB, 3TB, recently 5TB)
> and with 64/8 block/frag and as few inodes as newfs will allow
> df still reports numbers like 97% full but only using 0% or 1%
> of inodes.
>
> density reduced from 67108864 to 14860288
> /dev/ada1: 4769307.0MB (9767541168 sectors) block size 65536, fragment size 8192
> using 1315 cylinder groups of 3628.00MB, 58048 blks, 256 inodes.
> with soft updates
>
> I should take another look at increasing the size of cylinder groups.
Right now the cg is sized by default to fill a block. I don't
believe it can be made larger without a major overhaul of the code.
The default used to be even smaller than a full block, which caused
even more cgs to be created, and you had to use trial and error to
figure out how to make a cg fill a full block.
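One way to see how close a cg comes to filling a block is to add up its allocation bitmaps. The Python sketch below assumes one bit per fragment and one bit per inode and ignores the cg header, cluster maps and summaries, so it gives a lower bound rather than the exact on-disk layout. The helper name cg_bitmap_bytes is made up for illustration; the figures are taken from the newfs output quoted in this thread.

```python
# Rough check of how much of one block a cylinder group's bitmaps occupy.
# Assumption: one bit per fragment plus one bit per inode. The cg header,
# cluster maps and summary tables are ignored, so this is a lower bound.

def cg_bitmap_bytes(blocks_per_cg, block_size, frag_size, inodes_per_cg):
    frags_per_cg = blocks_per_cg * (block_size // frag_size)
    frag_map = (frags_per_cg + 7) // 8    # bytes for the fragment free map
    inode_map = (inodes_per_cg + 7) // 8  # bytes for the inode map
    return frag_map + inode_map

# (blocks per cg, block size, frag size, inodes per cg) from the newfs output
layouts = {
    "64k/8k": (58048, 65536, 8192, 256),
    "32k/4k": (20035, 32768, 4096, 80256),
    "32k/8k": (37215, 32768, 8192, 74496),
}

for name, (blks, bsize, fsize, ipg) in layouts.items():
    used = cg_bitmap_bytes(blks, bsize, fsize, ipg)
    print(f"{name}: {used} of {bsize} bytes ({100 * used // bsize}%)")
```

Under these assumptions the bitmaps alone take between about 85% and 92% of a block in all three layouts, which is consistent with the cg being sized to fill a block once the remaining metadata is counted.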
> Newfs likes very small cylinder groups, which made sense 30 years ago when
> disks were like 40 MB and file sizes were a lot smaller. IIRC, each
> cylinder group gets at least one block of inodes, and with file sizes
> of 1-20 GB I get way too many inodes.
This is partly because the default number of inodes is too large. The
currently documented default is one inode for every 4 * frag_size bytes
of data space, which isn't correct! The factor was changed to 2 in
r228794 to keep the number of inodes the same when the default changed
from 16k/2k to 32k/4k, but the documentation was not updated. It has
now been updated in r285615 and will be MFC'd.
On my dev server where I have a few source trees checked out:
Filesystem     Size    Used   Avail Capacity iused ifree %iused  Mounted on
/dev/ada0s2d   185G    122G     48G    72%    2.8M  9.5M   23%   /a
This fs has a non-standard config in that my frag size is 8k. If it were
standard, I'd have twice as many inodes. Increasing the frag size
both cuts the number of inodes in half and increases the cg size.
Standard:
/dev/ada0s2d: 192068.0MB (393355264 sectors) block size 32768, fragment size 4096
using 307 cylinder groups of 626.09MB, 20035 blks, 80256 inodes.
Non-standard:
/dev/ada0s2d: 192068.0MB (393355264 sectors) block size 32768, fragment size 8192
using 166 cylinder groups of 1162.97MB, 37215 blks, 74496 inodes.
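The two newfs runs above can be checked against the "one inode per 2 * frag_size bytes" default with a little arithmetic. This is only a sketch: it assumes 512-byte sectors and ignores how newfs rounds inodes per cylinder group, and estimated_inodes is a hypothetical helper, not part of newfs.

```python
SECTOR = 512  # bytes per sector (assumed)

def estimated_inodes(sectors, frag_size, frags_per_inode=2):
    """Inodes implied by one inode per frags_per_inode * frag_size bytes."""
    return sectors * SECTOR // (frags_per_inode * frag_size)

sectors = 393355264  # size of /dev/ada0s2d from the newfs output

# (fragment size, cylinder groups, inodes per group) for the two runs
for frag_size, cgs, ipg in ((4096, 307, 80256), (8192, 166, 74496)):
    est = estimated_inodes(sectors, frag_size)
    reported = cgs * ipg
    print(f"frag {frag_size}: estimate {est}, newfs total {reported}")
```

The estimates (24584704 and 12292352) land within about 1% of the totals newfs reports (24638592 and 12366336), and the 8k figure matches the roughly 12.3M inodes (iused + ifree) in the df output.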
The other thing I didn't realize (and that would be useful for someone
to benchmark) is that many SSDs now use an 8k page size instead of the
previous 4k.
Maybe this needs to be more of a sliding scale based upon disk size?
Maybe go from 2 * frag to 4 * frag for filesystems larger than 1TB?
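The sliding-scale idea can be written down as a small function. This is a sketch of the suggestion only, not anything newfs does: the function name is hypothetical, and treating "1TB" as 2^40 bytes is an assumption.

```python
TIB = 1 << 40  # assumed meaning of "1TB"; the cutoff is a tunable guess

def proposed_bytes_per_inode(fs_bytes, frag_size, threshold=TIB):
    """Bytes of data space per inode under the suggested sliding scale."""
    factor = 4 if fs_bytes > threshold else 2
    return factor * frag_size

# A ~192 GB filesystem keeps the 2 * frag density, a ~5 TB one gets 4 * frag.
small_fs = 393355264 * 512
large_fs = 9767541168 * 512
print(proposed_bytes_per_inode(small_fs, 4096))
print(proposed_bytes_per_inode(large_fs, 4096))
```

A two-step scale like this is the simplest form of the idea; more steps could be added the same way if survey data suggested them.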
This is still something that a system admin needs to address, though:
it's impossible to make the defaults sane for all use cases. Some
people will only keep multi-GB files on their 5 TB fs, and so only
need a few thousand inodes, but others may keep many more, smaller
files.
It'd be nice to put together a fs survey to see what sizes of
filesystems people have, and the distribution of file sizes.
I'll try to do that.
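A survey of this kind could start from a script along the following lines. It is a minimal sketch, not an agreed survey format: the power-of-two bucketing and the function names are assumptions, and os.walk will cross mount points, so the root should be chosen with that in mind.

```python
import os
import stat
from collections import Counter

def size_bucket(size):
    """Bucket 0 holds empty files; bucket k holds sizes in [2**(k-1), 2**k)."""
    return size.bit_length()

def survey(root):
    """Count regular files under root by power-of-two size bucket."""
    hist = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                hist[size_bucket(st.st_size)] += 1
    return hist

def report(hist):
    """Format the histogram as one line per non-empty bucket."""
    lines = []
    for bucket in sorted(hist):
        label = "empty" if bucket == 0 else f"< {1 << bucket} bytes"
        lines.append(f"{label:>24}: {hist[bucket]}")
    return lines
```

Running print("\n".join(report(survey("/a")))) would print the distribution for the /a filesystem shown earlier; comparing the file count against the inode totals from df -i shows how much headroom a given inode density leaves.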
> Yes, a larger frag size will waste some space in the last frag of a file,
> but having smaller block&frag sizes uses a lot of space to keep track of
> all those blocks and frags. And makes more work for fsck.
Yep...
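The space lost in the last fragment can be estimated for a given mix of file sizes. The sketch below uses a deliberately simple model that rounds every file up to a whole fragment; it ignores indirect blocks, any rounding of larger files to full blocks, and the bookkeeping overhead mentioned above. The sample sizes are invented for illustration.

```python
def tail_waste(file_size, frag_size):
    """Bytes left unused in the last fragment of a file (simple model)."""
    return -file_size % frag_size

def total_waste(sizes, frag_size):
    """Sum of tail waste over a collection of file sizes."""
    return sum(tail_waste(s, frag_size) for s in sizes)

# Hypothetical mix: a tiny file, two small files and one 5 GiB file.
sizes = [1, 4096, 10000, 5 * 2**30]
for frag_size in (4096, 8192):
    print(f"frag {frag_size}: {total_waste(sizes, frag_size)} bytes wasted")
```

In this model the waste per file is bounded by one fragment, so on a filesystem holding mostly multi-GB files the loss from a larger frag size is tiny relative to the data stored.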
> > "risk" of loss/cost of recovery (when the medium
> > *is* unceremoniously dismounted
>
> Some panics don't sync the disks. Sometimes disks just go into a coma.
> Soft updates is supposed to limit problems to those that fsck -p will
> automagically fix. (assuming the disk's write cache is turned off) There
> is at least one case where it does not. See PR 166499 (from 2012,
> still not fixed).
>
> As long as I'm whining about unfixed filesystem PRs, see also
> bin/170676: Newfs creates a filesystem that does not pass fsck.
> (also from 2012)
>
> > I am concerned with the fact that users can so easily/carelessly "unplug"
> > a USB device without the proper incantations beforehand. of course, *their*
> > mistake is seen as a "product design flaw"! :-/
>
> Superglue the cable in place? :-)
>
> Perhaps print up something like "Unmount filesystem(s) before unplugging
> or powering off external disk, or you might lose your data.",
> laminate it and attach it to the cables?
The same problem applies to Windows. They have a policy of turning off
write buffering on pluggable thumb drives to help eliminate this.
For UFS, the sync option should be passed to mount (mount -o sync).
[...]
> Alternately, instead of panicking, could the filesystem just
> umount -f the offending filesystem? (And whine to log(9).)
>
> I am very tired of having an entire machine panic just because
> one disk decided to take a nap. This is not how you get 5 9s. :-(
There has been lots of work to try to make file systems not panic
when the underlying drives disappear, though clearly more work is
needed... Patches welcome! :)
--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."