stupid UFS behaviour on random writes

Rick Macklem rmacklem at uoguelph.ca
Thu Jan 17 23:01:37 UTC 2013


Wojciech Puchar wrote:
> Create a 10GB file (on a 2GB RAM machine, with some swap in use, to make
> sure little cache is available for the filesystem):
> 
> dd if=/dev/zero of=file bs=1m count=10k
> 
> The block size is 32kB, the fragment size 4kB.
> 
> 
> Now test random read access to it (10 threads):
> 
> randomio test 10 0 0 4096
> 
> A normal result for the not-so-fast disk in my laptop:
> 
>   total |  read:       latency (ms)      |  write:      latency (ms)
>    iops |  iops  min   avg    max   sdev |  iops  min   avg   max  sdev
> --------+--------------------------------+---------------------------------
>   118.5 | 118.5  5.8  82.3  383.2  85.6  |  0.0   inf   nan   0.0  nan
>   138.4 | 138.4  3.9  72.2  499.7  76.1  |  0.0   inf   nan   0.0  nan
>   142.9 | 142.9  5.4  69.9  297.7  60.9  |  0.0   inf   nan   0.0  nan
>   133.9 | 133.9  4.3  74.1  480.1  75.1  |  0.0   inf   nan   0.0  nan
>   138.4 | 138.4  5.1  72.1  380.0  71.3  |  0.0   inf   nan   0.0  nan
>   145.9 | 145.9  4.7  68.8  419.3  69.6  |  0.0   inf   nan   0.0  nan
> 
> 
> systat shows a 4kB I/O size; all is fine.
> 
> BUT random 4kB writes:
> 
> randomio test 10 1 0 4096
> 
>   total |  read:       latency (ms)      |  write:      latency (ms)
>    iops |  iops  min   avg    max   sdev |  iops  min   avg    max    sdev
> --------+--------------------------------+---------------------------------
> 38.5 | 0.0 inf nan 0.0 nan | 38.5 9.0 166.5 1156.8 261.5
> 44.0 | 0.0 inf nan 0.0 nan | 44.0 0.1 251.2 2616.7 492.7
> 44.0 | 0.0 inf nan 0.0 nan | 44.0 7.6 178.3 1895.4 330.0
> 45.0 | 0.0 inf nan 0.0 nan | 45.0 0.0 239.8 3457.4 522.3
> 45.5 | 0.0 inf nan 0.0 nan | 45.5 0.1 249.8 5126.7 621.0
> 
> 
> 
> The results are horrific. systat shows 32kB I/O, and gstat shows half the
> operations are reads, half writes.
> 
> Why does UFS need to read the full block, change one 4kB part, and then
> write it back, instead of just writing the 4kB part?

Because that's the way the buffer cache works. It writes an entire buffer
cache block (unless at the end of file), so it must first read the rest of
the block into the buffer; otherwise it would write garbage out for the
parts of the block the application didn't modify.
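
You can watch the read-modify-write happen with a single sub-block rewrite
(a sketch, assuming the 10GB test file and the 32K/4K filesystem from above;
the offset here is arbitrary):

  # rewrite one 4kB fragment in place; conv=notrunc keeps dd from truncating the file
  dd if=/dev/zero of=file bs=4k count=1 seek=12345 conv=notrunc

If the containing 32kB block isn't already cached, gstat should show a 32kB
read issued before the 32kB write goes out.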

I'd argue that using an I/O size smaller than the file system block size is
simply sub-optimal, and that most apps don't do small-block random I/O.
Alternatively, if you have an app that does random I/O of 4K blocks (at 4K
byte offsets), then using a 4K/1K file system would be better; see the
example below.
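
For instance (a sketch; substitute your actual device for ada0s1d):

  newfs -b 4096 -f 1024 /dev/ada0s1d

With the block size matching the 4kB write size, each random write dirties a
whole buffer cache block, so there is nothing left to read back in.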

NFS is the exception, in that it keeps track of the dirty byte range within
a buffer cache block and writes only that byte range. (NFS writes are byte
granular, unlike writes to a disk.)
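
A minimal sketch of that bookkeeping (the field names b_dirtyoff and
b_dirtyend are borrowed from FreeBSD's struct buf, but the code below is
illustrative, not the actual kernel implementation):

  /* Dirty byte range within one buffer cache block. */
  struct dirty_range {
          int b_dirtyoff; /* offset of the first dirty byte */
          int b_dirtyend; /* offset one past the last dirty byte */
  };

  /* Merge a new write of [off, off + len) into the tracked range. */
  static void
  mark_dirty(struct dirty_range *dr, int off, int len)
  {
          if (dr->b_dirtyend == 0) {
                  /* Block was clean; start a new range. */
                  dr->b_dirtyoff = off;
                  dr->b_dirtyend = off + len;
          } else {
                  /* Extend the existing range to cover the new write. */
                  if (off < dr->b_dirtyoff)
                          dr->b_dirtyoff = off;
                  if (off + len > dr->b_dirtyend)
                          dr->b_dirtyend = off + len;
          }
  }

When the block is flushed, only the bytes in [b_dirtyoff, b_dirtyend) are
sent in the write RPC, so the client never has to read the rest of the
block just to write part of it.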

