DFLTPHYS vs MAXPHYS
Matthew Dillon
dillon at apollo.backplane.com
Mon Jul 6 18:12:48 UTC 2009
Linear dd
       tty          da0              cpu
 tin tout   KB/t   tps   MB/s  us ni sy in id
   0   11   0.50 17511   8.55   0  0 15  0 85   bs=512
   0   11   1.00 16108  15.73   0  0 12  0 87   bs=1024
   0   11   2.00 14758  28.82   0  0 11  0 89   bs=2048
   0   11   4.00 12195  47.64   0  0  7  0 93   bs=4096
   0   11   8.00  8026  62.70   0  0  5  0 95   bs=8192    << MB/s breakpt
   0   11  16.00  4018  62.78   0  0  4  0 96   bs=16384
   0   11  32.00  2025  63.28   0  0  2  0 98   bs=32768   << id breakpt
   0   11  64.00  1004  62.75   0  0  1  0 99   bs=65536
   0   11 128.00   506  63.25   0  0  1  0 99   bs=131072
Random seek/read
       tty          da0              cpu
 tin tout   KB/t   tps   MB/s  us ni sy in id
   0   11   0.50   189   0.09   0  0  0  0 100   bs=512
   0   11   1.00   184   0.18   0  0  0  0 100   bs=1024
   0   11   2.00   177   0.35   0  0  0  0 100   bs=2048
   0   11   4.00   175   0.68   0  0  0  0 100   bs=4096
   0   11   8.00   172   1.34   0  0  0  0 100   bs=8192
   0   11  16.00   166   2.59   0  0  0  0 100   bs=16384
   0   11  32.00   159   4.97   0  0  1  0  99   bs=32768
   0   11  64.00   142   8.87   0  0  0  0 100   bs=65536
   0   11 128.00   117  14.62   0  0  0  0 100   bs=131072
                 ^^^^^  ^^^^^^
                 note TPS rate and MB/s
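For context on how numbers like these are typically gathered: the linear pass is just dd from the raw device at each block size, and the random pass presumably seeks to a random offset before every read. A minimal sketch of such a random seek/read loop in C (the device path, block size, assumed device size and iteration count are placeholders, not the exact parameters of the test above):

    /*
     * Rough sketch of a random seek/read pass.  Device path, block size,
     * device size and iteration count are placeholder assumptions.
     */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLKSIZE  32768                          /* vary: 512 .. 131072 */
    #define NREADS   1000
    #define DEVSIZE  (40LL * 1024 * 1024 * 1024)    /* assumed device size */

    int
    main(void)
    {
        char *buf = malloc(BLKSIZE);
        int fd = open("/dev/da0", O_RDONLY);        /* raw device */

        if (fd < 0 || buf == NULL) {
            perror("setup");
            return 1;
        }
        for (int i = 0; i < NREADS; ++i) {
            /* pick a random block-aligned offset, read one block */
            off_t off = (random() % (DEVSIZE / BLKSIZE)) * (off_t)BLKSIZE;
            if (pread(fd, buf, BLKSIZE, off) != BLKSIZE) {
                perror("pread");
                break;
            }
        }
        close(fd);
        free(buf);
        return 0;
    }

Watching iostat at one-second intervals while a loop like that runs is what produces columns like the ones above.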
Which is the more important tuning consideration: efficiency of linear
reads, or saving re-seeks by buffering more data? If you didn't choose
saving re-seeks, you lose.
To go from 16K to 32K requires saving 5% of future re-seeks to break even.
To go from 32K to 64K requires saving 11% of future re-seeks.
To go from 64K to 128K requires saving 18% of future re-seeks.
(at least with this particular disk)
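One way to read those percentages off the random seek/read table: going from block size A to B makes each random I/O slower by the ratio tps_A/tps_B, so the larger size breaks even once it eliminates a fraction 1 - tps_B/tps_A of future re-seeks. A quick sketch of that arithmetic, using the TPS column above, lands close to the quoted 5%/11%/18%:

    /* Break-even re-seek savings implied by the random-read TPS column:
     * f = 1 - tps_large / tps_small. */
    #include <stdio.h>

    int
    main(void)
    {
        double tps[] = { 166, 159, 142, 117 };   /* bs = 16K, 32K, 64K, 128K */
        int    kb[]  = { 16, 32, 64, 128 };

        for (int i = 1; i < 4; ++i) {
            double f = 1.0 - tps[i] / tps[i - 1];
            printf("%3dK -> %3dK: break even at %.1f%% of re-seeks saved\n",
                   kb[i - 1], kb[i], f * 100.0);
        }
        return 0;
    }

(It prints roughly 4.2%, 10.7% and 17.6%, i.e. the same ballpark as the figures quoted above.)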
At the point where the block size exceeds 32768, if you aren't saving
re-seeks via locality of reference in the additional cached data,
you lose. If you are saving re-seeks, you win. CPU caches do not enter
into the equation at all.
For most filesystems the re-seeks being saved depend on the access
pattern. For example, if you are doing an ls -lR or a find, the re-seek
pattern will be related to inode and directory lookups. The number of
inodes which fit in a cluster_read(), assuming reasonable locality of
reference, will wind up determining the performance.
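As a rough illustration of that inode density (assuming UFS-style on-disk inodes of 128 bytes for UFS1 or 256 bytes for UFS2; the actual filesystem in question may differ):

    /* Inodes pulled in per cluster_read() at various cluster sizes,
     * assuming 128-byte (UFS1) or 256-byte (UFS2) on-disk inodes. */
    #include <stdio.h>

    int
    main(void)
    {
        int sizes[] = { 8192, 16384, 32768, 65536, 131072 };

        for (int i = 0; i < 5; ++i)
            printf("%6d-byte cluster: %4d UFS1 / %3d UFS2 inodes\n",
                   sizes[i], sizes[i] / 128, sizes[i] / 256);
        return 0;
    }

The more of those inodes the ls -lR or find actually touches before the buffer is recycled, the more re-seeks the larger cluster saves.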
However, as the buffer size grows, the total number of bytes you are
able to cache becomes the dominant factor in the re-seek efficiency.
I don't have a graph for that, but ultimately it means that reading
very large blocks (e.g. 1MB) with a non-linear access pattern is bad,
because most of the additional data cached will never be used before
the memory winds up being re-used to cache some other cluster.
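A rough feel for why, under a fixed memory budget (the 256MB cache size below is just an assumed figure for illustration): every doubling of the cluster size halves the number of distinct hot spots that can stay cached at once.

    /* Distinct clusters that fit in a fixed-size cache at each cluster
     * size.  The 256MB cache figure is an assumption for illustration. */
    #include <stdio.h>

    int
    main(void)
    {
        long long cache = 256LL * 1024 * 1024;
        int sizes[] = { 8192, 32768, 131072, 1048576 };

        for (int i = 0; i < 4; ++i)
            printf("%8d-byte clusters: %6lld cacheable at once\n",
                   sizes[i], cache / sizes[i]);
        return 0;
    }

At 1MB clusters only a few hundred distinct locations stay resident, so a non-linear access pattern mostly evicts data before it is ever reused.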
Another thing to note here is that command transfer overhead also becomes
mostly irrelevant once you hit 32K, even if you have a lot of discrete
disks. I/Os of less than 8KB are clearly wasteful of resources (in my
test even a linear transfer couldn't reach the bandwidth ceiling of the
device). I/Os greater than 32K are clearly dependent on saving re-seeks.
Note in particular that with a random access pattern the data transfer
rate nearly doubles as the buffer size doubles (because seek times are
so long). In other words, it's a huge win if you are actually able to
save future re-seeks by caching the additional data.
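The shape of that curve falls out of a simple seek-plus-transfer model: each random I/O costs one re-seek plus the time to move the payload, and the re-seek dominates. The ~5.8ms seek and ~63 MB/s media rate below are eyeballed from the tables above, so treat the output as an approximation of the random-read column, not a reproduction of it.

    /* Per-I/O model: t = seek + bs / bandwidth.  Seek time and media
     * rate are eyeballed from the tables above (rough assumptions). */
    #include <stdio.h>

    int
    main(void)
    {
        double seek = 5.8e-3;              /* seconds per re-seek (assumed) */
        double bw   = 63.0e6;              /* bytes/sec linear ceiling */

        for (int bs = 512; bs <= 131072; bs *= 2) {
            double t = seek + bs / bw;     /* time for one random I/O */
            printf("bs=%6d  %5.0f tps  %5.2f MB/s\n",
                   bs, 1.0 / t, (bs / t) / 1e6);
        }
        return 0;
    }

Since transfer time only reaches a couple of milliseconds at 128K against a roughly 6ms seek, each doubling of the buffer nearly doubles the delivered MB/s.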
What this all means is that CPU caches are basically irrelevant when it
comes to hard drive I/O. You are either saving enough re-seeks to make up
for the greater per-I/O latency of the larger reads or you aren't. One
re-seek is something like 7ms. 7ms is a LONG time, which is why the CPU
caches are irrelevant for choosing the block size. One can bean-count
cache misses all day long but it won't make the machine perform any
better in this case.
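For a sense of scale, assuming a ballpark figure of roughly 100ns per memory-cache miss (an assumption, not a measurement), one re-seek costs about as much as tens of thousands of misses:

    /* One 7ms re-seek expressed in cache-miss units.  The 100ns miss
     * cost is a ballpark assumption, not a measured value. */
    #include <stdio.h>

    int
    main(void)
    {
        double reseek = 7e-3;      /* one re-seek, per the text above */
        double miss   = 100e-9;    /* assumed cost of a cache miss */

        printf("one re-seek ~= %.0f cache misses\n", reseek / miss);
        return 0;
    }

Against that, shaving a few misses per block simply doesn't register.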
-Matt