DFLTPHYS vs MAXPHYS
Matthew Dillon
dillon at apollo.backplane.com
Tue Jul 7 19:02:14 UTC 2009
:All I wanted to say, is that it is FS privilege to decide how much data
:it needs. But when it really needs a lot of data, they should be better
:transferred with smaller number of bigger transactions, without strict
:MAXPHYS limitation.
:
:--
:Alexander Motin
We are in agreement. That's essentially what I mean by all my
cluster_read() comments. What matters most is how much read-ahead
the cluster code does, and how well that read-ahead is matched to
reducing future transactions, not so much anything else (such as
cpu caches).
The cluster heuristics are pretty good but they do break down under
certain circumstances. For example, for UFS they break down when there
is file data adjacency between different inodes. That is often why one
sees the KB/t sizes go down (and the TPS rate go up) when tar'ing up a
large number of small files. Tar'ing up /usr/src is a good example of
this. KB/t can drop all the way down to 8K and performance is noticeably
degraded.
The cluster heuristic also tends to break down on the initial read() from
a newly constituted vnode, because it has no prior history to work with
and so does not immediately issue a read-ahead even though the I/O may
end up being linear.
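The effect can be sketched with a toy read-ahead heuristic (purely
illustrative; the names and the ramp-up rule below are my own, not the
actual cluster_read() code): with no history on a freshly constituted
vnode, the first read gets no read-ahead, and read-ahead only ramps up
as sequential accesses accumulate.

```python
# Toy sequential-detection read-ahead heuristic (hypothetical sketch,
# not the real FreeBSD cluster_read() implementation).

class Vnode:
    def __init__(self):
        self.last_block = None   # no history on a newly constituted vnode
        self.seq_count = 0       # consecutive sequential accesses observed

def cluster_read(vn, block, readahead_limit=4):
    """Return the list of blocks to fetch for a read of `block`."""
    if vn.last_block is not None and block == vn.last_block + 1:
        vn.seq_count += 1
    else:
        vn.seq_count = 0
    vn.last_block = block

    # Scale read-ahead with observed sequentiality, capped at the limit.
    readahead = min(vn.seq_count, readahead_limit)
    return [block + i for i in range(readahead + 1)]

vn = Vnode()
print(cluster_read(vn, 0))   # first read: no history -> no read-ahead
print(cluster_read(vn, 1))   # sequential access detected: read-ahead begins
print(cluster_read(vn, 2))   # read-ahead window grows
```

Note how the first call returns only the requested block: even a
perfectly linear reader pays at least one un-clustered transaction per
vnode under a history-based heuristic like this.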
--
For command latency issues Julian pointed out a very interesting contrast
between a HD and a (SATA) SSD. With no seek times to speak of, command
overhead becomes a bigger deal when trying to maximize the performance
of an SSD. I would guess that larger DMA transactions (from the point of
view of the host cpu, anyhow) would become more desirable once we start
hitting bandwidth ceilings of 300 MBytes/sec for SATA II and
600 MBytes/sec beyond that.
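As a rough illustration of why that is (the 100us per-command overhead
is an assumed number for the sake of the example, not a measurement):
once seek time is out of the picture, a fixed per-command cost directly
caps effective throughput at small transfer sizes.

```python
# Effective bandwidth under a fixed per-command overhead (illustrative
# model with assumed numbers, not measured SSD behavior):
#   effective_bw = block_size / (cmd_overhead + block_size / media_bw)

KB = 1024
MB = 1000 * 1000                 # decimal megabytes, as drive vendors use

def effective_bw(block_size, cmd_overhead_s, media_bw):
    return block_size / (cmd_overhead_s + block_size / media_bw)

media_bw = 300 * MB              # SATA II ceiling, bytes/sec
overhead = 100e-6                # assumed 100us of overhead per command

for kb in (8, 16, 32, 64, 128):
    bw = effective_bw(kb * KB, overhead, media_bw) / MB
    print(f"{kb:>4}K -> {bw:6.1f} MB/s")
```

Under these assumed numbers, small transfers leave most of the SATA II
ceiling on the table, and only the larger block sizes approach it.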
If in my example the bandwidth ceiling for a HD capable of doing 60MB/s
is hit at the 8K mark, then presumably the block size needed to hit the
bandwidth ceiling for a HD or SSD capable of 200MB/s, 300MB/s, or
higher will also have to be larger: 16K, 32K, etc. That is fast
approaching the 64K mark people are arguing about.
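That extrapolation can be checked with quick arithmetic, assuming the
sustainable transaction rate stays roughly constant, so that bandwidth
scales as TPS times block size:

```python
# Back-of-the-envelope check of the block-size extrapolation, assuming
# the drive's sustainable transaction rate (TPS) is roughly constant:
#   bandwidth = tps * block_size

KB = 1024
MB = 1000 * 1000                  # decimal megabytes

# If a 60 MB/s drive saturates at 8K transfers, the implied TPS ceiling:
tps = 60 * MB / (8 * KB)          # roughly 7300 transactions/sec

# Block size needed to saturate faster devices at that same TPS ceiling:
for bw in (200, 300, 600):        # MB/s
    blk = bw * MB / tps / KB      # result in KB
    print(f"{bw} MB/s -> ~{blk:.0f}K transfers")
```

Under this simple scaling, a 300MB/s device wants roughly 40K transfers
and a 600MB/s device roughly 80K, which is consistent with the point
above about approaching and passing the 64K mark.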
In any case, the main reason I posted is to try to correct people's
assumptions about the importance of various parameters, particularly the
irrelevance of cpu caches in the bigger picture.
-Matt
Matthew Dillon
<dillon at backplane.com>