[RFC] vfs.read_min proposal
Konstantin Belousov
kostikbel at gmail.com
Thu Mar 28 07:52:14 UTC 2013
On Wed, Mar 27, 2013 at 01:43:32PM -0700, Maksim Yevmenkin wrote:
> Hello,
>
> i would like to get some reviews, opinions and/or comments on the patch below.
>
> a little bit background, as far as i understand, cluster_read() can
> initiate two disk i/o's: one for exact amount of data being requested
> (rounded up to a filesystem block size) and another for a configurable
> read ahead. read ahead data are always extra and do not super set data
> being requested. also, read ahead can be controlled via f_seqcount (on
> per descriptor basis) and/or vfs.read_max (global knob).
>
> in some cases and/or on some work loads it can be beneficial to bundle
> original data and read ahead data in one i/o request. in other words,
> read more than caller has requested, but only perform one larger i/o,
> i.e. super set data being requested and read ahead.
The totread argument to the cluster_read() is supplied by the filesystem
to indicate how many data in the current request is specified. Always
overriding this information means two things:
- you fill the buffer and page cache with potentially unused data.
For some situations, like partial reads, it would be really bad.
- you increase the latency by forcing the reader to wait for the whole
cluster which was not asked for.
So it looks as very single- and special-purpose hack. Besides, the
global knob is obscure and probably would not have any use except your
special situation. Would a file flag be acceptable for you ?
What is the difference in the numbers you see, and what numbers ?
Is it targeted for read(2) optimizations, or are you also concerned
with the read-ahead done at the fault time ?
>
> ===
>
> Index: trunk/cache/src/sys/kern/vfs_cluster.c
> ===================================================================
> diff -u -N -r515 -r1888
> --- trunk/cache/src/sys/kern/vfs_cluster.c (.../vfs_cluster.c) (revision 515)
> +++ trunk/cache/src/sys/kern/vfs_cluster.c (.../vfs_cluster.c) (revision 1888)
> @@ -75,6 +75,10 @@
> SYSCTL_INT(_vfs, OID_AUTO, read_max, CTLFLAG_RW, &read_max, 0,
> "Cluster read-ahead max block count");
>
> +static int read_min = 1;
> +SYSCTL_INT(_vfs, OID_AUTO, read_min, CTLFLAG_RW, &read_min, 0,
> + "Cluster read min block count");
> +
> /* Page expended to mark partially backed buffers */
> extern vm_page_t bogus_page;
>
> @@ -169,13 +173,21 @@
> } else {
> off_t firstread = bp->b_offset;
> int nblks;
> + long minread;
>
> KASSERT(bp->b_offset != NOOFFSET,
> ("cluster_read: no buffer offset"));
>
> ncontig = 0;
>
> /*
> + * Adjust totread if needed
> + */
> + minread = read_min * size;
> + if (minread > totread)
> + totread = minread;
> +
> + /*
> * Compute the total number of blocks that we should read
> * synchronously.
> */
>
> ===
>
> thanks,
> max
> _______________________________________________
> freebsd-current at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-current/attachments/20130328/ffcc28d7/attachment.sig>
More information about the freebsd-current
mailing list