fcntl(F_RDAHEAD)
Igor Sysoev
is at rambler-co.ru
Thu Sep 17 10:33:33 UTC 2009
Hi,
nginx-0.8.15 can use a completely non-blocking sendfile() by passing the
SF_NODISKIO flag. When sendfile() returns EBUSY, nginx calls aio_read() to
read a single byte. The first aio_read() preloads the first 128K of the file
into the VM cache; however, each successive aio_read() preloads just 16K of
the file. This makes non-blocking sendfile() ineffective for files larger
than 128K.
I've created a small patch that adds a Darwin-compatible F_RDAHEAD fcntl:
fcntl(fd, F_RDAHEAD, preload_size)
There is a small incompatibility: Darwin's fcntl only allows read ahead to be
enabled or disabled, while the proposed patch allows setting the exact preload size.
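
As a rough sketch of how an application might issue the proposed call: the wrapper below is hypothetical (set_read_ahead is not part of nginx or of the patch). It is guarded with #ifdef because F_RDAHEAD is only defined on systems that provide it, such as Darwin or a kernel built with this patch. Per the patch, a size of 0 clears the read-ahead flag again.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical wrapper: ask the kernel to read ahead preload_size bytes
 * on fd. Returns 0 on success, -1 with errno set on failure.
 */
static int
set_read_ahead(int fd, int preload_size)
{
#ifdef F_RDAHEAD
	/* With the patch, 0 disables read ahead and non-zero sets the size. */
	return (fcntl(fd, F_RDAHEAD, preload_size) == -1 ? -1 : 0);
#else
	/* This system's headers do not define F_RDAHEAD. */
	(void)fd;
	(void)preload_size;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

A caller would typically invoke set_read_ahead(fd, 128 * 1024) once after open() and treat a failure as non-fatal, since read ahead is only an optimization.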
Currently the preload size affects the vn_read() code path only and does not
affect the sendfile() code path. However, it can easily be extended to the
sendfile() path too. The preload size is still limited by the vfs.read_max sysctl.
The patch is against FreeBSD 7.2 and was tested on FreeBSD 7.2-STABLE only.
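
For illustration, the byte-to-block conversion that the patch performs in its F_RDAHEAD case can be reproduced in user space. This is only a sketch: rdahead_blocks is a hypothetical helper name, bsize stands in for the mount's f_iosize, and the 16K block size in the note that follows is an assumed value, not one taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's arithmetic: round the
 * requested preload size (bytes) up to a whole number of blocks.
 * bsize must be non-zero.
 */
static uint64_t
rdahead_blocks(uint64_t preload_size, uint64_t bsize)
{
	return ((preload_size + bsize - 1) / bsize);
}
```

With an assumed 16K block size, a 128K request gives rdahead_blocks(131072, 16384) == 8, and a request of 16385 bytes rounds up to 2 blocks.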
--
Igor Sysoev
http://sysoev.ru/en/
-------------- next part --------------
--- sys/sys/fcntl.h 2009-06-02 19:05:17.000000000 +0400
+++ sys/sys/fcntl.h 2009-09-12 20:29:34.000000000 +0400
@@ -118,6 +118,10 @@
#if __BSD_VISIBLE
/* Attempt to bypass buffer cache */
#define O_DIRECT 0x00010000
+#ifdef _KERNEL
+/* Read ahead */
+#define O_RDAHEAD 0x00020000
+#endif
#endif
/*
@@ -187,6 +191,7 @@
#define F_SETLK 12 /* set record locking information */
#define F_SETLKW 13 /* F_SETLK; wait if blocked */
#define F_SETLK_REMOTE 14 /* debugging support for remote locks */
+#define F_RDAHEAD 15 /* read ahead */
/* file descriptor flags (F_GETFD, F_SETFD) */
#define FD_CLOEXEC 1 /* close-on-exec flag */
--- sys/kern/vfs_vnops.c 2009-06-02 19:05:00.000000000 +0400
+++ sys/kern/vfs_vnops.c 2009-09-12 20:24:00.000000000 +0400
@@ -305,6 +305,9 @@
sequential_heuristic(struct uio *uio, struct file *fp)
{
+ if (fp->f_flag & O_RDAHEAD)
+ return(fp->f_seqcount << IO_SEQSHIFT);
+
if ((uio->uio_offset == 0 && fp->f_seqcount > 0) ||
uio->uio_offset == fp->f_nextoff) {
/*
--- sys/kern/kern_descrip.c 2009-08-28 18:50:11.000000000 +0400
+++ sys/kern/kern_descrip.c 2009-09-12 20:23:36.000000000 +0400
@@ -411,6 +411,7 @@
u_int newmin;
int error, flg, tmp;
int vfslocked;
+ uint64_t bsize;
vfslocked = 0;
error = 0;
@@ -694,6 +695,31 @@
vfslocked = 0;
fdrop(fp, td);
break;
+
+ case F_RDAHEAD:
+ FILEDESC_SLOCK(fdp);
+ if ((fp = fdtofp(fd, fdp)) == NULL) {
+ FILEDESC_SUNLOCK(fdp);
+ error = EBADF;
+ break;
+ }
+ if (fp->f_type != DTYPE_VNODE) {
+ FILEDESC_SUNLOCK(fdp);
+ error = EBADF;
+ break;
+ }
+ FILE_LOCK(fp);
+ if (arg) {
+ bsize = fp->f_vnode->v_mount->mnt_stat.f_iosize;
+ fp->f_seqcount = (arg + bsize - 1) / bsize;
+ fp->f_flag |= O_RDAHEAD;
+ } else {
+ fp->f_flag &= ~O_RDAHEAD;
+ }
+ FILE_UNLOCK(fp);
+ FILEDESC_SUNLOCK(fdp);
+ break;
+
default:
error = EINVAL;
break;
More information about the freebsd-hackers
mailing list