Re: git: 867c27c23a5c - main - nfscl: Change IO_APPEND writes to direct I/O
- In reply to: Konstantin Belousov: "Re: git: 867c27c23a5c - main - nfscl: Change IO_APPEND writes to direct I/O"
Date: Thu, 16 Dec 2021 14:58:23 UTC
Kostik wrote:
>On Wed, Dec 15, 2021 at 04:39:28PM +0000, Rick Macklem wrote:
>> The branch main has been updated by rmacklem:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=867c27c23a5c469b27611cf53cc2390b5a193fa5
>>
>> commit 867c27c23a5c469b27611cf53cc2390b5a193fa5
>> Author: Rick Macklem <rmacklem@FreeBSD.org>
>> AuthorDate: 2021-12-15 16:35:48 +0000
>> Commit: Rick Macklem <rmacklem@FreeBSD.org>
>> CommitDate: 2021-12-15 16:35:48 +0000
>>
>> nfscl: Change IO_APPEND writes to direct I/O
>>
>> IO_APPEND writes have always been very slow over NFS, due to
>> the need to acquire an up to date file size after flushing
>> all writes to the NFS server.
>>
>> This patch switches the IO_APPEND writes to use direct I/O,
>> bypassing the buffer cache. As such, flushing of writes
>> normally only occurs when the open(..O_APPEND..) is done.
>> It does imply that all writes must be done synchronously
>> and must be committed to stable storage on the file server
>> (NFSWRITE_FILESYNC).
>>
>> For a simple test program that does 10,000 IO_APPEND writes
>> in a loop, performance improved significantly with this patch.
>>
>> For a UFS exported file system, the test ran 12x faster.
>> This drops to 3x faster when the open(2)/close(2) are done
>> for each loop iteration.
>> For a ZFS exported file system, the test ran 40% faster.
>>
>> The much smaller improvement may have been because the ZFS
>> file system I tested against does not have a ZIL log and
>> does have "sync" enabled.
>>
>> Note that IO_APPEND write performance is still much slower
>> than when done on local file systems.
>>
>> Although this is a simple patch, it does result in a
>> significant semantics change, so I have given it a
>> large MFC time.
>
>How is the buffer cache coherency handled then?
>Imagine that another process either reads from this file, or even has it
>mapped. What ensures that reads and the page cache see the data written
>by the direct path?

Well, for the buffer cache case, there is code near the beginning of
ncl_write() (the NFS VOP_WRITE()) that calls ncl_vinvalbuf() for the
IO_APPEND case. As such, any data in the buffer cache gets invalidated
whenever an IO_APPEND write occurs.

But, now that I look at it, it does not do anything w.r.t. mmap'd files.
(The direct I/O code has been there for a long time, but it isn't
enabled by default, so it probably doesn't get tested much. It also has
a sysctl that allows mmap for direct I/O, which is enabled by default;
when that sysctl is disabled, getpages/putpages fail for files doing
direct I/O.)

So, it looks like code to invalidate the pages needs to be added along
with the ncl_vinvalbuf() call? (A rough sketch of the idea is below.)
--> I'll come up with a patch and then get you to review it.

Thanks for pointing this out, rick
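For reference, a minimal sketch of that page-invalidation idea (an
illustration only, not the committed patch) could sit in the IO_APPEND
path of ncl_write(). It assumes ncl_write()'s locals (vp, td, ioflag,
error) and uses vn_pages_remove() to drop the resident pages; the flags
and error handling are simplified:

	if (ioflag & IO_APPEND) {
		/* Existing behaviour: flush and invalidate cached buffers. */
		error = ncl_vinvalbuf(vp, V_SAVE, td, 1);
		if (error != 0)
			return (error);
		/*
		 * Hypothetical addition: remove all resident pages backing
		 * the vnode ((0, 0) covers the whole VM object), so that
		 * mmap'd readers re-fault and fetch up to date data from
		 * the server.  vn_pages_remove() is a no-op when the vnode
		 * has no VM object.
		 */
		vn_pages_remove(vp, 0, 0);
	}

Whether dirty mmap'd pages would need to be written back before being
removed, and what the getpages/putpages paths should do for files in
direct I/O mode, is exactly the kind of detail the proposed patch and
its review would have to settle.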