NFS client/buffer cache deadlock
Jilles Tjoelker
jilles at stack.nl
Wed Apr 20 10:12:23 PDT 2005
On Wed, Apr 20, 2005 at 11:52:33AM -0400, Brian Fundakowski Feldman wrote:
> On Wed, Apr 20, 2005 at 05:35:28PM +0200, Marc Olzheim wrote:
> > On Wed, Apr 20, 2005 at 11:20:38AM -0400, Brian Fundakowski Feldman wrote:
> > > > Btw.: I'm not sure write(), writev() and pwrite() are allowed to do short
> > > > writes on regular files... ?
> > > Our manpage is incorrect; POSIX states that they are (see earlier
> > > e-mail). There really is no alternative -- we simply can't build
> > > an NFS transaction larger than our buffer cache can accommodate.
> > > Note that short writes won't happen for normal buffer sizes, only
> > > excessively large ones. I really don't believe that writev() is meant
> > > to be used so that you can write gigantic data structures in a single
> > > transaction...
It is ok to return partial success if the first chunk of a large write
succeeded and a later chunk failed persistently, but not merely because
the write cannot be performed as a single NFS transaction.
> > Ah, I was reading the SUSv2 page:
> > http://www.opengroup.org/onlinepubs/009695399/functions/write.html
> > instead of the POSIX version.
> > But in neither of those can I find anything saying that it can return
> > with result < nbyte, without it being a permanent condition.
> > What phrase makes you conclude that it can?
> This specific issue is not clear-cut; the best thing to do lies somewhere
> within the range of these scenarios:
> "If a write() requests that more bytes be written than there is room
> for (for example, [XSI] [Option Start] the process' file size limit
> or [Option End] the physical end of a medium), only as many bytes as
> there is room for shall be written. For example, suppose there is
> space for 20 bytes more in a file before reaching a limit. A write of
> 512 bytes will return 20. The next write of a non-zero number of bytes
> would give a failure return (except as noted below)."
This only applies to permanent conditions.
> "When attempting to write to a file descriptor (other than a pipe or
> FIFO) that supports non-blocking writes and cannot accept the data
> immediately:
> * If the O_NONBLOCK flag is clear, write() shall block the calling
> thread until the data can be accepted.
> * If the O_NONBLOCK flag is set, write() shall not block the
> thread. If some data can be written without blocking the thread,
> write() shall write what it can and return the number of bytes
> written. Otherwise, it shall return -1 and set errno to [EAGAIN]."
I think regular files do not support non-blocking writes, even if they
are on NFS; in any case, O_NONBLOCK is disabled by default.
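For anyone who wants to check the flag on a given descriptor, here is a
minimal sketch. is_nonblocking() is a made-up helper name, not an
existing function; it only uses fcntl(F_GETFL), which reports the file
status flags including O_NONBLOCK.

```c
#include <fcntl.h>

/*
 * is_nonblocking: hypothetical helper, for illustration only.
 * Returns 1 if O_NONBLOCK is set on fd, 0 if it is clear,
 * and -1 if fcntl() fails (errno is left as set by fcntl()).
 */
static int
is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return (-1);
	return ((flags & O_NONBLOCK) != 0);
}
```

On a descriptor that was just opened without O_NONBLOCK, this should
return 0, so the blocking rule quoted above is the one that applies.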
> "[ENOBUFS] Insufficient resources were available in the system to
> perform the operation."
> I think the first is more useful behavior than the last. Supporting it
> should be exactly the same as supporting what happens if the actual
> filesystem fills up. In this case, the filesystem is being requested to
> write more "than there is room for."
The filesystem filling up is a totally different case, since attempting
the rest of the write is futile then.
In a lot of code, a short write() is treated as a (fairly) persistent
error.
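To make that concrete, here is a rough sketch of the two patterns. Both
function names (write_strict and write_all) are made up for this mail
and do not come from any library. The first is the kind of check I mean:
anything short of len counts as failure (reporting it as EIO is my
choice for the sketch). The second is the retry loop every caller would
need if short writes on regular files could be transient.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_strict: hypothetical helper showing the common pattern.
 * A short write is reported as an error; no retry is attempted.
 * Returns 0 on success, -1 on failure.
 */
static int
write_strict(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	if (n < 0)
		return (-1);		/* errno set by write() */
	if ((size_t)n != len) {
		errno = EIO;		/* assumption: short write mapped to EIO */
		return (-1);
	}
	return (0);
}

/*
 * write_all: hypothetical helper showing the retry loop.
 * Keeps calling write() until all len bytes are written or an
 * error occurs. Returns 0 on success, -1 on failure.
 */
static int
write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before any data; retry */
			return (-1);
		}
		if (n == 0) {
			errno = EIO;	/* assumption: no progress is an error */
			return (-1);
		}
		/* A short write counts as progress; continue with the rest. */
		p += n;
		len -= (size_t)n;
	}
	return (0);
}
```

Code written like write_strict() would start failing on large NFS
writes if the kernel returned short counts there, which is the concern.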
--
Jilles Tjoelker