sendfile(2) SF_NOPUSH flag proposal

Tue May 27 11:01:03 PDT 2003

:...
:> have done this several times in the past, e.g. with my soft
:> interrupt coalescing implementation that's now part of most
:> of the Ethernet drivers people care about.
:> 
:> Actually, in this case, I'd just try to fix sendfile(2) to
:> do the packet coalescing I'd expect, given the relative
:> state of the TCP_NODELAY and TCP_NOPUSH option flags.
:
:Actually, sendfile() already honors the TCP_NOPUSH flag.
:I do not know about TCP_NODELAY - I do not work with it.
:But if you turn TCP_NOPUSH on, then sendfile() will send only full packets.
:If you turn TCP_NOPUSH off, then sendfile() will send some packets partially
:filled. That is the correct behavior.

    But considering the fairly high syscall overhead of sendfile() versus the
    1 us or so it takes to do a setsockopt(), implementing additional
    flags in the sendfile() API to work around sendfile()'s inefficient
    implementation of the header sending code SOLELY to avoid the additional
    syscalls is not a good enough reason to change the API.  It would just
    be adding one hack on top of another with the side effect of the new
    hack being visible in the API.  This is bad.
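
    For reference, the setsockopt() toggle in question is roughly the
    following.  This is only a sketch: set_nopush(), get_nopush() and
    SKETCH_NOPUSH are names made up for illustration, and the fallback to
    Linux's TCP_CORK (the closest analogue I know of) is an assumption
    added so the snippet builds outside FreeBSD.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* FreeBSD spells the option TCP_NOPUSH.  Falling back to TCP_CORK on
 * Linux is an assumption of this sketch, not part of the discussion. */
#if defined(TCP_NOPUSH)
#define SKETCH_NOPUSH TCP_NOPUSH
#elif defined(TCP_CORK)
#define SKETCH_NOPUSH TCP_CORK
#else
#error "no TCP_NOPUSH-style option on this platform"
#endif

/* Turn the option on (on != 0) or off (on == 0).
 * Returns 0 on success, -1 with errno set on failure. */
static int
set_nopush(int sock, int on)
{
    return setsockopt(sock, IPPROTO_TCP, SKETCH_NOPUSH, &on, sizeof(on));
}

/* Returns 1 if the option is set, 0 if it is clear, -1 on error. */
static int
get_nopush(int sock)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(sock, IPPROTO_TCP, SKETCH_NOPUSH, &on, &len) == -1)
        return -1;
    return on != 0;
}
```

    An application would call set_nopush(s, 1) before sendfile() to get
    only full packets, and set_nopush(s, 0) afterwards to let the final
    partial packet go out.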

    This (minor) problem *should* be solved by fixing the sendfile()
    implementation itself.

    It may well be that a reasonable solution would be to have sendfile()
    itself set TCP_NOPUSH internally to wrap the header sending writev()
    and the first data packet, then restore the previous state after 
    queueing the first data packet.  That would still be a hack, but at least
    it would be one that is not being made visible in the API.  Visible
    changes in APIs create porting headaches between UNIXes and should be
    avoided whenever possible.
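
    To illustrate the save/set/restore pattern described above, here is a
    user-space approximation.  It is not the in-kernel change itself: done
    from user space it costs exactly the extra syscalls under discussion,
    and it restores the flag only after sendfile() returns rather than
    after the first data packet is queued.  hold_begin(), hold_end(),
    send_headers_and_file() and HOLD_OPT are hypothetical names, and the
    TCP_CORK fallback for Linux is an assumption of the sketch.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#if defined(__linux__)
#include <sys/sendfile.h>
#endif

#if defined(TCP_NOPUSH)
#define HOLD_OPT TCP_NOPUSH     /* FreeBSD */
#elif defined(TCP_CORK)
#define HOLD_OPT TCP_CORK       /* assumed Linux stand-in */
#else
#error "no TCP_NOPUSH-style option on this platform"
#endif

/* Remember the current option value in *saved, then turn the option on. */
static int
hold_begin(int sock, int *saved)
{
    int on = 1;
    socklen_t len = sizeof(*saved);

    if (getsockopt(sock, IPPROTO_TCP, HOLD_OPT, saved, &len) == -1)
        return -1;
    return setsockopt(sock, IPPROTO_TCP, HOLD_OPT, &on, sizeof(on));
}

/* Put the option back the way hold_begin() found it; errno is preserved. */
static void
hold_end(int sock, int saved)
{
    int err = errno;

    (void)setsockopt(sock, IPPROTO_TCP, HOLD_OPT, &saved, sizeof(saved));
    errno = err;
}

/* Send the headers with writev(), then the file data with sendfile(),
 * holding partial packets so headers and first data can share a packet.
 * Short header writes are not retried in this sketch.
 * Returns 0 on success, -1 with errno set on failure. */
static int
send_headers_and_file(int sock, const struct iovec *hdr, int hdrcnt,
    int filefd, off_t offset, size_t nbytes)
{
    int saved = 0, rc = -1;

    if (hold_begin(sock, &saved) == -1)
        return -1;
    if (writev(sock, hdr, hdrcnt) >= 0) {
#if defined(__FreeBSD__)
        off_t sent = 0;
        rc = sendfile(filefd, sock, offset, nbytes, NULL, &sent, 0);
#elif defined(__linux__)
        rc = sendfile(sock, filefd, &offset, nbytes) < 0 ? -1 : 0;
#else
        (void)filefd; (void)offset; (void)nbytes;
        errno = ENOSYS;         /* other sendfile() variants not covered */
#endif
    }
    hold_end(sock, saved);
    return rc;
}
```

    Saving and restoring the old value, rather than blindly clearing it,
    matters because the caller may already have set the flag itself.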

						-Matt