Re: git: 9ac6eda6c6a3 - main - pipe: try to skip locking the pipe if a non-blocking fd is used

From: Gleb Smirnoff <glebius_at_freebsd.org>
Date: Wed, 17 Aug 2022 15:34:22 UTC
  Mateusz,

On Wed, Aug 17, 2022 at 02:23:51PM +0000, Mateusz Guzik wrote:
M> +	 * Try to avoid locking the pipe if we have nothing to do.
M> +	 *
M> +	 * There are programs which share one pipe amongst multiple processes
M> +	 * and perform non-blocking reads in parallel, even if the pipe is
M> +	 * empty.  This in particular is the case with BSD make, which when
M> +	 * spawned with a high -j number can find itself with over half of the
M> +	 * calls failing to find anything.
M> +	 */
M> +	if ((fp->f_flag & FNONBLOCK) != 0 && !mac_pipe_check_read_enabled()) {
M> +		if (__predict_false(uio->uio_resid == 0))
M> +			return (0);
M> +		if ((atomic_load_short(&rpipe->pipe_state) & PIPE_EOF) != 0)
M> +			return (0);
M> +		if (atomic_load_int(&rpipe->pipe_buffer.cnt) == 0 &&
M> +		    atomic_load_int(&rpipe->pipe_pages.cnt) == 0)
M> +			return (EAGAIN);
M> +	}

While this change definitely can optimize some programs it also can heavily
pessimize others. An application can share pipes and actually expect that
kernel will do proper serialization of writes by different threads. With
this change, the threads that previously were sleeping on the pipe would
enter a busy loop consisting write + kevent/poll syscalls.

This also breaks API. Although not explicitly narrated by POSIX or our
documentation, but there is a strong contract that, if application had
received a non-blocking fd from event dispatcher (kevent, poll, select)
as writable, then it is possible to write thre non-zero bytes with next
syscall.

This is a useful behavior, but by no means should be the default. It
should be explicitly enabled by a new fcntl() and make(1) can do that.

-- 
Gleb Smirnoff