kern/94772: FIFOs (named pipes) + select() == broken
Bruce Evans
bde at zeta.org.au
Wed Mar 22 01:09:24 UTC 2006
On Tue, 21 Mar 2006, Oliver Fromme wrote:
>> Description:
>
> I recently wondered why several of my scripts that use
> a named pipe (FIFO) don't work on FreeBSD.
>
> After some debugging it turned out that select() seems
> to be broken when used with FIFOs on FreeBSD 6.
> Particularly, this is the bug I'm suffering from:
>
> When a FIFO had been opened for reading and the writing
> process closes it, the reading process blocks in select(),
> even though the descriptor is ready for read(). If the
> select() call is omitted, read() returns 0 immediately
> indicating EOF. But as soon as you use select(), it blocks
> and there is no chance to detect the EOF condition.
See also:
PR 76125 (about the same bug)
PR 76144 (about a related bug in poll())
PR 53447 (about another, or possibly the same, related bug in poll())
PR 34020 (about the inverse of the bug (return on non-EOF) for select()
and poll())
The fix for PR 34020 inverted (a symptom of) a bug for poll() to
give a worse bug, and gave the (inverted) bug for select() where there
was no bug before. fifo_poll() now instructs lower layers to ignore
EOF by setting POLLINIGNEOF. This moves the bug for poll() and isn't
directly harmful for poll(), but it is directly harmful for select()
and it makes the real bug for poll() more harmful. The real bug for
poll() is that POLLHUP needs to be set to indicate EOF, but it isn't
actually set for many file types including named pipes. So we now
always get no indication of EOF where we should get POLLHUP or
POLLIN | POLLHUP. Previously, we always got EOF indicated by POLLIN,
and we also got EOF indicated by POLLIN in the tricky case where other
systems don't indicate EOF.
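The difference between POLLIN, POLLIN | POLLHUP and POLLHUP, as a poll() consumer would ideally see it, can be sketched as below. This is only an illustration of the intended semantics, not kernel code; the enum and the function name classify_revents() are made up for this example, and only POLLIN is treated as an input condition to keep it short.

```c
#include <poll.h>

/* Hypothetical names, used only for this illustration. */
enum input_state {
	INPUT_NONE,		/* neither data nor hangup reported */
	INPUT_DATA,		/* data readable, writer still connected */
	INPUT_DATA_THEN_EOF,	/* data readable, but the writer is gone */
	INPUT_EOF		/* pure EOF: hangup and nothing left to read */
};

/*
 * Interpret the .revents value of a descriptor polled for input.
 * A reader should keep reading while POLLIN is set and treat the
 * descriptor as being at EOF only when POLLHUP arrives without POLLIN.
 */
static enum input_state
classify_revents(short revents)
{
	int has_data = (revents & POLLIN) != 0;
	int hung_up = (revents & POLLHUP) != 0;

	if (has_data && hung_up)
		return (INPUT_DATA_THEN_EOF);
	if (has_data)
		return (INPUT_DATA);
	if (hung_up)
		return (INPUT_EOF);
	return (INPUT_NONE);
}
```

With the behaviour described above, a named pipe whose writer has gone away lands in INPUT_NONE, so a reader built this way never learns about the EOF.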
The tricky case is for a named pipe that has not had any writers during
the lifetime of current readers or thereabouts (races seem to be a
problem). Such a pipe can never have had a connection on it, so it
cannot be in the hangup state and it is clear that poll() should not
return POLLHUP for it. It is less clear that select() and poll()
should block waiting for a writer, but that is what other systems do
for poll() at least. I fixed FreeBSD long ago to not always block in
read() on a named pipe when there are no current writers, since
blocking in read() is just wrong in the O_NONBLOCK case. This had the
side effect of making select() never block when there are no current
writers. poll() inherited this behaviour from select(). This behaviour
is wrong since select()/poll() are the only reasonable ways to block
waiting for a writer, but it doesn't cause many problems.
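The read() side of the tricky case can be probed with a small helper like the following. It is a sketch under assumptions: the helper name and the /tmp directory template are arbitrary, and it only shows what a non-blocking read() does on a FIFO that has never had a writer. POSIX specifies a return of 0 (EOF) there rather than blocking or EAGAIN; what select() and poll() report in the same state is the part that differs between systems.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create a FIFO that never has a writer, open it O_RDONLY | O_NONBLOCK
 * and call read() once.  Returns the read() result, or -2 if the setup
 * itself failed.  The helper name and the temporary path are only for
 * this illustration.
 */
static ssize_t
read_fifo_without_writer(void)
{
	char dir[] = "/tmp/fifodemoXXXXXX";
	char path[64];
	char buf[16];
	ssize_t n;
	int fd;

	if (mkdtemp(dir) == NULL)
		return (-2);
	snprintf(path, sizeof path, "%s/p", dir);
	if (mkfifo(path, 0600) != 0) {
		rmdir(dir);
		return (-2);
	}
	/* With O_NONBLOCK, opening for reading does not wait for a writer. */
	fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		n = -2;
	else {
		n = read(fd, buf, sizeof buf);
		close(fd);
	}
	unlink(path);
	rmdir(dir);
	return (n);
}
```

Because read() cannot be used to wait here, a program that wants to sleep until a writer shows up has nothing but select() or poll() to do it with.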
>> How-To-Repeat:
>
> Please see the small test program below. Compile it like this:
> cc -O -o fifotest fifotest.c
> ...
> The same test program (with "err()" replaced by a small
> self-made function) runs without error on all other UNIX
> systems that I've tried: Linux 2.4.32, Solaris 10, and
> DEC UNIX 4.0 (predecessor of Tru64). By the way, it's
> even sufficient to do "cat /dev/null > fifo", i.e. not
> writing anything at all, but issuing EOF immediately.
> Under FreeBSD, nothing happens at all in that case.
> All other UNIX systems recognize EOF (select() returns).
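The "cat /dev/null > fifo" scenario can be reproduced inside a single process, roughly as follows. This is a minimal sketch and not the test program from the PR; the helper name and the /tmp template are assumptions made for the example. On systems that behave as the PR expects, the helper should return 1; a system with the bug reported here would return 0.

```c
#include <sys/select.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Returns 1 if select() reports the read end of a FIFO readable after
 * its only writer has closed without writing, 0 if it does not, and -1
 * if the setup failed.  Names and paths are for illustration only.
 */
static int
readable_after_writer_close(void)
{
	char dir[] = "/tmp/fifoselXXXXXX";
	char path[64];
	struct timeval tv = { 0, 0 };	/* zero timeout: poll, don't block */
	fd_set rfds;
	int rfd, wfd, result = -1;

	if (mkdtemp(dir) == NULL)
		return (-1);
	snprintf(path, sizeof path, "%s/p", dir);
	if (mkfifo(path, 0600) != 0)
		goto out_dir;
	rfd = open(path, O_RDONLY | O_NONBLOCK);
	if (rfd < 0)
		goto out_fifo;
	/* A reader exists, so the non-blocking open for writing succeeds. */
	wfd = open(path, O_WRONLY | O_NONBLOCK);
	if (wfd < 0)
		goto out_rfd;
	close(wfd);		/* the writer leaves: EOF with no data */

	FD_ZERO(&rfds);
	FD_SET(rfd, &rfds);
	if (select(rfd + 1, &rfds, NULL, NULL, &tv) >= 0)
		result = FD_ISSET(rfd, &rfds) ? 1 : 0;
out_rfd:
	close(rfd);
out_fifo:
	unlink(path);
out_dir:
	rmdir(dir);
	return (result);
}
```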
Here is a program that tests more cases. I made it give no output
(for no errors) under Linux-2.6.10. It also gives no output for
the nameless pipe case under FreeBSD-4.10 and FreeBSD-oldcurrent
and for the named pipe case under FreeBSD-4.10, but it fails with
(only) the error in this PR under FreeBSD-oldcurrent. Please test
it on Solaris etc. Compile it with -DNAMEDPIPE for the named pipe
case. In this case, it creates and leaves a fifo "p" in the current
directory (or doesn't handle the error if "p" exists but is not
an accessible fifo) but shouldn't have any other side effects.
%%%
#include <sys/select.h>
#include <sys/stat.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
static pid_t cpid;
static pid_t ppid;
static volatile sig_atomic_t state;
static void
catch(int sig)
{
state++;
}
static void
child(int fd)
{
fd_set rfds;
struct timeval tv;
char buf[256];
#ifdef NAMEDPIPE
fd = open("p", O_RDONLY | O_NONBLOCK);
if (fd < 0)
err(1, "open for read");
#endif
kill(ppid, SIGUSR1);
/* XXX should check that fd fits in rfds. */
usleep(1);
while (state != 1)
;
#ifndef NAMEDPIPE
/*
* The connection cannot be reestablished. Use the code that delays
* the read until after the writer disconnects since that case is
* more interesting.
*/
state = 4;
goto state4;
#endif
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (FD_ISSET(fd, &rfds))
warnx("state 1: expected clear; got set");
kill(ppid, SIGUSR1);
usleep(1);
while (state != 2)
;
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (!FD_ISSET(fd, &rfds))
warnx("state 2: expected set; got clear");
if (read(fd, buf, sizeof buf) != 1)
err(1, "read");
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (FD_ISSET(fd, &rfds))
warnx("state 2a: expected clear; got set");
kill(ppid, SIGUSR1);
usleep(1);
while (state != 3)
;
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (!FD_ISSET(fd, &rfds))
warnx("state 3: expected set; got clear");
kill(ppid, SIGUSR1);
/*
* Now we expect a new writer, and a new connection too since
* we read all the data. The only new point is that we didn't
* start quite from scratch since the read fd is not new. Check
* startup state as above, but don't do the read as above.
*/
usleep(1);
while (state != 4)
;
state4:
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (FD_ISSET(fd, &rfds))
warnx("state 4: expected clear; got set");
kill(ppid, SIGUSR1);
usleep(1);
while (state != 5)
;
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (!FD_ISSET(fd, &rfds))
warnx("state 5: expected set; got clear");
kill(ppid, SIGUSR1);
usleep(1);
while (state != 6)
;
/*
* Now we have no writer, but should still have data from the old
* writer. Check that we have both a data condition and a hangup
* condition, and that the data can be read in the usual way.
* Since Linux does this, programs must not quit reading when they
* see POLLHUP; they must see POLLHUP without POLLIN (or another
* input condition) before they decide that there is EOF. gdb-6.1.1
* is an example of a broken program that quits on POLLHUP only --
* see its event-loop.c.
*/
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (!FD_ISSET(fd, &rfds))
warnx("state 6: expected set; got clear");
if (read(fd, buf, sizeof buf) != 1)
err(1, "read");
FD_ZERO(&rfds);
FD_SET(fd, &rfds);
tv.tv_sec = 0;
tv.tv_usec = 0;
if (select(fd + 1, &rfds, NULL, NULL, &tv) < 0)
err(1, "select");
if (!FD_ISSET(fd, &rfds))
warnx("state 6a: expected set; got clear");
close(fd);
kill(ppid, SIGUSR1);
exit(0);
}
static void
parent(int fd)
{
usleep(1);
while (state != 1)
;
#ifdef NAMEDPIPE
fd = open("p", O_WRONLY | O_NONBLOCK);
if (fd < 0)
err(1, "open for write");
#endif
kill(cpid, SIGUSR1);
usleep(1);
while (state != 2)
;
if (write(fd, "", 1) != 1)
err(1, "write");
kill(cpid, SIGUSR1);
usleep(1);
while (state != 3)
;
if (close(fd) != 0)
err(1, "close for write");
kill(cpid, SIGUSR1);
usleep(1);
while (state != 4)
;
#ifndef NAMEDPIPE
return;
#endif
fd = open("p", O_WRONLY | O_NONBLOCK);
if (fd < 0)
err(1, "open for write");
kill(cpid, SIGUSR1);
usleep(1);
while (state != 5)
;
if (write(fd, "", 1) != 1)
err(1, "write");
kill(cpid, SIGUSR1);
usleep(1);
while (state != 6)
;
if (close(fd) != 0)
err(1, "close for write");
kill(cpid, SIGUSR1);
usleep(1);
while (state != 7)
;
}
int
main(void)
{
int fd[2];
int i;
#ifdef NAMEDPIPE
if (mkfifo("p", 0666) != 0 && errno != EEXIST)
err(1, "mkfifo");
#endif
signal(SIGUSR1, catch);
ppid = getpid();
for (i = 0; i < 2; i++) {
#ifndef NAMEDPIPE
if (pipe(fd) != 0)
err(1, "pipe");
#else
fd[0] = -1;
fd[1] = -1;
#endif
state = 0;
switch (cpid = fork()) {
case -1:
err(1, "fork");
case 0:
(void)close(fd[1]);
child(fd[0]);
break;
default:
(void)close(fd[0]);
parent(fd[1]);
break;
}
}
return (0);
}
%%%
The error output from this is:
%%%
selectp: state 3: expected set; got clear
selectp: state 6a: expected set; got clear
selectp: state 3: expected set; got clear
selectp: state 6a: expected set; got clear
%%%
These messages are all caused by the same bug. "state 3" is EOF without
any data ever having been readable. "state 6" is EOF with data readable
(the test for this passed so there is no output for it). "state 6a" is
EOF after having read the data available in state 6. It is good that
state 6a fails in the same way as state 3 -- at least the bug doesn't
seem to involve races. The duplicate messages are caused by iterating
the test to see if the bug depends on previous activity on the pipe
(but the program is probably too careful cleaning up for iteration to
show problems).
A similar test program (not enclosed) shows many more bugs for poll():
- no output under Linux-2.6.10
- FreeBSD-4.10 on nameless pipes:
% poll: state 6a: expected POLLHUP; got 0x11
% poll: state 6a: expected POLLHUP; got 0x11
0x11 is POLLIN | POLLHUP. Nameless pipes are one of the few file
types for which POLLHUP is actually implemented (POLLHUP is also
implemented (not quite right) for ttys but isn't implemented for
any other important file type). Linux returns only POLLHUP here.
This is best, since it allows distinguishing the case of pure EOF
from the case of EOF with data readable. However, buggy
applications like gdb don't actually understand the difference
between EOF-with-data and pure EOF (see the comment in the test
program). Also, select() depends on pipe_poll() returning POLLIN
to work.
- FreeBSD-oldcurrent on nameless pipes:
% poll: state 6a: expected POLLHUP; got 0x11
% poll: state 6a: expected POLLHUP; got 0x11
No change. The kernel code for select() and poll() hasn't been either
fixed or broken for nameless pipes.
- FreeBSD-4.10 on named pipes:
% pollp: state 3: expected POLLHUP; got 0x1
% pollp: state 6: expected POLLIN | POLLHUP; got 0x1
% pollp: state 6a: expected POLLHUP; got 0x1
% pollp: state 3: expected POLLHUP; got 0x1
% pollp: state 6: expected POLLIN | POLLHUP; got 0x1
% pollp: state 6a: expected POLLHUP; got 0x1
0x1 is POLLIN. FreeBSD-4.10 never returns POLLHUP for named pipes.
This and/or returning POLLIN in too many cases causes the problem in
PRs 34020, 53447 and 76144.
- FreeBSD-oldcurrent on named pipes:
% pollp: state 6: expected POLLIN | POLLHUP; got 0x1
% pollp: state 6: expected POLLIN | POLLHUP; got 0x1
FreeBSD-current still never returns POLLHUP for named pipes.
However, my test program determines whether POLLHUP should have been
returned in some cases and reduces the bugs to the above. It uses
FreeBSD(>4)'s (my) POLLINIGNEOF to do this. POLLINIGNEOF was supposed
to be usable for this to limit the damage caused by the fix for PR34020,
since I knew that this fix would break EOF handling in some cases.
However, POLLINIGNEOF doesn't really work. To use it, the test
program has to use nonblocking syscalls for everything including
poll() (it gets nonblocking polls using a timeout of 0). Even with
this, EOF can't be detected in state 6 (EOF-with-data). But the
bug in state 6 is small (it even compensates for the bug in gdb).
Here is the code to determine POLLHUP:
%%%
#ifdef POLLINIGNEOF
/*
* FreeBSD's POLLINIGNEOF (which causes half of the bugs when the kernel
* uses it) can be used to fix up the broken cases 3 and 6a if the kernel
* uses it, i.e., for named pipes but not for pipes. Note that the sense
* of POLLINIGNEOF is reversed when passed to the kernel -- it means
* don't-ignore-EOF in .events and if it is set there then it means
* not-POLLHUP in .revents.
*/
int
mypoll(struct pollfd *fds, nfds_t nfds, int timeout)
{
struct pollfd mypfd;
int r;
r = poll(fds, nfds, timeout);
if (nfds != 1 || timeout != 0 || fds[0].revents & POLLIN)
return (r);
mypfd = fds[0];
mypfd.events |= POLLINIGNEOF;
r = poll(&mypfd, 1, 0);
if (r >= 0) {
if (mypfd.revents &= POLLIN) {
mypfd.revents &= ~POLLIN;
mypfd.revents |= POLLHUP;
}
fds[0].revents = mypfd.revents;
}
return (r);
}
#define poll(fds, nfds, timeout) mypoll((fds), (nfds), (timeout))
#endif
%%%
With this userland fixup for the missing POLLHUP, the above shows that
states 3 and 6a have been fixed for poll() on named pipes. These are
precisely the states that have been broken for select() on named
pipes. Toggling the setting of POLLIN for these states toggles the
location of one of the bugs.
Without this userland fixup, the output for FreeBSD-oldcurrent on
named pipes is:
% state 3: expected POLLHUP; got 0
% state 6: expected POLLIN | POLLHUP; got 0x1
% state 6a: expected POLLHUP; got 0
% state 3: expected POLLHUP; got 0
% state 6: expected POLLIN | POLLHUP; got 0x1
% state 6a: expected POLLHUP; got 0
The POLLHUP flag is now never set, so states 3 and 6a aren't actually
fixed; in fact they are more broken than before, just like for select()
-- now no poll flag is set for these cases, so poll() and select()
don't even see normal hangups unless they are used with a timeout
and/or with the negative-logic POLLINIGNEOF as in my test program.
IIRC, PRs 53447 and 76144 are about this problem.
Quick fix (?): #defining POLLINIGNEOF as 0 in <sys/poll.h> should give the
FreeBSD-4.10 behaviour.
Fix (?):
- actually implement returning POLLHUP in sopoll() and other places. Return
POLLHUP but not POLLIN for the pure-EOF case. Return POLLIN* | POLLHUP
for EOF-with-data.
- remove POLLINIGNEOF and associated complications in sopoll(), fifo_poll()
and <sys/poll.h>
- change selscan() to check for POLLHUP so that POLLIN, POLLIN | POLLHUP
and POLLHUP all act the same for select()
- remove POLLHUP from the comment in selscan(). Fix the rest of this
comment or remove it (most backends are too broken to return poll flags
if appropriate, and the comment only mentions one of the other poll flags
that selscan() ignores)
- remove the corresponding comment in pollscan() since it is wrong and says
nothing relevant (pollscan() just accepts whatever flags the backends set).
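The rule proposed for selscan() in the list above can be modelled in userland as a one-line mapping from poll flags to select()'s read set. This is only a hypothetical model of the proposal, not the kernel's selscan() code, and select_read_ready() is a name made up for the example.

```c
#include <poll.h>

/*
 * Hypothetical model of the proposed rule: a descriptor belongs in
 * select()'s read set if the backend reported data, a hangup, or both.
 * POLLIN, POLLIN | POLLHUP and POLLHUP therefore all look the same to
 * select(), while poll() callers can still tell them apart.
 */
static int
select_read_ready(short revents)
{
	return ((revents & (POLLIN | POLLHUP)) != 0);
}
```

Under this model a backend may return POLLHUP alone for pure EOF without select() losing the wakeup, which is what makes dropping POLLINIGNEOF possible.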
Bruce