FreeBSD 7.0: sockets stuck in CLOSED state...
Robert Watson
rwatson at FreeBSD.org
Fri Jun 27 08:14:57 UTC 2008
On Thu, 26 Jun 2008, Robert Watson wrote:
> I think the first logical step is to wait for the application to get into
> that state again, and then run procstat or fstat to dump the file descriptor
> array for the process. Presumably in the normal steady state, you expect to
> see a few IPC sockets (syslog, etc), a TCP listen socket, and some number of
> in-progress TCP sessions. The question, of course, is whether you see a lot
> more file descriptors than that, and in particular, ones that match the
> CLOSED entries in netstat. If you find that there are lots of open file
> descriptors and they match up approximately with netstat, then it's an
> application bug that just manifests a bit differently in 7.x than in 6.x.
> On the other hand, if you see only a small number of open file descriptors,
> then we may be looking at something quite a bit more complicated.
Just a public followup for those following the thread: Ali has sent me netstat
and sockstat/fstat data. It looks like each of the TCP connections that
appears in the netstat output in the CLOSED state also appears in sockstat
with a file descriptor.
This suggests a bug in which file descriptors are occasionally leaked,
perhaps early in their life cycle, as there's a bit of data in the input
buffer. However, it's still unclear whether it's an application bug
(occasionally missing a close() on an accepted file descriptor) or a kernel
bug (accept() or close() misbehaving such that the application doesn't know
the file descriptor is open, or has tried to close it but not succeeded). Ali
mentioned in his
e-mail that he was seeing EBADF on occasion from close(), which could mean a
bug is causing the wrong file descriptor number to be passed in. If there's a
kernel bug involved, then you could imagine it being along the lines of
"accept(2) returns a file descriptor but also sets an error, so the
application simply sees the error but the file descriptor remains installed in
the process's file descriptor table", leading to the appearance of a leak.
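
As a rough illustration of the kind of check that would help tell these cases
apart, the application could route its close(2) calls through a wrapper that
logs every failure together with the descriptor number. This is only a sketch
and not code from Ali's application; checked_close() and the log format are
hypothetical names made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: close `fd` and, on failure, record the descriptor
 * number and the error in `log`.  An EBADF entry means the application
 * passed a number that was not an open descriptor at that moment.
 */
static int
checked_close(int fd, FILE *log)
{
    int rv, saved_errno;

    assert(log != NULL);
    rv = close(fd);
    saved_errno = errno;
    if (rv == -1) {
        fprintf(log, "pid %ld close(%d) failed: %s\n",
            (long)getpid(), fd, strerror(saved_errno));
        fflush(log);
    }
    errno = saved_errno;    /* preserve errno for the caller */
    return (rv);
}
```

An EBADF line for a descriptor number that sockstat still shows as open would
point at the wrong number being passed to close(), rather than at close()
itself failing.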
I've asked Ali to do a bit more debugging and tracing of the application to
see if we can reach any conclusions about this. In particular, if he traces
to a file all file descriptor numbers returned by accept(2), then we can later
compare that file with the leaked descriptors present in netstat/sockstat and
decide whether the application *should* have known they were open or not.
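
A minimal sketch of that kind of tracing, assuming the application can route
its accept(2) calls through a wrapper. traced_accept() and the log format are
hypothetical, not part of the application or of any FreeBSD API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: call accept(2) on listen socket `lsock` and append
 * the outcome to `log`, so the returned descriptor numbers can be compared
 * later with the descriptors shown by sockstat.
 */
static int
traced_accept(int lsock, FILE *log)
{
    int fd, saved_errno;

    assert(log != NULL);
    fd = accept(lsock, NULL, NULL);
    saved_errno = errno;
    if (fd >= 0)
        fprintf(log, "pid %ld accept fd %d\n", (long)getpid(), fd);
    else
        fprintf(log, "pid %ld accept failed: %s\n",
            (long)getpid(), strerror(saved_errno));
    fflush(log);            /* keep the trace usable if the process dies */
    errno = saved_errno;    /* preserve errno for the caller */
    return (fd);
}
```

A leaked descriptor whose number never shows up in an "accept fd" line would
suggest the application was never told about it; one that does show up would
suggest a missed close() in the application.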
I also spotted a bug in the netstat/sockstat output, unrelated, in which the
port number of the inpcb is cleared when the connection closes, meaning that
netstat shows '*' as the port number. This isn't really necessary, but does
lead to potentially confusing output.
Robert N M Watson
Computer Laboratory
University of Cambridge