cvs commit: src/sys/fs/unionfs union_subr.c
Robert Watson
rwatson at FreeBSD.org
Fri Apr 25 14:19:18 UTC 2008
On Fri, 25 Apr 2008, Daichi GOTO wrote:
>> Per my earlier e-mail, and assuming I understand correctly, I feel not only
>> will this lead to new panics (due to dangling socket pointers and
>> incomplete garbage collection),
>
> We, I and ozawa-san, cannot image the case that leads new panic. Would you
> tell us how to get the panic if you can get it? Is it rare case to get or
> not?
The explanation is somewhat complicated, so I apologize if I'm unclear in
explaining it.
The UNIX domain (local) socket subsystem provides an IPC service based on
sockets, but using the file system as a name space so that processes can
rendezvous with one another. A server process, such as syslogd, will call
bind(2) to associate a socket with a path, such as /var/run/log or
/var/run/logpriv. In the file system, the way this works is that a vnode is
hooked up to the namespace of type VSOCK, its v_socket pointer is initialized
to point at the socket structure, and presumably the file system does some
underlying storage magic to put it on disk (such as creating an inode). The
socket also maintains a back pointer to the vnode, unp_vnode, which will be
used when the socket is closed, which is where things get tricky.
Consider the implementation of UNIX domain socket close -- when the socket is
closed, the protocol state is detached by uipc_detach, which clears the
pointer from the vnode to the socket and vice versa:
	if ((vp = unp->unp_vnode) != NULL) {
		unp->unp_vnode->v_socket = NULL;
		unp->unp_vnode = NULL;
	}
Once uipc_detach has returned, the unpcb structure (pointed to by unp in the
above code) is no longer valid, and shortly thereafter, the socket pointer is
also invalid, both pointing to freed memory. The UNIX domain socket code is
very careful to remove the reference from the vnode so that new threads won't
dereference the pointer improperly. However, notice that in the above code,
only the "bottom" layer v_socket pointer is cleared, not higher layers, which
means that those higher layers will now point at freed memory, which may lead
to panics.
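The failure mode can be modelled in userland, as a rough sketch only. The structures below (model_socket, model_vnode) are hypothetical stand-ins, not the kernel's layouts, and an "alive" flag stands in for freeing the socket so that the model itself never touches freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct socket and struct vnode. */
struct model_socket {
	int alive;			/* 0 once the socket is "freed" */
};

struct model_vnode {
	struct model_socket *v_socket;
};

/* Copy the lower layer's socket pointer into the upper layer vnode. */
static void
model_copy_up(struct model_vnode *upper, const struct model_vnode *lower)
{
	upper->v_socket = lower->v_socket;
}

/*
 * Mirror the detach logic: only the vnode the socket knows about
 * (its back pointer) has v_socket cleared.
 */
static void
model_detach(struct model_socket *so, struct model_vnode *bound_vp)
{
	bound_vp->v_socket = NULL;
	so->alive = 0;			/* stands in for freeing the socket */
}

/* Return 1 if the vnode still points at a socket that is gone. */
static int
model_is_stale(const struct model_vnode *vp)
{
	return (vp->v_socket != NULL && !vp->v_socket->alive);
}
```

With one socket bound to a lower vnode and copied up to an upper vnode, calling model_detach() on the lower vnode leaves lower.v_socket NULL while model_is_stale() reports 1 for the upper vnode. In the kernel, the equivalent of that stale pointer refers to freed memory.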
I haven't tried this, but I suspect you will be able to reproduce the panic if
you: start syslogd against a base file system, union mount it to a new
location, run the "syslog" command relative to the new file system mount, kill
the base syslogd, then run the "syslog" command a second time. On the first
occasion, it will work, since the v_socket pointer in the top layer points at
the socket referenced by the bottom layer. However, when you kill syslogd, it
closes the socket, which frees the socket structure pointed to by v_socket in
the bottom layer, but not in the top layer. The next run of syslog will
follow the stale v_socket pointer.
Does this make sense?
>> but it will also lead to possibly incorrect semantics for unionfs (upper
>> layers can write to objects readable via the lower layer).
>
>> Some parts of this patch are fine, but the copying of v_socket pointers
>> between layers is not correct. Please consider backing that part of the
>> change out.
>
> Yes, we have noticed above. But.... at least, our patch solves problem of
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/118346 we believe. If I roll
> back that, that problem is still there.
I would assert that an error is better than a panic. :-)
> Uhmmm... is it better to get back it? And if you have some ideas to solve
> this issue, please tell us :) Thanks
I'm not 100% sure what the right solution is, but one approach might be to
have the vnodes at the different layers simply refer to different sockets.
Applications should unlink the old socket in the top layer when they discover
a stale socket there, and then create a new socket (masking the bottom layer
socket), which should just work. Have you tried unlinking the top layer
socket and testing whether that works?
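From the application side, the unlink-and-recreate approach might look like the sketch below. It assumes a stream socket and treats ECONNREFUSED from connect(2) as the sign of a stale name. The helper names are hypothetical, and this has not been tested on a unionfs mount.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for `path`; return -1 if the path is too long. */
static int
fill_addr(struct sockaddr_un *sun, const char *path)
{
	if (strlen(path) >= sizeof(sun->sun_path))
		return (-1);
	memset(sun, 0, sizeof(*sun));
	sun->sun_family = AF_UNIX;
	strcpy(sun->sun_path, path);	/* length checked above */
	return (0);
}

/* Return 1 if `path` exists but connect(2) is refused (no socket behind it). */
static int
socket_path_is_stale(const char *path)
{
	struct sockaddr_un sun;
	int s, stale = 0;

	if (fill_addr(&sun, path) < 0)
		return (0);
	s = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s < 0)
		return (0);
	if (connect(s, (struct sockaddr *)&sun, sizeof(sun)) < 0 &&
	    errno == ECONNREFUSED)
		stale = 1;
	close(s);
	return (stale);
}

/* Unlink a stale name if present, then bind and listen on a new socket. */
static int
rebind_listener(const char *path)
{
	struct sockaddr_un sun;
	int s;

	if (fill_addr(&sun, path) < 0)
		return (-1);
	if (socket_path_is_stale(path))
		(void)unlink(path);
	s = socket(AF_UNIX, SOCK_STREAM, 0);
	if (s < 0)
		return (-1);
	if (bind(s, (struct sockaddr *)&sun, sizeof(sun)) < 0 ||
	    listen(s, 5) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}
```

A server would call rebind_listener() at startup. If a live server already owns the path, the name is left alone and bind(2) fails with EADDRINUSE. On a union mount, the intent is that the unlink and the new bind both happen in the top layer, masking the bottom layer socket.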
Robert N M Watson
Computer Laboratory
University of Cambridge