Kernel panic on FreeBSD 9.0-beta2
Mikolaj Golub
trociny at freebsd.org
Tue Oct 4 13:49:30 UTC 2011
On Sat, 1 Oct 2011 14:15:45 +0800 dave jones wrote:
dj> On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote:
>>
>> On Wed, 28 Sep 2011, Mikolaj Golub wrote:
>>
>>> On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote:
>>>
>>> KM> Sorry, didn't look at the images (limited bw), I've seen something KM>
>>> like this before in timewait. This "can't happen" with UDP so will be KM>
>>> interested in learning more about the bug.
>>>
>>> The panic can be easily triggered by this:
>>
>> Hi:
>>
>> Just catching up on this thread. I think the analysis here is generally
>> right: in 9.0, you're much more likely to see an inpcb with its in_socket
>> pointer cleared in the hash list than in prior releases, and
>> in_pcbbind_setup() trips over this.
>>
>> However, at least on first glance (and from the perspective of invariants
>> here), I think the bug is actualy that in_pcbbind_setup() is asking
>> in_pcblookup_local() for an inpcb and then access the returned inpcb's
>> in_socket pointer without acquiring a lock on the inpcb. Structurally, it
>> can't acquire this lock for lock order reasons -- it already holds the lock
>> on its own inpcb. Therefore, we should only access fields that are safe to
>> follow in an inpcb when you hold a reference via the hash lock and not a
>> lock on the inpcb itself, which appears generally OK (+/-) for all the
>> fields in that clause but the t->inp_socket->so_options dereference.
>>
>> A preferred fix would cache the SO_REUSEPORT flag in an inpcb-layer field,
>> such as inp_flags2, giving us access to its value without having to walk
>> into the attached (or not) socket.
>>
>> This raises another structural question, which is whether we need a new
>> inp_foo flags field that is protected explicitly by the hash lock, and not
>> by the inpcb lock, which could hold fields relevant to address binding. I
>> don't think we need to solve that problem in this context, as a slightly
>> race on SO_REUSEPORT is likely acceptable.
>>
>> The suggested fix does perform the desired function of explicitly detaching
>> the inpcb from the hash list before the socket is disconnected from the
>> inpcb. However, it's incomplete in that the invariant that's being broken is
>> also relied on for other protocols (such as raw sockets). The correct
>> invariant is that inp_socket is safe to follow unconditionally if an inpcb
>> is locked and INP_DROPPED isn't set -- the bug is in "locked" not in
>> "INP_DROPPED", which is why I think this is the wrong fix, even though it
>> prevents a panic :-).
dj> Hello Robert,
dj> Thank you for taking your valuable time to find out the problem.
dj> Since I don't have idea about network internals, would you have a patch
dj> about this? I'd be glad to test it, thanks again.
Here is the patch that implements what Robert suggests.
Dave, could you test it?
>> Robert
dj> Best regards,
dj> Dave.
--
Mikolaj Golub
-------------- next part --------------
A non-text attachment was scrubbed...
Name: in_pcb.REUSEPORT.patch
Type: text/x-diff
Size: 2321 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20111004/1f88cdd9/in_pcb.REUSEPORT.bin
More information about the freebsd-net
mailing list