Kernel panic on FreeBSD 9.0-beta2
dave jones
s.dave.jones at gmail.com
Fri Oct 14 08:49:45 UTC 2011
2011/10/12 Mikolaj Golub wrote:
>
> On Wed, 12 Oct 2011 09:53:34 +0800 dave jones wrote:
>
> dj> On Fri, Oct 7, 2011 at 9:12 AM, dave jones wrote:
> >> 2011/10/4 Mikolaj Golub :
> >>>
> >>> On Sat, 1 Oct 2011 14:15:45 +0800 dave jones wrote:
> >>>
> >>> dj> On Fri, Sep 30, 2011 at 9:41 PM, Robert Watson wrote:
> >>> >>
> >>> >> On Wed, 28 Sep 2011, Mikolaj Golub wrote:
> >>> >>
> >>> >>> On Mon, 26 Sep 2011 16:12:55 +0200 K. Macy wrote:
> >>> >>>
> >>> >>> KM> Sorry, didn't look at the images (limited bw), I've seen something KM>
> >>> >>> like this before in timewait. This "can't happen" with UDP so will be KM>
> >>> >>> interested in learning more about the bug.
> >>> >>>
> >>> >>> The panic can be easily triggered by this:
> >>> >>
> >>> >> Hi:
> >>> >>
> >>> >> Just catching up on this thread. I think the analysis here is generally
> >>> >> right: in 9.0, you're much more likely to see an inpcb with its in_socket
> >>> >> pointer cleared in the hash list than in prior releases, and
> >>> >> in_pcbbind_setup() trips over this.
> >>> >>
> >>> >> However, at least on first glance (and from the perspective of invariants
> >>> >> here), I think the bug is actualy that in_pcbbind_setup() is asking
> >>> >> in_pcblookup_local() for an inpcb and then access the returned inpcb's
> >>> >> in_socket pointer without acquiring a lock on the inpcb. Structurally, it
> >>> >> can't acquire this lock for lock order reasons -- it already holds the lock
> >>> >> on its own inpcb. Therefore, we should only access fields that are safe to
> >>> >> follow in an inpcb when you hold a reference via the hash lock and not a
> >>> >> lock on the inpcb itself, which appears generally OK (+/-) for all the
> >>> >> fields in that clause but the t->inp_socket->so_options dereference.
> >>> >>
> >>> >> A preferred fix would cache the SO_REUSEPORT flag in an inpcb-layer field,
> >>> >> such as inp_flags2, giving us access to its value without having to walk
> >>> >> into the attached (or not) socket.
> >>> >>
> >>> >> This raises another structural question, which is whether we need a new
> >>> >> inp_foo flags field that is protected explicitly by the hash lock, and not
> >>> >> by the inpcb lock, which could hold fields relevant to address binding. I
> >>> >> don't think we need to solve that problem in this context, as a slightly
> >>> >> race on SO_REUSEPORT is likely acceptable.
> >>> >>
> >>> >> The suggested fix does perform the desired function of explicitly detaching
> >>> >> the inpcb from the hash list before the socket is disconnected from the
> >>> >> inpcb. However, it's incomplete in that the invariant that's being broken is
> >>> >> also relied on for other protocols (such as raw sockets). The correct
> >>> >> invariant is that inp_socket is safe to follow unconditionally if an inpcb
> >>> >> is locked and INP_DROPPED isn't set -- the bug is in "locked" not in
> >>> >> "INP_DROPPED", which is why I think this is the wrong fix, even though it
> >>> >> prevents a panic :-).
> >>>
> >>> dj> Hello Robert,
> >>>
> >>> dj> Thank you for taking your valuable time to find out the problem.
> >>> dj> Since I don't have idea about network internals, would you have a patch
> >>> dj> about this? I'd be glad to test it, thanks again.
> >>>
> >>> Here is the patch that implements what Robert suggests.
> >>>
> >>> Dave, could you test it?
> >>
> >> Sure. Thanks for cooking the patch.
> >> Machines have been running two days now without panic.
>
> Thank you for testing it.
>
> dj> Is there any plan to commit your fix? Thank you.
> dj> I'd upgrade to 9.0-release from beta-2 once it's released.
>
> I have an upgraded version of the patch, which is under review now. I have
> been waiting for the response before asking you to test it, but it would be
> great if you try it not waiting :-).
>
> As it was pointed by Robert the previous version introduced a regression:
> SO_REUSEPORT was ignored if setsockopt was called after bind (the old cached
> value was still used). So the updated version fixes this and also contains
> several other fixes, the most important among them is that it fixes the panic
> for IPv6 bind case too.
Thanks for cooking the patch. Machines have been running up days
without any panic.
> --
> Mikolaj Golub
Regards,
Dave.
More information about the freebsd-net
mailing list