panic in tcp_do_segment()
Andre Oppermann
andre at freebsd.org
Tue Apr 9 08:35:33 UTC 2013
On 09.04.2013 10:16, Peter Holm wrote:
> On Mon, Apr 08, 2013 at 02:13:40PM +0200, Andre Oppermann wrote:
>> On 05.04.2013 13:09, Matt Miller wrote:
>>> Hey Rick,
>>>
>>> I believe Juan and I have root caused this crash recently. The t_state =
>>> 0x1, TCPS_LISTEN, in the link provided at the time of the assertion.
>>>
>>> In tcp_input(), if we're in TCPS_LISTEN, SO_ACCEPTCONN should be set on the
>>> socket and we should never enter tcp_do_segment() for this state. I think
>>> if you look in your corefile, you'll see the socket *doesn't* have this
>>> flag set in your case.
>>>
>>> 1043 /*
>>> 1044 * When the socket is accepting connections (the INPCB is in
>>> LISTEN
>>> 1045 * state) we look into the SYN cache if this is a new
>>> connection
>>> 1046 * attempt or the completion of a previous one. Because listen
>>> 1047 * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock
>>> will be
>>> 1048 * held in this case.
>>> 1049 */
>>> 1050 if (so->so_options & SO_ACCEPTCONN) {
>>> 1051 struct in_conninfo inc;
>>> 1052
>>> 1053 KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting
>>> but "
>>> 1054 "tp not listening", __func__));
>>> ...
>>> 1356 syncache_add(&inc, &to, th, inp, &so, m, NULL, NULL);
>>> 1357 /*
>>> 1358 * Entry added to syncache and mbuf consumed.
>>> 1359 * Everything already unlocked by syncache_add().
>>> 1360 */
>>> 1361 INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
>>> 1362 return;
>>> 1363 }
>>> ...
>>> 1384 /*
>>> 1385 * Segment belongs to a connection in SYN_SENT, ESTABLISHED or
>>> later
>>> 1386 * state. tcp_do_segment() always consumes the mbuf chain,
>>> unlocks
>>> 1387 * the inpcb, and unlocks pcbinfo.
>>> 1388 */
>>> 1389 tcp_do_segment(m, th, so, tp, drop_hdrlen, tlen, iptos,
>>> ti_locked);
>>>
>>> I think this has to do with this patch in soclose() where SO_ACCEPTCONN is
>>> being turned off in soclose(). I suspect if you look at the other threads
>>> in your corefile, you'll see one at this point in soclose() operating on
>>> the same socket as the one in the tcp_do_segment() thread.
>>>
>>> http://svnweb.freebsd.org/base?view=revision&revision=243627
>>>
>>> 817 /*
>>> 818 * Prevent new additions to the accept queues due
>>> 819 * to ACCEPT_LOCK races while we are draining them.
>>> 820 */
>>> 821 so->so_options &= ~SO_ACCEPTCONN;
>>> 822 while ((sp = TAILQ_FIRST(&so->so_incomp)) != NULL) {
>>> 823 TAILQ_REMOVE(&so->so_incomp, sp, so_list);
>>> 824 so->so_incqlen--;
>>> 825 sp->so_qstate &= ~SQ_INCOMP;
>>> 826 sp->so_head = NULL;
>>> 827 ACCEPT_UNLOCK();
>>> 828 soabort(sp);
>>> 829 ACCEPT_LOCK();
>>> 830 }
>>>
>>> Juan had evaluated this code path and it seemed safe to just drop the
>>> packet in this case:
>>>
>>> + /*
>>> + * In closing down the socket, the SO_ACCEPTCONN flag is removed to
>>> + * prevent new connections from being established. This means that
>>> + * any frames in that were in the midst of being processed could
>>> + * make it here. Need to just drop the frame.
>>> + */
>>> + if (TCPS_LISTEN == tp->t_state) {
>>> + TCPSTAT_INC(tcps_rcvwhileclosing);
>>> + goto drop;
>>> + }
>>> KASSERT(tp->t_state > TCPS_LISTEN, ("%s: TCPS_LISTEN",
>>> __func__));
>>>
>>> Or, if there's someone more familiar with the locking in these paths, they
>>> may be able to come up with a way to restructure the locks and logic to
>>> close this window.
>>
>> Matt, Juan,
>>
>> excellent analysis. I don't see a better approach to handle this
>> under the current ACCEPT_LOCK model.
>>
>> Compared to your patch I'd like to handle this race earlier before
>> we hit tcp_do_segment().
>>
>> Could you please review the attached patch which handles it right
>> after the SO_ACCEPTCONN / syncache check?
>>
>> --
>> Andre
>>
>> Index: netinet/tcp_input.c
>> ===================================================================
>> --- netinet/tcp_input.c (revision 249253)
>> +++ netinet/tcp_input.c (working copy)
>> @@ -1351,6 +1351,16 @@
>> */
>> INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
>> return;
>> + } else if (tp->t_state == TCPS_LISTEN) {
>> + /*
>> + * When a listen socket is torn down the SO_ACCEPTCONN
>> + * flag is removed first while connections are drained
>> + * from the accept queue in a unlock/lock cycle of the
>> + * ACCEPT_LOCK, opening a race condition allowing a SYN
>> + * attempt go through unhandled.
>> + */
>> + TCPSTAT_INC(tcps_rcvdwhileclosing);
>> + goto drop;
>> }
>>
>> #ifdef TCP_SIGNATURE
>
> I was able to reproduce the original "panic: tcp_do_segment:
> TCPS_LISTEN" with ease; see
> http://people.freebsd.org/~pho/stress/log/tcp.txt.
>
> With your patch (minus the TCPSTAT_INC) I got this "panic: Lock (rw)
> tcp locked @ netinet/tcp_input.c:1432."
>
> http://people.freebsd.org/~pho/stress/log/tcp2.txt
Please replace the 'goto drop' with 'goto dropunlock' to fix the panic.
--
Andre
More information about the freebsd-net
mailing list