Use of V_tcbinfo lock in tcp_do_segment()
Robert Watson
rwatson at FreeBSD.org
Sun Jul 22 17:55:15 UTC 2012
On Thu, 24 May 2012, Vijay Singh wrote:
> Folks, I am trying to understand the need to hold the V_tcbinfo lock in part
> of this function [code included below for reference].
>
> At present this causes the socket upcall to be called with the V_tcbinfo
> lock held, which I'd like to avoid. We release this lock right after this
> section.
>
> Looking at the code, it seems the lock is needed if we were in FIN_WAIT_2
> and tcp_close() the connection. The lock also seems to be protecting
> V_twq_2msl.
>
> Would it be an acceptable solution if we deferred calling socantrecvmore()
> till after the lock can be dropped (after the swtich statement).
> tcp_twstart() can be changed to return tp if the connections survives, or
> NULL if it doesn't, much like what tcp_close() does. Also a new lock could
> be added to protect the V_rwq_2msl queue.
>
> If this sounds acceptable, I can generate a patch against -CURRENT. I would
> appreciate feedback.
Hi Vijay:
Sorry for responding to your query with such a delay.
A quick glance at tcp_twstart() appears to confirm your description -- I read
it as requiring the incpbinfo lock for several reasons:
1) Potential call to tcp_close().
2) Potential call to tcp_tw_2msl_scan() which may in turn call tcp_twclose(),
which requires the pcbinfo lock (see 5 below).
3) Direct use of V_twq_2msl.
4) Potential call to tcp_tw_2msl_scan(), which uses V_twq_2msl, and may also
call tcp_tw_close (see 5).
5) In the tcp_twclose() callgraph subtree, the pcbinfo lock is used for
several things, including tcp_tw_2msl_stop() and in_pcbfree(). The call to
sofree() may recurse, under some circumstances, back into pru_detach(), and
then into tcp_detach() which requires the lock.
Especially with the pcbgroup reworking so that the pcbhash rather than pcbinfo
lock is used in a number of places, there are increasing numbers of places
where this sort of optimisation could be made.
One caution to ponder: socantrecvmore() can trigger an upcall (per your
comment) in the receive path. It used to be that upcalls did worrying things,
such as call back down the stack -- I think we've largely eliminated those
(NFS UDP was one such example, now fixed?) and therefore they are probably OK,
but worth checking the so_upcall consumers to see what the implications might
be, if any.
Robert
>
> -vijay
>
>
> relevant code from tcp_do_segment():
>
> if (thflags & TH_FIN) {
> if (TCPS_HAVERCVDFIN(tp->t_state) == 0) {
> socantrcvmore(so);
> /*
> * If connection is half-synchronized
> * (ie NEEDSYN flag on) then delay ACK,
> * so it may be piggybacked when SYN is sent.
> * Otherwise, since we received a FIN then no
> * more input can be expected, send ACK now.
> */
> if (tp->t_flags & TF_NEEDSYN)
> tp->t_flags |= TF_DELACK;
> else
> tp->t_flags |= TF_ACKNOW;
> tp->rcv_nxt++;
> }
> switch (tp->t_state) {
>
> /*
> * In SYN_RECEIVED and ESTABLISHED STATES
> * enter the CLOSE_WAIT state.
> */
> case TCPS_SYN_RECEIVED:
> tp->t_starttime = ticks;
> /* FALLTHROUGH */
> case TCPS_ESTABLISHED:
> tp->t_state = TCPS_CLOSE_WAIT;
> break;
>
> /*
> * If still in FIN_WAIT_1 STATE FIN has not been acked so
> * enter the CLOSING state.
> */
> case TCPS_FIN_WAIT_1:
> tp->t_state = TCPS_CLOSING;
> break;
>
> /*
> * In FIN_WAIT_2 state enter the TIME_WAIT state,
> * starting the time-wait timer, turning off the other
> * standard timers.
> */
> case TCPS_FIN_WAIT_2:
> INP_INFO_WLOCK_ASSERT(&V_tcbinfo);
> KASSERT(ti_locked == TI_WLOCKED, ("%s: dodata "
> "TCP_FIN_WAIT_2 ti_locked: %d", __func__,
> ti_locked));
>
> tcp_twstart(tp);
> INP_INFO_WUNLOCK(&V_tcbinfo);
> return;
> }
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>
More information about the freebsd-net
mailing list