TCP stack lock contention with short-lived connections
Navdeep Parhar
np at FreeBSD.org
Fri May 23 21:37:42 UTC 2014
On 05/23/14 13:52, Julien Charbon wrote:
>
> Hi,
>
> On 23/05/14 14:06, Julien Charbon wrote:
>> On 27/02/14 11:32, Julien Charbon wrote:
>>> On 07/11/13 14:55, Julien Charbon wrote:
>>>> On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon
>>>> <jcharbon at verisign.com> wrote:
>>>>> I have put technical and how-to-repeat details in below PR:
>>>>>
>>>>> kern/183659: TCP stack lock contention with short-lived connections
>>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
>>>>>
>>>>> We are currently working on this performance improvement effort; it
>>>>> will impact only the TCP locking strategy not the TCP stack logic
>>>>> itself. We will share on freebsd-net the patches we made for
>>>>> reviewing and improvement propositions; anyway this change might also
>>>>> require enough eyeballs to avoid tricky race conditions introduction
>>>>> in TCP stack.
>
> Joined the two cumulative patches (tcp-scale-inp-list-v1.patch and
> tcp-scale-pcbinfo-rlock-v1.patch) we discussed the most at BSDCan 2014.
>
> First one is (tcp-scale-inp-list-v1.patch):
>
> [tcp-scaling] Introduce the INP_LIST global mutex for protecting pcbinfo
> global structures
> https://github.com/verisign/freebsd/commit/12c62273f052911aabe6ed283cea76cdd72c9493
>
>
> This change improves nothing in performance (neither degrades), it
> simply implements what we are trying to achieve: Decompose further
> pcbinfo lock (aka ipi_lock or INP_INFO).
>
> Ideally, pcbinfo globally shared structures are protected by leaf
> mutexes (mutexes that are taken last), not by a root mutex (mutex taken
> first). The current lock ordering is:
>
> ipi_lock > inpcb lock > ipi_hash_lock, pcbgroup locks
>
> ipi_lock being a root mutex is explained by its original task: Protect
> the pcbinfo as a whole.
>
> Then, with this change, we added a new ipi_list_lock leaf mutex
> dedicated to protect structures previously under ipi_lock umbrella, i.e.:
>
> - inpcb global list: ipi_listhead
> - inpcb global list counter: ipi_count
> - inpcb global list generated index: ipi_gencnt
>
> and it permits to implement the second (meatier) change
> (tcp-scale-pcbinfo-rlock-v1.patch):
>
> [alpha][tcp-scaling] Use INP_INFO_RLOCK in critical path, and use
> INP_INFO_WLOCK in full INP loops.
> https://github.com/verisign/freebsd/commit/4633ac8c0b8d379fbda5fb9ffc921c2e4786db46
>
>
> Now that ipi_lost has lost is duty to protect pcbinfo globally shared
> structures, its last (clear) duty is to hold inp creation/destruction
> when a full traversal of global inp list is performed, as this
> traversals expect inp list to be stable, e.g.:
>
> tcp_ccalgounload()
> https://github.com/verisign/freebsd/blob/388f0a87958fde5e644e01798f44b58588eb1dc2/sys/netinet/tcp_subr.c#L848
>
>
> Thus (performance-wise) critical paths can now take ipi_lock _read_
> lock, e.g.:
>
> tcp_input()
> tcp_usr_shutdown()
> tcp_usr_close()
> tcp_twstart()
>
> and, on the other side, functions performing full inp list traversal
> will take the INP_INFO _write_ lock:
>
> tcp_ccalgounload()
> tcp_pcblist()
> in_pcbpurgeif0()
> etc...
>
> This patch doubles the performance improvement with our short-live TCP
> workload.
>
> _However_ it would be a miracle that this change does not introduce new
> race condition(s) (hence the 'alpha' tag in commit message). Most of
> TCP stack locking strategy being now on inpcb lock shoulders. That
> said, from our tests point of view, this change is completely stable: No
> kernel/lock assertion, no unexpected TCP behavior, stable performance
> results. Moreover, before tagging this change as 'beta' we need to test
> more thoroughly these features:
>
> - VNET,
> - PCBGROUP/RSS/TCP timer per cpu,
> - TCP Offloading (we need a NIC with a good TCP offloading support)
I can assess the impact (and fix any fallout) on the parts of the kernel
that deal with TCP_OFFLOAD, and the TOE driver in dev/cxgbe/tom. But I
was hoping to do that only after there was general agreement on net@
that these locking changes are sound and should be taken into HEAD.
Lack of reviews seems to be holding this back, correct?
Regards,
Navdeep
>
> Early testers, test ideas, reviewers and memories about previous (and
> not documented or unclear) ipi_lock duties are more than welcome.
>
> Thanks.
>
> --
> Julien
>
>
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>
More information about the freebsd-net
mailing list