Scalability problem from route refcounting
Andre Oppermann
andre at freebsd.org
Thu Mar 15 16:23:20 UTC 2007
Kris Kennaway wrote:
> I have recently started looking at database performance over gigabit
> ethernet, and there seems to be a bottleneck coming from the way route
> reference counting is implemented. On an 8-core system it looks like
> we spend a lot of time waiting for the rtentry mutex:
>
>  max    total wait_total   count avg wait_avg cnt_hold cnt_lock name
> [...]
>  408   950496    1135994  301418   3        3    24876    55936 net/if_ethersubr.c:397 (sleep mutex:bge1)
>  974   968617    1515169  253772   3        5    14741    60581 dev/bge/if_bge.c:2949 (sleep mutex:bge1)
> 2415 18255976    1607511  253841  71        6   125174     3131 netinet/tcp_input.c:770 (sleep mutex:inp)
>  233  1850252    2080506  141817  13       14        0   126897 netinet/tcp_usrreq.c:756 (sleep mutex:inp)
>  384  6895050    2737492  299002  23        9    92100    73942 dev/bge/if_bge.c:3506 (sleep mutex:bge1)
>  626  5342286    2760193  301477  17        9    47616    54158 net/route.c:147 (sleep mutex:radix node head)
>  326  3562050    3381510  301477  11       11   133968   110104 net/route.c:197 (sleep mutex:rtentry)
>  146   947173    5173813  301477   3       17    44578   120961 net/route.c:1290 (sleep mutex:rtentry)
>  146   953718    5501119  301476   3       18    63285   121819 netinet/ip_output.c:610 (sleep mutex:rtentry)
>   50  4530645    7885304 1423098   3        5   642391   788230 kern/subr_turnstile.c:489 (spin mutex:turnstile chain)
>
> i.e. during a 30 second sample we spend a total of >14 seconds (on all
> cpus) waiting to acquire the rtentry mutex.
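
(Reading the wait_total column as microseconds, the three rtentry lines
above add up to 3,381,510 + 5,173,813 + 5,501,119 = 14,056,442, i.e. just
over 14 seconds of combined wait time during the 30 second sample.)
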
>
> This appears to be because (among other things), we increment and then
> decrement the route refcount for each packet we send, each of which
> requires acquiring the rtentry mutex for that route before adjusting
> the refcount. So multiplexing traffic for lots of connections over a
> single route is being partly rate-limited by those mutex operations.
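
In terms of the net/route.h locking macros, the per-packet pattern you
describe boils down to roughly this (a simplified sketch with a made-up
function name, not the actual ip_output()/ether_output() code):

  #include <sys/param.h>
  #include <sys/lock.h>
  #include <sys/mutex.h>
  #include <sys/socket.h>
  #include <net/route.h>

  /*
   * Simplified sketch of the per-packet reference handling described
   * above; only the locking pattern matters here.
   */
  static void
  per_packet_ref(struct rtentry *rt)
  {
          RT_LOCK(rt);            /* take rt_mtx once per packet ... */
          RT_ADDREF(rt);          /* ... just to bump rt_refcnt */
          RT_UNLOCK(rt);

          /* ... hand the packet to the interface ... */

          RTFREE(rt);             /* takes rt_mtx again to drop the reference */
  }
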
The rtentry locking actually isn't that much of a problem in itself,
and rtalloc1() in net/route.c only gets the blame because this function
acquires the lock for the routing table entry and returns a locked entry.
It is the job of the callers to unlock it again as soon as possible.
Here arpresolve() in netinet/if_ether.c is the offender: it keeps the
lock over an extended period, which causes the contention and the long
wait times. ARP is a horrible mess and I don't have a quick fix for this.
There has been work in progress for quite some time to replace the
current ARP code with something more adequate, but it isn't finished yet.
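
To make the contract concrete, a caller of rtalloc1() is supposed to look
roughly like this (again only a sketch with a made-up function name, not
the actual arpresolve() or ip_output() code):

  #include <sys/param.h>
  #include <sys/errno.h>
  #include <sys/lock.h>
  #include <sys/mutex.h>
  #include <sys/socket.h>
  #include <net/route.h>

  /*
   * rtalloc1() hands the entry back locked and referenced; the caller
   * copies out what it needs and drops rt_mtx right away.  arpresolve()
   * instead keeps the lock held across its ARP processing, and that is
   * where the long wait times on the rtentry mutex come from.
   */
  static int
  example_route_user(struct sockaddr *dst, struct ifnet **ifpp)
  {
          struct rtentry *rt;

          rt = rtalloc1(dst, 1, 0UL);     /* returned locked, refcount bumped */
          if (rt == NULL)
                  return (EHOSTUNREACH);

          *ifpp = rt->rt_ifp;             /* take what is needed under the lock... */
          RT_UNLOCK(rt);                  /* ...and release rt_mtx immediately */

          /* ... do ARP lookups, queue the packet, etc. without rt_mtx ... */

          RTFREE(rt);                     /* drop the reference when done */
          return (0);
  }
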
> This is not the end of the story, though: the bge driver is a serious
> bottleneck on its own (e.g. I nulled out the route locking since it is
> not relevant in my environment, at least for the purposes of this
> test, and that exposed bge as the next problem -- but other drivers
> may not be so bad).
--
Andre