buf_ring in HEAD is racy
Ryan Stone
rysto32 at gmail.com
Sat Dec 14 05:04:58 UTC 2013
I am seeing spurious output packet drops that appear to be due to
insufficient memory barriers in buf_ring. I believe that this is the
scenario that I am seeing:
1) The buf_ring is empty, br_prod_head = br_cons_head = 0
2) Thread 1 attempts to enqueue an mbuf on the buf_ring. It fetches
br_prod_head (0) into a local variable called prod_head
3) Thread 2 enqueues an mbuf on the buf_ring. The sequence of events
is essentially:
    Thread 2 claims an index in the ring and atomically sets
    br_prod_head (say to 1)
    Thread 2 sets br_ring[1] = mbuf;
    Thread 2 does a full memory barrier
    Thread 2 updates br_prod_tail to 1
4) Thread 2 dequeues the packet from the buf_ring using the
single-consumer interface. The sequence of events is essentially:
    Thread 2 checks whether the queue is empty
    (br_cons_head == br_prod_tail); this is false
    Thread 2 sets br_cons_head to 1
    Thread 2 grabs the mbuf from br_ring[1]
    Thread 2 sets br_cons_tail to 1
5) Thread 1, which is still attempting to enqueue an mbuf on the ring,
fetches br_cons_tail (1) into a local variable called cons_tail. It
sees cons_tail == 1 but prod_head == 0, concludes that the ring is
full, and drops the packet (incrementing br_drops non-atomically, I
might add)
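The interleaving above can be replayed deterministically in a single
thread. This is only a minimal sketch, not the buf_ring code: the
struct, RING_SIZE, looks_full() and replay_race() are hypothetical
names, and the "full" test is assumed to be "the slot after prod_head
equals cons_tail" on a power-of-two ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* assumed power-of-two ring size */
#define RING_MASK (RING_SIZE - 1u)

/* Simplified model: only the four indices named in the steps above. */
struct ring_model {
    uint32_t prod_head, prod_tail;
    uint32_t cons_head, cons_tail;
};

/*
 * Assumed shape of the producer's "ring is full" test, applied to
 * whatever snapshot of the indices the producer happens to hold.
 */
static bool
looks_full(uint32_t prod_head, uint32_t cons_tail)
{
    return (((prod_head + 1u) & RING_MASK) == cons_tail);
}

/* Replays steps 1-5 in order; returns true if Thread 1 would drop. */
static bool
replay_race(void)
{
    struct ring_model br = { 0, 0, 0, 0 };     /* step 1: empty ring */
    uint32_t t1_prod_head, t1_cons_tail;

    /* The mask arithmetic only works for a power-of-two size. */
    assert((RING_SIZE & RING_MASK) == 0);

    t1_prod_head = br.prod_head;               /* step 2: Thread 1 reads 0 */

    br.prod_head = 1;                          /* step 3: Thread 2 enqueues */
    br.prod_tail = 1;

    br.cons_head = 1;                          /* step 4: Thread 2 dequeues */
    br.cons_tail = 1;

    t1_cons_tail = br.cons_tail;               /* step 5: Thread 1 reads 1 */

    /* Stale head (0) plus fresh tail (1) looks like a full ring. */
    return (looks_full(t1_prod_head, t1_cons_tail));
}
```

In this model replay_race() reports a drop even though the ring is
empty, while looks_full(1, 1), i.e. the same test with an up-to-date
prod_head, does not.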
I can reproduce several drops per minute by configuring the ixgbe
driver to use only 1 queue and then sending traffic from 8 concurrent
iperf processes. (You will need this hacky patch to even see the
drops with netstat, though:
http://people.freebsd.org/~rstone/patches/ixgbe_br_drops.diff)
I am investigating fixing buf_ring by using acquire/release semantics
rather than load/store barriers. However, I note that this will
apparently be the second attempt to fix buf_ring, and I'm seriously
questioning whether this is worth the effort compared to the
simplicity of using a mutex. I'm not convinced that a correct
lockless implementation will even be a performance win, given the
number of memory barriers that will apparently be necessary.
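To make the acquire/release idea concrete, here is a rough sketch of
one possible shape, written against C11 <stdatomic.h> rather than the
kernel's atomic primitives. It is an illustration under assumptions,
not a verified or proposed patch: struct ring, ring_init(),
ring_enqueue() and ring_dequeue_sc() are hypothetical names, and
re-reading both indices before declaring the ring full is just one way
to reject a stale snapshot.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* assumed power-of-two ring size */
#define RING_MASK (RING_SIZE - 1u)

struct ring {
    _Atomic uint32_t prod_head, prod_tail;
    _Atomic uint32_t cons_head, cons_tail;
    _Atomic uint32_t drops;          /* counted atomically in this sketch */
    void *slots[RING_SIZE];
};

static void
ring_init(struct ring *r)
{
    atomic_init(&r->prod_head, 0);
    atomic_init(&r->prod_tail, 0);
    atomic_init(&r->cons_head, 0);
    atomic_init(&r->cons_tail, 0);
    atomic_init(&r->drops, 0);
}

/* Multi-producer enqueue; returns 0 on success or ENOBUFS if full. */
static int
ring_enqueue(struct ring *r, void *item)
{
    uint32_t prod_head, prod_next, cons_tail, head_now, tail_now;

    assert(item != NULL);            /* NULL is reserved for "empty" */
    prod_head = atomic_load_explicit(&r->prod_head, memory_order_relaxed);
    for (;;) {
        cons_tail = atomic_load_explicit(&r->cons_tail, memory_order_acquire);
        prod_next = (prod_head + 1u) & RING_MASK;
        if (prod_next == cons_tail) {
            /* Looks full; trust that only if neither index has moved. */
            head_now = atomic_load_explicit(&r->prod_head, memory_order_acquire);
            tail_now = atomic_load_explicit(&r->cons_tail, memory_order_acquire);
            if (head_now == prod_head && tail_now == cons_tail) {
                atomic_fetch_add_explicit(&r->drops, 1, memory_order_relaxed);
                return (ENOBUFS);
            }
            prod_head = head_now;    /* snapshot was stale, retry */
            continue;
        }
        /* On failure the CAS reloads prod_head with the current value. */
        if (atomic_compare_exchange_weak_explicit(&r->prod_head, &prod_head,
            prod_next, memory_order_acquire, memory_order_relaxed))
            break;
    }
    r->slots[prod_head] = item;
    /* Wait for producers that claimed earlier slots to publish first. */
    while (atomic_load_explicit(&r->prod_tail, memory_order_acquire) != prod_head)
        ;
    /* Release: the slot write above becomes visible with the new tail. */
    atomic_store_explicit(&r->prod_tail, prod_next, memory_order_release);
    return (0);
}

/* Single-consumer dequeue; returns NULL if the ring is empty. */
static void *
ring_dequeue_sc(struct ring *r)
{
    uint32_t cons_head, cons_next, prod_tail;
    void *item;

    cons_head = atomic_load_explicit(&r->cons_head, memory_order_relaxed);
    prod_tail = atomic_load_explicit(&r->prod_tail, memory_order_acquire);
    if (cons_head == prod_tail)
        return (NULL);
    cons_next = (cons_head + 1u) & RING_MASK;
    atomic_store_explicit(&r->cons_head, cons_next, memory_order_relaxed);
    item = r->slots[cons_head];
    /* Release: producers that see the new tail may reuse the slot. */
    atomic_store_explicit(&r->cons_tail, cons_next, memory_order_release);
    return (item);
}
```

Even in this small sketch every enqueue pays for an acquire load, a
compare-and-swap, a spin on br_prod_tail's counterpart and a release
store, which is the kind of overhead that would have to be measured
against a plain mutex around the ring.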