HEADS UP: zerocopy bpf commits impending
Darren Reed
darrenr at freebsd.org
Wed Apr 23 10:02:38 UTC 2008
On Tue, 8 Apr 2008 13:28:18 +0100 (BST), "Robert Watson"
<rwatson at FreeBSD.org> said:
>
> On Tue, 8 Apr 2008, Darren Reed wrote:
>
> > Is there a performance analysis of the copy vs zerocopy available? (I don't
> > see one in the paper, just a "to do" item.)
> >
> > The numbers I'm interested in seeing are how many Mb/s you can capture
> > before you start suffering packet loss. This needs to be done with
> > sequenced packets so that you can observe gaps in the sequence captured.
>
> We've done some analysis, and a couple of companies have the zero-copy
> BPF code deployed.  I hope to generate a more detailed analysis before
> the developer summit so we can review it at BSDCan.  The basic
> observation is that for quite a few types of network links, the win
> isn't in packet loss per se, but in reduced CPU use, freeing up CPU for
> other activities.  There are a number of sources of win:
>
> - Reduced system call overhead -- as load increases, the number of
>   system calls goes down, especially if you get a two-CPU pipeline
>   going.
>
> - Reduced memory access, especially for larger buffer sizes, which
>   avoids filling the cache twice (first in the copyout, then again in
>   using the buffer in userspace).
>
> - Reduced lock contention, as only a single thread, the device driver
>   ithread, acquires the bpf descriptor's lock, and it no longer
>   contends with the user thread.
>
> One interesting, and in retrospect reasonable, side effect is that user
> CPU time goes up in the SMP scenario, as cache misses on the BPF buffer
> move from the read() system call to userspace.  And, as you observe,
> you have to use somewhat larger buffer sizes: in the previous scenario
> there were three buffers (two kernel buffers and a user buffer), and
> now there are simply two kernel buffers shared directly with user
> space.
>
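For concreteness, the shared-buffer setup looks roughly like the minimal
sketch below, based on the zero-copy buffer mode as documented in bpf(4)
(BIOCSETBUFMODE, BIOCGETZMAX, BIOCSETZBUF); the exact constants, ioctl
arguments, and structure fields here are from memory, so treat them as
assumptions rather than a definitive implementation.

    /*
     * Sketch: open a BPF descriptor in zero-copy mode and hand it two
     * page-aligned buffers that kernel and userspace will share.
     * Error handling is abbreviated.
     */
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <net/if.h>
    #include <net/bpf.h>
    #include <fcntl.h>
    #include <string.h>
    #include <err.h>

    int
    zbuf_open(const char *ifname, struct bpf_zbuf *zb)
    {
            struct ifreq ifr;
            u_int mode = BPF_BUFMODE_ZBUF;
            size_t zmax;
            int fd;

            if ((fd = open("/dev/bpf", O_RDWR)) == -1)
                    err(1, "open(/dev/bpf)");
            if (ioctl(fd, BIOCSETBUFMODE, &mode) == -1)
                    err(1, "BIOCSETBUFMODE");
            if (ioctl(fd, BIOCGETZMAX, &zmax) == -1)
                    err(1, "BIOCGETZMAX");

            /* The two kernel buffers live in the process's own address
               space, so filled packets need no copyout. */
            zb->bz_buflen = zmax;
            zb->bz_bufa = mmap(NULL, zmax, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            zb->bz_bufb = mmap(NULL, zmax, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            if (zb->bz_bufa == MAP_FAILED || zb->bz_bufb == MAP_FAILED)
                    err(1, "mmap");
            if (ioctl(fd, BIOCSETZBUF, zb) == -1)
                    err(1, "BIOCSETZBUF");

            memset(&ifr, 0, sizeof(ifr));
            strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
            if (ioctl(fd, BIOCSETIF, &ifr) == -1)
                    err(1, "BIOCSETIF");
            return (fd);
    }
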
> The original committed version has a problem in that it allows only one
> kernel buffer to be "owned" by userspace at a time, which can lead to
> excess calls to select(); this has now been corrected, so if people
> have run performance benchmarks, they should update to the new code and
> re-run them.
>
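The ownership handoff that fix concerns is visible in the per-buffer
headers: the kernel bumps a generation count when it assigns a buffer to
userspace, and userspace acknowledges by copying it back.  Below is a
sketch of the consumer loop, again with the struct bpf_zbuf_header field
names taken from bpf(4) as assumptions; process_packets() is an assumed
helper that walks the bpf headers in a filled buffer.

    #include <sys/types.h>
    #include <sys/select.h>
    #include <net/bpf.h>

    void process_packets(const char *buf, u_int len);   /* assumed */

    void
    capture_loop(int fd, struct bpf_zbuf *zb)
    {
            struct bpf_zbuf_header *hdr[2] = { zb->bz_bufa, zb->bz_bufb };
            fd_set rs;
            int i;

            for (;;) {
                    FD_ZERO(&rs);
                    FD_SET(fd, &rs);
                    /* Block until the kernel hands us a buffer. */
                    if (select(fd + 1, &rs, NULL, NULL, NULL) == -1)
                            continue;
                    for (i = 0; i < 2; i++) {
                            struct bpf_zbuf_header *h = hdr[i];

                            /* Differing generations = userspace owns it. */
                            if (h->bzh_kernel_gen == h->bzh_user_gen)
                                    continue;
                            process_packets((const char *)(h + 1),
                                h->bzh_kernel_len);
                            /* Ack: return the buffer to the kernel. */
                            h->bzh_user_gen = h->bzh_kernel_gen;
                    }
            }
    }

A timeout path could presumably also issue BIOCROTZBUF to claim a
partially filled buffer rather than waiting for it to fill.
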
> I don't have numbers off-hand, but 5%-25% were numbers that appeared in
> some of the measurements, and I'd like to think that the recent fix
> will further improve that.
Out of curiosity, were those numbers for single-CPU/core systems or
systems with more than one CPU/core active/available?

I know the testing I did was all single-threaded, so moving time from
kernel to user couldn't be expected to make a large overall difference
in a non-SMP kernel (NetBSD-something at the time).
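
For what it's worth, the sequence-gap check in that kind of test is
simple; a hypothetical sketch, where the sender stamps each packet with
an incrementing counter and extract_seq() is an assumed helper that
pulls it out of a captured payload (reordering ignored):

    #include <stddef.h>
    #include <stdint.h>

    uint64_t extract_seq(const unsigned char *pkt, size_t len); /* assumed */

    static uint64_t next_seq;   /* next sequence number expected */
    static uint64_t lost;       /* packets missing from the capture */

    void
    note_packet(const unsigned char *pkt, size_t len)
    {
            uint64_t seq = extract_seq(pkt, len);

            if (seq > next_seq)
                    lost += seq - next_seq; /* gap in sequence = drops */
            next_seq = seq + 1;
    }
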
Darren
--
Darren Reed
darrenr at fastmail.net