svn commit: r243914 - projects/bpfjit

Sat Dec 8 15:07:47 UTC 2012

On Sat, 8 Dec 2012 14:02:45 +0000
David Chisnall <theraven at theravensnest.org> wrote:

> On 8 Dec 2012, at 13:24, Aleksandr Rybalko wrote:
> 
> > On Thu, 06 Dec 2012 13:10:56 -0500
> > Jung-uk Kim <jkim at FreeBSD.org> wrote:
> > 
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >> 
> >> On 2012-12-06 03:49:36 -0500, Roman Divacky wrote:
> >>> Hi,
> >>> 
> >>> David Chisnall started a bpf jitter based on llvm. You can check it
> >>> out here:
> >>> 
> >>> http://people.freebsd.org/~theraven/bpfjit/
> >>> 
> >>> 
> >>> It's based on the idea of jitting the code in userspace and
> >>> passing the resulting code to the kernel via some interface (this
> >>> part is not done yet).
> >> 
> >> A long time ago (about 10 years ago), I implemented something like
> >> that (i.e., compile a BPF program to native machine code in
> >> userspace, then upload it to kernel space) for my $job but I quickly
> >> replaced it with BPF_JITTER for several reasons.  First of all,
> >> there is a big security risk.  A BPF filter program can be easily
> >> validated by the kernel with bpf_validate(9).  We cannot do that for
> >> native machine code and we must not allow uploading arbitrary code
> >> to kernel space.  You may say it is well protected by /dev/bpf
> >> permissions but that is not good enough, i.e., all you need is read
> >> permission to inject code into kernel space.
> >> Second, LLVM is too heavy for a BPF filter machine.  For example,
> > 
> > +1
> > Embedded FreeBSD will lose BPF if LLVM is used for
> > compilation :)
> 
> Really?  I've run LLVM JITs for more complex languages than BPF on
> machines with only 128MB of RAM.  LLVM itself takes about 5MB of
> storage space and 20MB of RAM (used only during compilation, unloaded
> immediately afterwards).  On REALLY embedded systems, the filter
> rules can be run on another host and provided in the form of a kernel
> module using exactly the same code.

What about systems with a total of 8MB of flash and 32MB of RAM (maybe
even 4MB and 16MB)? :)

> 
> >> libtrace did that long ago:
> >> 
> >> http://www.wand.net.nz/trac/libtrace/changeset/1586
> >> 
> >> Someone actually benchmarked it with other JIT implementations:
> >> 
> >> http://carnivore.it/2011/12/28/bpf_performance
> 
> Reading the description there, I found it hard to believe that
> someone had actually written that LLVM implementation.  It is a case
> study in how not to implement an LLVM JIT.
> 
> >> LLVM compilation took too much time to be useful:
> >> 
> >> engine        | filter cycles | compile cycles
> >> --------------+---------------+------------------
> >> jit-linux     | 106468        | 33126+72796
> >> jit-freebsd   | 113958        | 48292+72796
> >> llvm          | 157394        | 380843640+72796
> >> pcap          | 276910        | 72796
> >> linux         | 351391        | 9245+72796
> >> 
> >> I haven't tried theraven's implementation but I am afraid the
> >> result may be similar.  On top of that, it cannot be easily
> >> embedded in kernel.
> 
> Note that mine is a proof-of-concept prototype; however, in my ad-hoc
> testing its output was about a third the size of the output of the
> current JIT.  A simpler JIT loses a lot through not being able to do
> even simple optimisations such as common subexpression elimination
> and through a very primitive register allocator.  
> 
> The extra cost comes in the form of more CPU cycles spent actually
> running the optimisation.  JIT compilation is always a trade-off: is
> the result being run enough times to offset the time spent optimising?
> I'd have thought this would be obvious for something that is run on
> every packet.  Even a very slow optimiser will be a net win after a
> while.  More importantly, the optimisation happens at the time the
> rules are loaded and so can run at a much lower priority, whereas the
> packet filter evaluation happens on the critical path for network
> traffic and impacts the latency of every single packet.  
> 
> David
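
To put rough numbers on the break-even argument, here is a minimal sketch that uses only the cycle counts from the benchmark table quoted earlier in this thread. It assumes that "filter cycles" is the cost of one benchmark run of the filter, and that the 72796 term (the pcap compile step) is paid by every engine and so cancels out. The function name break_even_runs and the ENGINES table layout are made up for illustration; they are not part of any BPF or LLVM API.

```python
# Cycle counts copied from the benchmark table quoted in this thread.
# Assumption: the common 72796 (pcap compile) term is paid by every
# engine, so only the extra JIT compile cost is listed here.
ENGINES = {
    # name: (filter_cycles_per_run, extra_compile_cycles)
    "jit-linux": (106468, 33126),
    "jit-freebsd": (113958, 48292),
    "llvm": (157394, 380843640),
    "pcap": (276910, 0),
    "linux": (351391, 9245),
}


def break_even_runs(candidate, baseline):
    """Return the smallest number of runs n >= 0 at which the candidate's
    total cost (compile + n * filter) is no higher than the baseline's,
    or None if the candidate never catches up."""
    cand_filter, cand_compile = ENGINES[candidate]
    base_filter, base_compile = ENGINES[baseline]
    extra_compile = cand_compile - base_compile
    saved_per_run = base_filter - cand_filter
    if extra_compile <= 0:
        return 0  # already no more expensive before the first run
    if saved_per_run <= 0:
        return None  # costs more to compile and is not faster per run
    # Integer ceiling division, avoids floating point rounding.
    return -(-extra_compile // saved_per_run)


if __name__ == "__main__":
    for cand, base in [("llvm", "pcap"), ("llvm", "jit-freebsd"),
                       ("jit-freebsd", "pcap")]:
        print(cand, "vs", base, "->", break_even_runs(cand, base))
```

Under those assumptions, the benchmarked llvm engine pays for itself against the plain pcap interpreter after 3187 runs, but never against jit-freebsd, because in that benchmark its generated code was also slower per run. A better LLVM-based JIT would change the per-run figure, not the shape of the calculation.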

-- 
Aleksandr Rybalko <ray at freebsd.org>