cvs commit: src/sys/kern init_main.c kern_malloc.c md5c.c subr_autoconf.c subr_mbuf.c subr_prf.c tty_subr.c vfs_cluster.c vfs_subr.c

Tue Jul 22 16:32:59 PDT 2003

"Poul-Henning Kamp" wrote:

>    If Y < X, then you have by definition a performance gain.

Only if you look at the classic model where you ignore things like
speculation and assume that every instruction is executed exactly once etc.
Mainframe optimization strategy is not necessarily applicable to
contemporary CPUs.

To consider:
- costs of branches and branch prediction hits and misses
- cache effects
- memory bandwidth effects, e.g. uninlining the VOP_* stuff costs a ~5% world
slowdown due to extra memory I/O for argument processing on i386.
- speculative execution
- not all the code is executed
and so on.
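
To make the argument-processing point concrete, here is a rough sketch of
the shape of the problem.  All names are hypothetical and this is not the
real generated VOP_* code.  The assumption is a wrapper that packs its
arguments into a struct and calls through a function pointer.  Inlined, the
struct is filled in directly from values the caller already holds.  Out of
line, on a stack-based calling convention such as i386, the arguments are
first written to the stack for the call and then read back and copied into
the struct, so the same data crosses memory more than once.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is NOT the real vnode_if.h. */
struct obj;

struct read_args {
	struct obj *a_obj;
	void       *a_buf;
	size_t      a_len;
};

typedef int (*op_fn)(struct read_args *);

struct obj {
	op_fn  o_read;		/* per-object operation, called indirectly */
	size_t o_size;
};

/*
 * Inline wrapper: the argument struct is built in the caller's frame,
 * straight from whatever registers or locals hold the values.
 */
static inline int
OP_READ(struct obj *o, void *buf, size_t len)
{
	struct read_args a;

	a.a_obj = o;
	a.a_buf = buf;
	a.a_len = len;
	return (o->o_read(&a));
}

/*
 * Out-of-line wrapper: same logic, but with stack-passed arguments the
 * caller stores o, buf and len for the call, and this function then
 * loads them again to fill in the struct.
 */
int
op_read_outofline(struct obj *o, void *buf, size_t len)
{
	struct read_args a;

	a.a_obj = o;
	a.a_buf = buf;
	a.a_len = len;
	return (o->o_read(&a));
}

/* Stand-in operation: returns the length if it fits, else -1. */
int
demo_read(struct read_args *ap)
{
	return (ap->a_len <= ap->a_obj->o_size ? (int)ap->a_len : -1);
}
```

Both wrappers return the same result; the difference is only in how many
loads and stores the compiler has to emit around the call, which is
something you would have to measure rather than read off the source.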

If adding 2K of code to the kernel for 3 inlines means that the fast path
execution through the extra code is in fact faster in the usual case, then
it's worth it.  We don't have to execute or cache all of that extra 2K of
code.  Cache line granularity and hardware prefetch are limited to 64 or
128 bytes for a reason.
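
A minimal sketch of the fast-path idea, with made-up names (item_cache,
cache_get and cache_get_slow are illustrative, not kernel APIs).  This is
one common variation: the cheap check is inlined at every call site and the
rare refill is kept out of line.  The point is the same either way: in the
usual case only the few bytes of the fast path are executed and pulled into
the cache, regardless of how much code the rare case needs.

```c
#include <stddef.h>

/* Hypothetical per-CPU style item cache, for illustration only. */
struct item_cache {
	void  *ic_items[8];
	size_t ic_count;	/* items currently available */
	size_t ic_refills;	/* how many times the slow path ran */
};

static char backing[8][64];	/* stand-in for a real allocator */

/*
 * Slow path: rarely executed, so it lives out of line and its
 * instructions only occupy cache lines when it is actually needed.
 */
void *
cache_get_slow(struct item_cache *c)
{
	size_t i;

	c->ic_refills++;
	for (i = 0; i < 8; i++)
		c->ic_items[i] = backing[i];
	c->ic_count = 8;
	return (c->ic_items[--c->ic_count]);
}

/*
 * Fast path: one compare, one decrement, one load.  Inlining this
 * grows every call site a little, but the common case never leaves
 * the caller's instruction stream.
 */
static inline void *
cache_get(struct item_cache *c)
{
	if (c->ic_count > 0)
		return (c->ic_items[--c->ic_count]);
	return (cache_get_slow(c));
}
```

With an empty cache the first cache_get() takes the slow path once; the
next seven calls are satisfied entirely by the inlined check.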

I suspect Alan Cox already knows the answer to 'which is faster' in
the vm_object_backing_scan() case and he's waiting for you to put your foot
in it. :-)

Cheers,
-Peter
--
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5