Adding members to struct cpu_functions

Mon Oct 19 15:54:13 UTC 2009

In message: <05B19969-B238-4E3A-8326-624067F0362B at semihalf.com>
            Rafal Jaworowski <raj at semihalf.com> writes:
: On 2009-10-18, at 17:49, Nathan Whitehorn wrote:
[[ trimmed ]]
: > I just did the measurements on a 1.8 GHz PowerPC G5. There were four  
: > tests, each repeated 1 million times. "Load and store" involves  
: > incrementing a volatile int from 0 to 1e6 inline. "Direct calls"  
: > involves a branch to a function that returns 0 and does nothing  
: > else. "Function ptr" calls the same function via a pointer stored in  
: > a register, and "KOBJ calls" calls it via KOBJ. Here are the results  
: > (errors are +/- 0.5 ns for the function call measurements due to  
: > compiler optimization jitter, and 0 for load and store, since that  
: > takes a deterministic number of clock cycles):
: >
: > 32-bit kernel:
: > Load and store: 26.1 ns
: > Direct calls:    7.2 ns
: > Function ptr:    8.4 ns
: > KOBJ calls:     17.8 ns
: >
: > 64-bit kernel:
: > Load and store:  9.2 ns
: > Direct calls:    6.1 ns
: > Function ptr:    8.3 ns
: > KOBJ calls:     40.5 ns
: >
: > ABI changes make a large difference, as you can see. The cost of  
: > calling via KOBJ is non-negligible, but small, especially compared  
: > to the cost of doing anything involving memory. I don't know how  
: > this changes with ARM calling conventions.
: 
: Very interesting, thanks! Could you elaborate on the testing details  
: and share the diagnostic code so we could repeat this with other CPU  
: variations like Book-E PowerPC, or ARM?

I'd love to see this on MIPS too...
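
Until the real diagnostic code is posted, here is a rough userland sketch
of the first three tests as described above, for anyone who wants a
starting point on another CPU. It is not Nathan's code. Every name in it
(do_nothing, fn_ptr, N_ITER, the time_* helpers, report) is made up for
illustration, the noinline attribute and empty asm assume GCC or Clang,
and the KOBJ case is left out because it needs the kernel. The function
pointer is volatile to stop the compiler from turning the call into a
direct one, so it is reloaded from memory on each call rather than kept
in a register as in the original test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define N_ITER 1000000	/* 1 million repetitions, as in the quoted tests */

static volatile int counter;

/*
 * Returns 0 and does nothing else.  noinline plus the empty asm keep
 * GCC/Clang from inlining or eliding the call (compiler-specific).
 */
static int __attribute__((noinline))
do_nothing(void)
{
	__asm__ volatile ("");
	return (0);
}

/* volatile: forces a real indirect call, at the cost of a load per call. */
static int (*volatile fn_ptr)(void) = do_nothing;

static double
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((double)ts.tv_sec * 1e9 + (double)ts.tv_nsec);
}

/* "Load and store": increment a volatile int from 0 to N_ITER inline. */
static double
time_load_store(void)
{
	double start, elapsed;

	counter = 0;
	start = now_ns();
	while (counter < N_ITER)
		counter++;
	elapsed = now_ns() - start;
	assert(counter == N_ITER);
	return (elapsed / N_ITER);
}

/* "Direct calls": plain branch to do_nothing(). */
static double
time_direct_calls(void)
{
	double start;
	int i;

	start = now_ns();
	for (i = 0; i < N_ITER; i++)
		(void)do_nothing();
	return ((now_ns() - start) / N_ITER);
}

/* "Function ptr": the same function, called through fn_ptr. */
static double
time_function_ptr(void)
{
	double start;
	int i;

	start = now_ns();
	for (i = 0; i < N_ITER; i++)
		(void)fn_ptr();
	return ((now_ns() - start) / N_ITER);
}

/* Print the average time per iteration for each test. */
static void
report(void)
{
	printf("Load and store: %.1f ns\n", time_load_store());
	printf("Direct calls:   %.1f ns\n", time_direct_calls());
	printf("Function ptr:   %.1f ns\n", time_function_ptr());
}
```

Call report() from a main() to run all three. The per-call figures
include loop overhead and timer resolution, so they are only good for
rough comparisons between call styles on the same machine.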

KOBJ is a big win for device configuration, where a single memory I/O can
take 60 times as long as any of these calls...

Warner