cvs commit: src/sys/i386/isa prof_machdep.c src/sys/amd64/amd64 prof_machdep.c
Bruce Evans
bde at FreeBSD.org
Wed Nov 28 18:01:21 PST 2007
bde 2007-11-29 02:01:21 UTC
FreeBSD src repository
Modified files:
sys/i386/isa prof_machdep.c
sys/amd64/amd64 prof_machdep.c
Log:
Don't use plain "ret" instructions at targets of jump instructions,
since the branch caches on at least Athlon XP through Athlon 64 CPUs
don't understand such instructions and guarantee a cache miss taking
at least 10 cycles. Use the documented workaround "ret $0" instead
("nop; ret" also works, but "ret $0" is probably faster on old CPUs).
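A minimal sketch of the pattern, written as C with top-level asm, roughly the style prof_machdep.c uses. The leaf function `pick` and its logic are hypothetical and not taken from the commit. The asm assumes amd64 with the System V calling convention and an ELF toolchain, and falls back to plain C on other targets. The early-exit branch lands on the shared return instruction, so that return is spelled "ret $0" (encoded as C2 00 00) rather than the one-byte "ret" (C3).

```c
/*
 * Hypothetical leaf routine: returns x if it is nonzero, otherwise 42.
 * The nonzero path jumps straight to the shared return instruction,
 * which is the situation the commit fixes, so that jump target uses
 * "ret $0" instead of a plain "ret".
 * Assumes amd64, System V ABI (first int argument in %edi), ELF output.
 */
int pick(int x);

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .text\n"
    ".globl pick\n"
    ".type pick, @function\n"
    "pick:\n"
    "    movl  %edi, %eax\n"   /* result = x */
    "    testl %eax, %eax\n"
    "    jnz   1f\n"           /* nonzero: branch to the shared return */
    "    movl  $42, %eax\n"    /* zero: fall through with 42 */
    "1:  ret   $0\n"           /* branch target: "ret $0", not "ret" */
    ".size pick, . - pick\n"
    ".popsection\n");
#else
/* Portable fallback so the sketch still builds on other targets. */
int pick(int x)
{
    return x != 0 ? x : 42;
}
#endif
```

Calling pick(7) returns 7 and pick(0) returns 42. Only the spelling of the branch target changes: "ret $0" pops the return address exactly like "ret" and then adds 0 to the stack pointer, so the behavior is identical.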
Normal code (even asm code) doesn't branch to "ret", since there is
usually some cleanup to do, but the __mcount, .mcount and .mexitcount
entry points were optimized too well, down to the minimum number of
instructions (3 instructions each if profiling is not enabled), and
they branched to a plain "ret". I didn't see a significant number of
cache misses for
.mexitcount, but for the shared "ret" for __mcount and .mcount I
observed cache misses costing 26 cycles each. For a send(2) syscall
that makes about 70 function calls, the cost of these cache misses
alone increased the syscall time from about 4000 cycles to about 7000
cycles. 4000 is for a profiling (GUPROF) kernel with profiling disabled;
after this fix, configuring profiling costs only about 600 of those
4000 cycles, which is consistent with almost perfect branch prediction
in the mcounting calls.
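The cycle counts above can be turned into per-call figures with simple arithmetic. This sketch only divides the approximate numbers quoted in the log by the roughly 70 function calls in one send(2) syscall; the macro and function names are made up for illustration.

```c
/* Approximate figures from the log for one send(2) syscall. */
#define NCALLS         70.0   /* function calls per syscall */
#define CYCLES_BEFORE  7000.0 /* GUPROF kernel, profiling disabled, before fix */
#define CYCLES_AFTER   4000.0 /* same kernel after the fix */
#define MCOUNT_SHARE   600.0  /* part of CYCLES_AFTER due to configuring profiling */

/* Hypothetical helper: extra cycles per call caused by the mispredicted "ret". */
double penalty_per_call(void)
{
    return (CYCLES_BEFORE - CYCLES_AFTER) / NCALLS;
}

/* Hypothetical helper: remaining profiling overhead per call after the fix. */
double overhead_per_call(void)
{
    return MCOUNT_SHARE / NCALLS;
}
```

That works out to roughly 43 extra cycles per call before the fix and roughly 8.6 cycles per call of remaining profiling overhead after it.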
Revision Changes Path
1.31 +2 -2 src/sys/amd64/amd64/prof_machdep.c
1.32 +2 -2 src/sys/i386/isa/prof_machdep.c