fastforward/routing: a 3 million packet-per-second system?
Adrian Chadd
adrian at freebsd.org
Mon Jul 28 20:43:42 UTC 2014
On 28 July 2014 13:37, Ryan Stone <rysto32 at gmail.com> wrote:
> On Sun, Jul 27, 2014 at 4:42 PM, George Neville-Neil
> <gnn at neville-neil.com> wrote:
>> Chiming in late, but don't you mean instruction-retired instead of
>> CPU_CLK_UNHALTED_CORE?
>>
>> Best,
>> George
>
> In my experience instruction-retired gives very misleading profiler
> output in most cases. The problem is that instruction-retired gives
> equal weight to all instructions, which means that it does not take
> into account instructions with long latencies because they (for
> example) missed the cache. CPU_CLK_UNHALTED_CORE (or its alias,
> unhalted-cycles) is a much better event because it is a nearer proxy
> for time-based sampling, which is really what you're interested in
> when trying to reduce runtime of processes.
Right.
It is a union of all the things that screw with you: frontend stalls,
backend/retire stalls, microcode operation stalls, long-latency FPU
operation stalls, branch misprediction stalls, L3 miss (i.e., memory)
stalls, and cache ping-ponging stalls.
Figuring out -which- of those is the problem requires a little
further digging.
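To make the weighting argument concrete, here is a toy sketch of how the two
events would attribute samples for the same workload. Everything in it is made
up for illustration: the function names, instruction counts, and
cycles-per-instruction figures are assumptions, not measurements from any real
system, and it models sampling as a simple proportional share rather than
anything pmc actually does internally.

```python
# Hypothetical workload: (function, instructions retired, avg cycles per instruction).
# All names and numbers are invented for illustration.
WORKLOAD = [
    ("parse_header", 6000, 1.0),   # cheap, cache-friendly instructions
    ("route_lookup", 2000, 10.0),  # assumed to miss the cache frequently
    ("tx_enqueue", 2000, 2.0),
]


def sample_shares(workload, by_cycles):
    """Return each function's expected share of profiler samples.

    by_cycles=False models sampling on instructions retired (every
    instruction has equal weight); by_cycles=True models sampling on
    unhalted cycles (each instruction weighted by how long it took).
    """
    weights = {}
    for name, instructions, cpi in workload:
        weights[name] = instructions * cpi if by_cycles else instructions
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


instr_profile = sample_shares(WORKLOAD, by_cycles=False)
cycle_profile = sample_shares(WORKLOAD, by_cycles=True)

if __name__ == "__main__":
    for name, _, _ in WORKLOAD:
        print(f"{name:14s} instructions={instr_profile[name]:.0%} "
              f"cycles={cycle_profile[name]:.0%}")
```

In this made-up case the instruction-based profile points at parse_header,
while the cycle-based profile points at route_lookup, which is where the time
actually goes.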
> My one big complaint with unhalted-cycles is that it does not take
> into account CPU time spent in busy-wait loops that use the pause
> instruction, so it vastly underweights time spent adaptively spinning on
> kernel mutexes, for instance.
Well, it depends on whether you want to know about the time it's
spending in busy-wait loops using PAUSE or not.
(Are there any flags / modifiers that have the CPU not count that?)
> I'm also not sure what it does when the
> CPU is adjusting its frequency, but that's not a case that I ever have
> to deal with personally.
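That's the difference between _CORE and _REF: _CORE counts cycles at
the core's actual (varying) clock, while _REF counts at a fixed
reference rate.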
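As a rough sketch of how the two counters can be used together: if you read
both a core-cycle count and a reference-cycle count over the same interval,
their ratio gives an estimate of the average clock the core ran at while
unhalted. The function name and the numbers below are hypothetical, and this
assumes the reference counter ticks at a known, fixed rate (ref_hz), which you
would need to confirm for your particular CPU.

```python
def effective_frequency_hz(core_cycles, ref_cycles, ref_hz):
    """Estimate the average unhalted core frequency over an interval.

    core_cycles: cycles counted at the core's actual clock (the _CORE event)
    ref_cycles:  cycles counted at the fixed reference rate (the _REF event)
    ref_hz:      the reference rate in Hz (an assumption; CPU-specific)
    """
    if ref_cycles <= 0:
        raise ValueError("ref_cycles must be positive")
    return core_cycles / ref_cycles * ref_hz


if __name__ == "__main__":
    # Hypothetical readings: the core accumulated half as many cycles as
    # the reference clock, so it averaged half the reference rate.
    print(effective_frequency_hz(1_500_000, 3_000_000, 3.0e9))
```

A ratio well below 1 would suggest the core was clocked down for much of the
interval, so profiles sampled on the _CORE event will not map linearly onto
wall-clock time.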
-a