fastforward/routing: a 3 million packet-per-second system?

Adrian Chadd adrian at freebsd.org
Mon Jul 28 20:43:42 UTC 2014


On 28 July 2014 13:37, Ryan Stone <rysto32 at gmail.com> wrote:
> On Sun, Jul 27, 2014 at 4:42 PM, George Neville-Neil
> <gnn at neville-neil.com> wrote:
>> Chiming in late, but don't you mean instruction-retired instead of
>> CPU_CLK_UNHALTED_CORE?
>>
>> Best,
>> George
>
> In my experience instruction-retired gives very misleading profiler
> output in most cases.  The problem is that instruction-retired gives
> equal weight to all instructions, which means that it does not take
> into account instructions with long latencies because they (for
> example) missed the cache.  CPU_CLK_UNHALTED_CORE (or its alias,
> unhalted-cycles) is a much better event because it is a nearer proxy
> for time-based sampling, which is really what you're interested in
> when trying to reduce runtime of processes.

Right.

Unhalted cycles are a union of all the things that screw with you:
frontend stalls, backend/retire stalls, microcode operation stalls,
FPU length stalls, branch misprediction stalls, L3 miss (i.e., memory)
stalls, and cache ping-ponging stalls.

Figuring out -which- of those is the problem requires a little
further digging.
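
To make Ryan's point about event weighting concrete, here is a toy
sketch (not real profiler output). The function names, instruction
counts, and cycles-per-instruction figures are all made up for
illustration; it only shows how the same workload looks when samples
are weighted by instructions retired versus by cycles.

```python
# Toy model: how a sampling profiler would attribute cost to two functions
# under an instruction-count event versus a cycle-count event.
# All names and numbers below are hypothetical.

def sample_share(counts):
    """Return each function's fraction of the total event count."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

# name -> (instructions retired, average cycles per instruction)
workload = {
    "checksum_loop": (900_000, 1),   # many cheap, cache-resident instructions
    "route_lookup": (100_000, 50),   # few instructions, most missing the cache
}

instructions = {name: n for name, (n, cpi) in workload.items()}
cycles = {name: n * cpi for name, (n, cpi) in workload.items()}

by_instructions = sample_share(instructions)
by_cycles = sample_share(cycles)

for name in workload:
    print(f"{name:14s} instr-share={by_instructions[name]:.2f} "
          f"cycle-share={by_cycles[name]:.2f}")
```

In this made-up case the cache-missing function gets a tenth of the
instruction-weighted samples but most of the cycle-weighted ones, and
the latter is closer to where the wall-clock time goes.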

> My one big complaint with unhalted-cycles is that it does not take
> into account CPU time spent in busy-wait loops that use the pause
> instruction, so it vastly underweights time spent adaptively spinning
> on kernel mutexes, for instance.

Well, it depends on whether you want to know about the time it's
spending in busy-wait loops using PAUSE.
(Are there any flags / modifiers that make the CPU not count that?)

> I'm also not sure what it does when the
> CPU is adjusting its frequency, but that's not a case that I ever have
> to deal with personally.

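That's the difference between _CORE and _REF: CPU_CLK_UNHALTED_CORE
counts at the actual core clock, which moves with frequency changes,
while CPU_CLK_UNHALTED_REF counts at a fixed reference rate.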
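
A rough sketch of what that means for the counts, assuming the core
event ticks at the current core frequency and the reference event ticks
at a fixed rate. The frequencies and interval lengths are invented for
the example and are not taken from any particular CPU.

```python
# Toy model of core-clock versus reference-clock cycle counting.
# REF_HZ, the core frequencies, and the durations are hypothetical.

REF_HZ = 2_000_000_000  # assumed fixed reference frequency

# (seconds spent unhalted, core frequency during that interval)
intervals = [
    (1.0, 1_000_000_000),  # clocked down
    (1.0, 3_000_000_000),  # clocked up
]

def core_cycles(seconds, core_hz):
    """Cycles counted by an event that follows the actual core clock."""
    return int(seconds * core_hz)

def ref_cycles(seconds):
    """Cycles counted by an event that ticks at the fixed reference rate."""
    return int(seconds * REF_HZ)

for seconds, hz in intervals:
    print(f"{seconds:.1f}s at {hz / 1e9:.1f} GHz: "
          f"core={core_cycles(seconds, hz):,} ref={ref_cycles(seconds):,}")
```

Under those assumptions the same second of work yields very different
core-cycle counts depending on the clock, while the reference count
stays proportional to elapsed unhalted time.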



-a

