L1 cache thrashing affects performance of HIMENO benchmark
Adrian Chadd
adrian at freebsd.org
Sun Jan 6 02:03:12 UTC 2013
On 5 January 2013 13:54, Jason Evans <jasone at freebsd.org> wrote:
>> Jason - any comments?
>
> There are many variations on this class of performance problem, and the short of it is that only the application can have adequate understanding of data structure layout and access patterns to reliably make optimal use of the cache. However, it is possible for the allocator to lay out memory in a more haphazard fashion than jemalloc, phkmalloc, etc. do, such that the application can be cache-oblivious and (usually) not suffer worst case consequences as happened in this case. Extent-based allocators like dlmalloc often get this "for free" for a significant range of allocation sizes. jemalloc could be modified to this end, but a full solution would necessarily increase internal fragmentation. It might be worth experimenting with nonetheless.
For at least this particular computational workload, the loss in
throughput due to cache thrashing is significant enough to earn
FreeBSD a negative mark in computational workloads.
It'd be interesting to see which other workloads FreeBSD behaves poorly in.
In fact, it'd be doubly interesting to get some people who _do_
computational workloads to do some profiling using oprofile/pmc and
report back. It might help if we wrote a wiki page on how to do this
kind of profiling and how to interpret the results.
In any case, yes - I think it's worth pursuing this further as it's
very likely not the only workload that exhibits this kind of cache
unhappiness.
Adrian
More information about the freebsd-hackers mailing list