fast bcopy...

Thu May 3 10:49:40 UTC 2012

2012/5/3, Steven Atreju <snatreju at googlemail.com>:
> K. Macy wrote [2012-05-03 02:58+0200]:
>> It's highly chipset and processor dependent what works best.
>
> Yes, of course.
> Though i was kinda, even shocked, once i've seen this first:
>
>   http://marc.info/?l=dragonfly-commits&m=132241713812022&w=2
>
> So we don't use our assembler version for new gccs and HAMMER or
> SSE3+ (the decision for these was rather arbitrarily, except they
> were yet existent for an instant implementation).
>
>> Intel now has non-temporal loads and stores which work much
>> better in some cases but provide little benefit in others.
>
> Yes, our 2002 tests have shown that these were *extremely*
> dependent upon alignment.  (Note: 2002. o-)
> Hmm, it doesn't really matter, but i guess this is a good time to
> thank the FreeBSD hackers for that FPU stack FILD/FISTP idea!
> I'll append the copy related notes of our doc/memperf.txt.
> Thanks,

Years ago I implemented FPU unwinding and MMX copy (reimplementing
bcopy, memcpy, etc.) to see whether they really made a difference.

What really mattered with the hardware available at that time
(Pentium 4) was alignment and the use of non-temporal operations on
heavily contended cache lines.
In short, it is more important that we engineer the "buffer" layout
than the functions themselves.
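
To make the layout point concrete, here is a minimal sketch, not code
from any tree. It assumes 64-byte cache lines (common on x86, not
universal) and a C11 compiler; CACHE_LINE, struct percpu_slot,
slots_alloc() and is_line_aligned() are hypothetical names made up for
this example. Each slot starts on its own cache line, so two CPUs
writing to different slots should not contend for the same line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 64-byte cache lines. Adjust for the target CPU. */
#define CACHE_LINE 64

/*
 * Hypothetical per-CPU slot. The alignas on the first member forces the
 * whole struct to cache-line alignment, and the compiler pads sizeof up
 * to a multiple of that alignment, so adjacent slots never share a line.
 */
struct percpu_slot {
    alignas(CACHE_LINE) unsigned char data[CACHE_LINE * 4];
    size_t used;
};

static_assert(sizeof(struct percpu_slot) % CACHE_LINE == 0,
              "slot must span whole cache lines");

/* Allocate n zeroed slots, the first one on a cache-line boundary. */
static struct percpu_slot *slots_alloc(size_t n)
{
    /* aligned_alloc wants size to be a multiple of the alignment;
     * n * sizeof(struct percpu_slot) always is, per the assert above. */
    struct percpu_slot *s = aligned_alloc(CACHE_LINE, n * sizeof *s);

    if (s != NULL)
        memset(s, 0, n * sizeof *s);
    return s;
}

/* Return nonzero if p sits on a cache-line boundary. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

With a layout like this the copy routine always sees aligned source
and destination buffers, which is the case where the choice between
plain and non-temporal stores is easiest to measure.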

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein