non-temporal copyin/copyout?
Andrew Gallatin
gallatin at cs.duke.edu
Fri Feb 17 07:01:09 PST 2006
Has anybody considered using non-temporal copies for the in-kernel
bcopy on amd64?
A quick test in userspace shows that for large copies, an adapted
pagecopy (from amd64/amd64/support.S) more than doubles bcopy
bandwidth, from 1.2GB/s to 2.5GB/s, on my Athlon64 X2 3800+.
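For anyone who wants to try this in userspace, here is a rough sketch of
what a non-temporal copy looks like. It is not the adapted pagecopy
from the test above; it is a stand-in written with SSE2 intrinsics
(streaming 16-byte stores), and the name nt_memcpy is made up for
illustration, not an existing FreeBSD interface. It assumes an
x86_64 compiler that provides <emmintrin.h>.

```c
#include <assert.h>
#include <emmintrin.h>	/* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace helper: copy len bytes from src to dst using
 * non-temporal (cache-bypassing) stores for the bulk of the copy.
 * Buffers must not overlap.
 */
static void *
nt_memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t head;

	if (len == 0)
		return (dst);
	assert(dst != NULL && src != NULL);

	/* Copy leading bytes until the destination is 16-byte aligned. */
	head = (size_t)(-(uintptr_t)d & 15);
	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Bulk: 64 bytes per iteration, stores bypass the cache. */
	while (len >= 64) {
		__m128i a = _mm_loadu_si128((const __m128i *)(s + 0));
		__m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
		__m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
		__m128i e = _mm_loadu_si128((const __m128i *)(s + 48));

		_mm_stream_si128((__m128i *)(d + 0), a);
		_mm_stream_si128((__m128i *)(d + 16), b);
		_mm_stream_si128((__m128i *)(d + 32), c);
		_mm_stream_si128((__m128i *)(d + 48), e);
		d += 64;
		s += 64;
		len -= 64;
	}

	/* Tail: whatever is left goes through an ordinary copy. */
	memcpy(d, s, len);

	/* Order the streaming stores before any later stores. */
	_mm_sfence();
	return (dst);
}
```

The stores skip the cache, so this should only help when the
destination is not going to be read again soon; for small copies an
ordinary bcopy/memcpy is likely to be the better choice.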
I'm bringing this up because I've noticed that FreeBSD 10GbE
performance is far below Solaris/amd64 and Linux/x86_64 when using the
PCI-e 10GbE adapter that I'm writing drivers for. For example, Solaris
can receive a netperf TCP stream at 9.75Gb/sec while using only 47%
CPU as measured by vmstat (i.e., a little less than a single core).
In contrast, FreeBSD is limited to 7.7Gb/sec and uses nearly 90% CPU.
When profiling with hwpmc, I see up to 70% of the time being spent
in copyout.
Thanks,
Drew