non-temporal copyin/copyout?
Joseph Koshy
joseph.koshy at gmail.com
Fri Feb 17 07:50:33 PST 2006
> I'm bringing this up because I've noticed that FreeBSD 10GbE
> performance is far below Solaris/amd64 and linux/x86_64 when
> using the PCI-e 10GbE adaptor that I'm doing drivers for.
> For example, Solaris can receive a netperf TCP stream at
There was a bug in my port of netperf: I had left the
`HISTOGRAM' option turned on, which slows it down
significantly.
Version 2.3.1,1 of the port is the latest and has this fixed.
> 9.75Gb/sec while using only 47% CPU as measured by vmstat.
> (eg, it is using a little less than a single core). In
> contrast, FreeBSD is limited to 7.7Gb/sec, and uses nearly
> 90% CPU. When profiling with hwpmc, I see a profile which
> shows up to 70% of the time is spent in copyout.
You could use the following events to probe the system:
  "k8-dc-miss": data cache misses
  "k8-bu-fill-request-l2-miss,mask=dc-fill": data cache fill
      requests that miss in L2
  "k8-dc-misaligned-data-reference": misaligned data
      references, in case there are any
  "k8-fr-interrupts-masked-while-pending-cycles": for
      finding spots in the code where spin locks are
      held for a long time
You may need to tweak the sample rate (the -n option to
pmcstat); the default of 65536 events per sample may be too
high or too low for some of these. Using pmcstat -p EVENT
will give a feel for a good sample rate to choose for EVENT.
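
As a rough illustration of turning a counting run into a value for -n,
the sketch below divides the observed event frequency by a desired
number of samples per second. The target of 1000 samples per second,
the function names, and the log file path are assumptions made for this
example; they are not something pmcstat prescribes.

```python
def suggest_sample_rate(event_count, seconds, target_samples_per_sec=1000):
    """Suggest a value for pmcstat's -n option (events per sample).

    event_count: events counted during a trial run (e.g. with pmcstat -p EVENT)
    seconds: how long that trial run lasted
    target_samples_per_sec: assumed sampling frequency we would like to get
    """
    if event_count < 0 or seconds <= 0 or target_samples_per_sec <= 0:
        raise ValueError("counts must be non-negative, durations and targets positive")
    events_per_sec = event_count / seconds
    # One sample is taken every n events, so n = event rate / sample rate.
    # Never suggest less than one event per sample.
    return max(1, round(events_per_sec / target_samples_per_sec))


def pmcstat_sampling_command(event, rate, logfile="/tmp/sample.log"):
    """Build an argument list for a system-wide sampling run.

    The log file path is a hypothetical placeholder.
    """
    return ["pmcstat", "-S", event, "-n", str(rate), "-O", logfile]


if __name__ == "__main__":
    # Hypothetical numbers: 50 million data cache misses seen in 10 seconds.
    rate = suggest_sample_rate(50_000_000, 10)
    print(" ".join(pmcstat_sampling_command("k8-dc-miss", rate)))
```

With those made-up numbers the suggestion is -n 5000. At the default of
65536 the same event would yield only about 76 samples per second.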
--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy