non-temporal copyin/copyout?
Joseph Koshy
joseph.koshy at gmail.com
Fri Feb 17 19:20:19 PST 2006
On 2/17/06, Andrew Gallatin <gallatin at cs.duke.edu> wrote:
ag> "k8-dc-miss" : data cache misses
ag> 91.5 6466.00 6466.00 0 100.00% copyout [1]
ag> "k8-bu-fill-request-l2-miss,mask=dc-fill" : L2 fills
ag> for the data cache
ag> 88.2 3866.00 3866.00 0 100.00% copyout [1]
Certainly copyout() appears to be thrashing the cache.
ag> "k8-dc-misaligned-data-reference": in case there are any
ag> 99.5 66763.00 66763.00 0 100.00% copyout [1]
The code in question "/usr/src/sys/amd64/amd64/support.S" has:
    ENTRY(copyout)
    ...
        shrq    $3,%rcx
        cld
        rep
        movsq
        movb    %dl,%cl
        andb    $7,%cl
        rep
        movsb
i.e., it doesn't handle the case where the `from_kernel'
or `to_user' address is not aligned to its natural
(8-byte) boundary: it goes straight into `rep movsq'
without first copying a few bytes to align either pointer.
IIRC `rep movsq' works best when both the source and
destination addresses are 8-byte aligned.
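To make the alignment point concrete, here is a rough C sketch. It is not the kernel's code: the fault handling and user/kernel address checks that copyout() does are left out entirely. `copy_like_support_s()' mirrors the shape of the loop above (len >> 3 quadwords, then len & 7 bytes, with no attention paid to alignment). `copy_dst_aligned()' first copies up to 7 single bytes so that the destination is 8-byte aligned, then does the quadword body and the byte tail. Both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: same shape as the support.S loop.
 * len >> 3 quadwords ("shrq $3,%rcx; rep movsq"), then
 * len & 7 trailing bytes ("andb $7,%cl; rep movsb").
 * No attempt is made to align src or dst. */
static void copy_like_support_s(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t nq = len >> 3;       /* quadword count */
    size_t nb = len & 7;        /* leftover bytes */

    for (size_t i = 0; i < nq; i++) {
        uint64_t q;
        memcpy(&q, s, 8);       /* memcpy keeps unaligned access legal in C */
        memcpy(d, &q, 8);
        s += 8;
        d += 8;
    }
    while (nb--)
        *d++ = *s++;
}

/* Hypothetical: byte-sized head until dst is 8-byte aligned,
 * then whole quadwords, then the byte-sized tail. */
static void copy_dst_aligned(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    size_t head = (8 - ((uintptr_t)d & 7)) & 7;
    if (head > len)
        head = len;
    len -= head;
    while (head--)
        *d++ = *s++;

    for (; len >= 8; len -= 8, s += 8, d += 8) {
        uint64_t q;
        assert(((uintptr_t)d & 7) == 0);   /* invariant: dst aligned */
        memcpy(&q, s, 8);                  /* src may still be misaligned */
        memcpy(d, &q, 8);
    }
    while (len--)
        *d++ = *s++;
}
```

Aligning only the destination is a simplification: if the source and destination differ by something other than a multiple of 8, one of the two stays misaligned whatever you do, so the copy loop has to pick which side to favour.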
If we are going to use `movntq' then we may as well take
care of alignment issues too.
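A sketch of what a non-temporal variant might look like, again in C and again hypothetical (`copy_nt()' and `store_q_nt()' are invented names). It swaps in MOVNTI, exposed as the SSE2 intrinsic `_mm_stream_si64()', rather than `movntq': MOVNTI stores from a general-purpose register, whereas `movntq' needs an MMX register. It aligns the destination first, streams whole quadwords, and issues `_mm_sfence()' at the end because non-temporal stores are weakly ordered. On builds that are not x86-64 it falls back to ordinary stores so the code still compiles and runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si64 (MOVNTI), _mm_sfence */
#define HAVE_NT_STORE 1
#endif

/* Hypothetical helper: store one quadword to an 8-byte aligned
 * destination, bypassing the cache where the CPU supports it. */
static inline void store_q_nt(unsigned char *d, uint64_t q)
{
    assert(((uintptr_t)d & 7) == 0);
#ifdef HAVE_NT_STORE
    _mm_stream_si64((long long *)(void *)d, (long long)q);
#else
    memcpy(d, &q, 8);            /* portable fallback: ordinary store */
#endif
}

static void copy_nt(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Byte head until dst is 8-byte aligned. */
    while (len > 0 && ((uintptr_t)d & 7) != 0) {
        *d++ = *s++;
        len--;
    }
    /* Aligned quadword body using non-temporal stores. */
    for (; len >= 8; len -= 8, s += 8, d += 8) {
        uint64_t q;
        memcpy(&q, s, 8);        /* src may still be misaligned */
        store_q_nt(d, q);
    }
    /* Byte tail with ordinary stores. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
#ifdef HAVE_NT_STORE
    /* Order the weakly-ordered streaming stores before returning. */
    _mm_sfence();
#endif
}
```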
jk> "k8-fr-interrupts-masked-while-pending-cycles": for
jk> finding spots in the code where spin-locks are being
jk> held for long.
ag> I had to tweak the sample rate to 512 for this one.
ag> 52.5 330.00 330.00 0 100.00% acpi_cpu_idle [1]
ag> 10.4 395.00 65.00 0 100.00% spinlock_exit [2]
ag> 9.1 452.00 57.00 0 100.00% acpi_cpu_c1 [3]
This is interesting too, but I'm not sure how much of
an effect it has on this particular benchmark.
--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
More information about the freebsd-amd64 mailing list