non-temporal copyin/copyout?

Joseph Koshy joseph.koshy at gmail.com
Fri Feb 17 19:20:19 PST 2006


On 2/17/06, Andrew Gallatin <gallatin at cs.duke.edu> wrote:

ag>  "k8-dc-miss" : data cache misses
ag>  91.5    6466.00  6466.00        0  100.00%           copyout [1]

ag>  "k8-bu-fill-request-l2-miss,mask=dc-fill" : L2 fills
ag> for the data cache
ag>  88.2    3866.00  3866.00        0  100.00%           copyout [1]

Certainly copyout() appears to be thrashing the cache.

ag>  "k8-dc-misaligned-data-reference": in case there are any

ag>  99.5   66763.00 66763.00        0  100.00%           copyout [1]

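The code in question, in "/usr/src/sys/amd64/amd64/support.S", has:
    ENTRY(copyout)
    ...
    shrq    $3,%rcx         /* %rcx = number of 8-byte words */
    cld                     /* copy forwards */
    rep
    movsq                   /* copy the 8-byte words */
    movb    %dl,%cl         /* reload the low byte of the length */
    andb    $7,%cl          /* keep the remaining 0..7 bytes */
    rep
    movsb                   /* copy the tail bytes */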

i.e., it makes no attempt to handle the case where the
`from_kernel' or `to_user' address is not 8-byte aligned.
IIRC `rep movsq' works best when both the source and
destination addresses are 8-byte aligned.
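
To make the alignment point concrete, here is a minimal sketch in C
of a copy that first brings the destination up to an 8-byte boundary,
then copies 8-byte words, then copies the tail.  The name
copy_dst_aligned() is hypothetical and this is not the FreeBSD code;
it only aligns the destination, so the source may still be misaligned
in the word loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not FreeBSD code: copy len bytes from src to
 * dst, aligning the destination to 8 bytes before the word loop.
 */
void *
copy_dst_aligned(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t head;

	assert(dst != NULL && src != NULL);

	/* Bytes needed to bring d up to the next 8-byte boundary. */
	head = (size_t)(-(uintptr_t)d & 7);
	if (head > len)
		head = len;
	len -= head;
	while (head-- > 0)
		*d++ = *s++;

	/* Word loop: d is 8-byte aligned here; s may not be. */
	for (; len >= 8; len -= 8, d += 8, s += 8) {
		uint64_t w;

		memcpy(&w, s, sizeof(w));	/* load, alignment-safe */
		memcpy(d, &w, sizeof(w));	/* store to aligned d */
	}

	/* Tail: the remaining 0..7 bytes. */
	while (len-- > 0)
		*d++ = *s++;
	return (dst);
}
```

Whether aligning the destination or the source pays off more would
have to be measured; the sketch picks the destination arbitrarily.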

If we are going to use `movntq' then we may as well take
care of alignment issues too.
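
As a rough illustration of combining the two, the sketch below aligns
the destination and then uses non-temporal stores for the 8-byte
words.  Assumptions: the name copy_nontemporal() is hypothetical, and
the stores go through the SSE2 intrinsic _mm_stream_si64(), which
compiles to `movnti' (a non-temporal store from a general-purpose
register) rather than the MMX-register `movntq' named above.  On
non-x86-64 targets it falls back to ordinary stores.  This is userland
C, not a kernel patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64(), _mm_sfence() */
#endif

/* Hypothetical helper, not FreeBSD code. */
void *
copy_nontemporal(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t head;

	assert(dst != NULL && src != NULL);

	/* Head: ordinary copy up to the next 8-byte boundary of d. */
	head = (size_t)(-(uintptr_t)d & 7);
	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Body: 8-byte words, stored without filling the data cache. */
	for (; len >= 8; len -= 8, d += 8, s += 8) {
		long long w;

		memcpy(&w, s, sizeof(w));	/* load, alignment-safe */
#if defined(__x86_64__)
		_mm_stream_si64((long long *)(void *)d, w);
#else
		memcpy(d, &w, sizeof(w));	/* fallback: plain store */
#endif
	}
#if defined(__x86_64__)
	/* Order the weakly-ordered streaming stores before later stores. */
	_mm_sfence();
#endif

	/* Tail: the remaining 0..7 bytes. */
	memcpy(d, s, len);
	return (dst);
}
```

Non-temporal stores only help if the destination is not read again
soon, so any gain would need to be confirmed against the benchmark.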

jk>  "k8-fr-interrupts-masked-while-pending-cycles": for
jk>      finding spots in the code where spin-locks are being
jk>      held for long.

ag> I had to tweak the sample rate to 512 for this one.
ag>  52.5     330.00   330.00        0  100.00%           acpi_cpu_idle [1]
ag>  10.4     395.00    65.00        0  100.00%           spinlock_exit [2]
ag>   9.1     452.00    57.00        0  100.00%           acpi_cpu_c1 [3]

This is interesting too, but I'm not sure how much of
an effect it has on this particular benchmark.

--
FreeBSD Volunteer,     http://people.freebsd.org/~jkoshy

