Performance of SheevaPlug on 8-stable
Grzegorz Bernacki
gjb at semihalf.com
Mon Mar 8 15:51:01 UTC 2010
Mark Tinguely wrote:
> FreeBSD-current has kernel and user witness turned on. Witness is for
> locks, so it should not change the performance of a tight arithmetic loop
> like this.
>
> I don't know the Marvell internals, and from what I can tell, their
> technical docs require an NDA. That said, many ARM processors also have
> an internal instruction cache (instruction prefetch) in addition to the
> instruction cache. I don't think the prefetch has an enable/disable control.
>
> From the CPU identification, it looks like branch prediction is
> turned on. Branch prediction compensates for the longer pipelines.
> I can't see how that could go astray in a tight loop.
>
> Thus says the ARM ARM:
>
> ARM implementations are free to choose how far ahead of the
> current point of execution they prefetch instructions; either
> a fixed or a dynamically varying number of instructions. As well
> as being free to choose how many instructions to prefetch, an ARM
> implementation can choose which possible future execution path to
> prefetch along. For example, after a branch instruction, it can
> choose to prefetch either the instruction following the branch
> or the instruction at the branch target. This is known as branch
> prediction.
>
> There are a few dangling data allocations that I would like to see
> closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> is allocated via arm_nocache (DMA COHERENT) or sendfile, then it is
> never marked as unallocated. *IN THEORY*, if that page is used again,
> we could falsely believe that the page is being shared and turn off
> the cache, even though it is not shared.
>
> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
>
> * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
> the Sheeva implementation. This is a theoretical observation of a side
> effect of the multiple kernel mapping patch that we did just before
> FreeBSD 8-release.
>
> --Mark Tinguely
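The shared-page theory above can be illustrated with a deliberately simplified model. This is a sketch, not the real pmap_fix_cache(): struct mapping and needs_uncached() are hypothetical names, and the real code also distinguishes kernel mappings from user mappings. It assumes a virtually indexed, virtually tagged (VIVT) cache, where a page mapped at more than one virtual address, with at least one writable mapping, has to be made uncached.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of the shared-page check; this is
 * not the FreeBSD pmap_fix_cache() code.
 */
struct mapping {
	unsigned long va;	/* virtual address of this mapping */
	bool writable;		/* mapping allows writes */
};

/*
 * Assuming a VIVT cache: one physical page mapped at several virtual
 * addresses can end up with several cached copies, so the page is
 * made uncached when it has more than one mapping and at least one
 * of them is writable. Every entry in the list is counted, so an
 * entry that was never removed still makes the page look shared.
 */
static bool
needs_uncached(const struct mapping *m, size_t n)
{
	size_t writable = 0;

	for (size_t i = 0; i < n; i++) {
		if (m[i].writable)
			writable++;
	}
	return (n > 1 && writable > 0);
}
```

Suppose a page has a single writable mapping. The model returns false. If you add one leftover entry for the same page (for example, a made-up kernel address left behind by an earlier arm_nocache or sendfile allocation), the model returns true, even though only one mapping is live. That is the false-sharing outcome described in the quoted message.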
This is probably caused by the mechanism that turns off the cache for shared pages.
When I applied the following patch:
diff --git a/sys/arm/arm/pmap.c b/sys/arm/arm/pmap.c
index 390dc3c..d17c0cc 100644
--- a/sys/arm/arm/pmap.c
+++ b/sys/arm/arm/pmap.c
@@ -1401,6 +1401,8 @@ pmap_fix_cache(struct vm_page *pg, pmap_t pm, vm_offset_t va)
*/
TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) {
+ if (pv->pv_flags & PVF_EXEC)
+ return;
/* generate a count of the pv_entry uses */
if (pv->pv_flags & PVF_WRITE) {
if (pv->pv_pmap == pmap_kernel())
the execution time of the 'test' program is:
mv78100-4# time ./test
5.000u 0.000s 0:05.40 99.8% 40+1324k 0+0io 0pf+0w
and without this patch it is:
mv78100-4# time ./test
295.000u 0.000s 4:56.01 99.7% 40+1322k 0+0io 0pf+0w
I think we need to handle executable pages in a different way.
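The experimental patch above can be modeled the same way. This is a hypothetical sketch: struct mapping, enum cache_action and fix_cache_model() are illustrative names, and the shared-page rule is simplified. The writable and exec fields stand in for PVF_WRITE and PVF_EXEC, and the early return mirrors the two lines the patch adds to the pv_list walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the cache check for one physical page. */
enum cache_action {
	CACHE_UNCHANGED,	/* patched path: page has an executable mapping */
	CACHE_ON,		/* no aliasing hazard: keep the page cached */
	CACHE_OFF		/* treated as shared: make the page uncached */
};

struct mapping {
	bool writable;		/* stands in for PVF_WRITE */
	bool exec;		/* stands in for PVF_EXEC */
};

static enum cache_action
fix_cache_model(const struct mapping *m, size_t n)
{
	size_t writable = 0;

	for (size_t i = 0; i < n; i++) {
		/* Mirrors the early return added to pmap_fix_cache(). */
		if (m[i].exec)
			return CACHE_UNCHANGED;
		if (m[i].writable)
			writable++;
	}
	/* Simplified shared-page rule: several mappings, one writable. */
	return (n > 1 && writable > 0) ? CACHE_OFF : CACHE_ON;
}
```

Like the patch, the model leaves the cache state untouched for any page with an executable mapping. This includes a page that also has a writable alias, which is exactly the case where the aliasing rule would normally apply. That is one reason to read the patch as a diagnostic rather than a fix, in line with the suggestion to handle executable pages differently.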
grzesiek