svn commit: r253877 - in projects/atomic64/sys: amd64/include i386/include

Fri Aug 2 13:52:13 UTC 2013

On Fri, 2 Aug 2013, Jung-uk Kim wrote:

> Log:
>  Reimplement atomic operations on PDEs and PTEs in pmap.h.  This change
>  significantly reduces duplicate code.  Also, it may improve and even correct
>  some questionable implementations.

Do they all (or any) need to be atomic with respect to multiple CPUs?
It's hard to see how concurrent accesses to page tables can work
without higher-level locking than is provided by atomic ops.
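
To make the locking point concrete, here is a minimal userland sketch in
portable C11, not kernel code.  Everything in it is hypothetical (struct
pmap_sketch, PG_V_SKETCH, pmap_enter_sketch(), and a pthread mutex standing
in for whatever lock the pmap layer holds).  Each access to the entry is
atomic on its own, but the check-then-update sequence is only correct
because the lock covers both steps:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define	PG_V_SKETCH	0x1u		/* hypothetical "valid" bit */

/* Hypothetical stand-in for a pmap with a single page table entry. */
struct pmap_sketch {
	pthread_mutex_t	 lock;		/* the higher-level lock */
	_Atomic uint64_t pte;
};

/*
 * Install newpte only if the entry is currently invalid.  Returns 1 if
 * the entry was installed and 0 if a valid entry was already present.
 * Without the lock, two CPUs could both see an invalid entry and both
 * store, even though each load and each store is atomic by itself.
 */
static int
pmap_enter_sketch(struct pmap_sketch *pm, uint64_t newpte)
{
	int done = 0;

	pthread_mutex_lock(&pm->lock);
	if ((atomic_load(&pm->pte) & PG_V_SKETCH) == 0) {
		atomic_store(&pm->pte, newpte);
		done = 1;
	}
	pthread_mutex_unlock(&pm->lock);
	return (done);
}

/* Usage: the second attempt must not overwrite the first mapping. */
static void
pmap_enter_example(void)
{
	struct pmap_sketch pm = { PTHREAD_MUTEX_INITIALIZER, 0 };

	assert(pmap_enter_sketch(&pm, 0x1001) == 1);
	assert(pmap_enter_sketch(&pm, 0x2001) == 0);
}
```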

> Modified: projects/atomic64/sys/amd64/include/pmap.h
> ==============================================================================
> --- projects/atomic64/sys/amd64/include/pmap.h	Fri Aug  2 00:08:00 2013	(r253876)
> +++ projects/atomic64/sys/amd64/include/pmap.h	Fri Aug  2 00:20:04 2013	(r253877)
> @@ -185,41 +185,13 @@ extern u_int64_t KPML4phys;	/* physical
> pt_entry_t *vtopte(vm_offset_t);
> #define	vtophys(va)	pmap_kextract(((vm_offset_t) (va)))
>
> -static __inline pt_entry_t
> -pte_load(pt_entry_t *ptep)
> -{
> -	pt_entry_t r;
> -
> -	r = *ptep;
> -	return (r);
> -}

This function wasn't atomic with respect to multiple CPUs.  Except on
i386 with PAE, but then it changes a 64-bit object on a 32-bit CPU,
so it needs some locking just to be atomic with respect to a single CPU.
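
The PAE case can be sketched as follows.  This is portable C11 rather than
the FreeBSD atomic(9) API, and pte64_t, pte_load_plain() and
pte_load_sketch() are hypothetical names.  A plain read of a 64-bit object
may be compiled to two 32-bit loads on a 32-bit CPU, so it can return half
of an old value and half of a new one; the atomic load leaves it to the
compiler to pick an instruction sequence that reads the object indivisibly:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a PAE-sized (64-bit) page table entry. */
typedef _Atomic uint64_t pte64_t;

/* Plain read: may be split into two 32-bit loads on a 32-bit CPU. */
static inline uint64_t
pte_load_plain(const uint64_t *ptep)
{
	return (*ptep);
}

/* Indivisible read with acquire ordering. */
static inline uint64_t
pte_load_sketch(pte64_t *ptep)
{
	return (atomic_load_explicit(ptep, memory_order_acquire));
}

/* Usage: both reads agree when nothing else is writing. */
static void
pte_load_example(void)
{
	uint64_t plain = 0x8000000012345067ULL;
	pte64_t atom = 0x8000000012345067ULL;

	assert(pte_load_plain(&plain) == pte_load_sketch(&atom));
}
```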

> -static __inline pt_entry_t
> -pte_load_store(pt_entry_t *ptep, pt_entry_t pte)
> -{
> -	pt_entry_t r;
> -
> -	__asm __volatile(
> -	    "xchgq %0,%1"
> -	    : "=m" (*ptep),
> -	      "=r" (r)
> -	    : "1" (pte),
> -	      "m" (*ptep));
> -	return (r);
> -}

This was the main one that was atomic with respect to multiple CPUs on
both amd64 and i386.  This seems to be accidental -- xchg with a memory
operand is implicitly locked, so you get the lock and its slowness whether
you want them or not.
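
For comparison, a swap written in portable C11 might look like the sketch
below.  The name pte_load_store_sketch() is hypothetical and this is not
the atomic_swap_long() from the commit.  On x86, compilers typically
implement such an exchange with xchg, so the locking is the same as in the
inline asm above:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Store pte into *ptep and return the previous contents, atomically. */
static inline uint64_t
pte_load_store_sketch(_Atomic uint64_t *ptep, uint64_t pte)
{
	return (atomic_exchange_explicit(ptep, pte, memory_order_seq_cst));
}

/* Usage: the old value comes back and the new value is in place. */
static void
pte_load_store_example(void)
{
	_Atomic uint64_t pte = 0x1001;

	assert(pte_load_store_sketch(&pte, 0x2001) == 0x1001);
	assert(atomic_load(&pte) == 0x2001);
}
```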

> -
> -#define	pte_load_clear(pte)	atomic_readandclear_long(pte)
> -
> -static __inline void
> -pte_store(pt_entry_t *ptep, pt_entry_t pte)
> -{
> +#define	pte_load(ptep)			atomic_load_acq_long(ptep)
> +#define	pte_load_store(ptep, pte)	atomic_swap_long(ptep, pte)
> +#define	pte_load_clear(pte)		atomic_swap_long(pte, 0)
> +#define	pte_store(ptep, pte)		atomic_store_rel_long(ptep, pte)
> +#define	pte_clear(ptep)			atomic_store_rel_long(ptep, 0)
>
> -	*ptep = pte;
> -}

pte_store() was also not atomic with respect to multiple CPUs.  So almost
everything was not atomic with respect to multiple CPUs, except for PAE
on i386.

Bruce