svn commit: r221614 - projects/largeSMP/sys/powerpc/include
Bruce Evans
brde at optusnet.com.au
Sun May 8 14:40:43 UTC 2011
On Sun, 8 May 2011, Attilio Rao wrote:
> Log:
> All architectures define the size-bounded types (uint32_t, uint64_t, etc.)
> starting from base C types (int, long, etc).
> That is also reflected when building atomic operations, as the
> size-bounded types are built from the base C types.
>
> However, powerpc does the inverse thing, leading to a serie of nasty
I think you mean that it does the inverse thing for atomic ops only.
> bugs.
> Cleanup the atomic implementation by defining as base the base C type
> version and depending on them, appropriately.
I think that it is mostly the other arches that are backwards here,
except for plain ints. MI code cannot use atomic ops for anything
except plain ints, since no other type is guaranteed to be supported
by the MD code. For example, sparc64 doesn't support any 8-bit or
16-bit types, and i386 doesn't support any 64-bit types (until recently;
it now uses cmpxchg8b on some CPUs and a hack on other CPUs to support
64-bit types), and my version of i386 doesn't support any 8-bit or
16-bit types or long since these types are unusable in MI code and
unused in MD code (long is misused in some MI code but I don't use
such broken code).
At least one type must be supported, and to keep things sane, int must
be supported. The type could be int32_t instead, but then you would have
a hard-coded assumption that int is int32_t instead of a hard-coded
assumption that int can be an atomic type. The latter is a better
assumption for MI code. MD code can make the former assumption, except
of course on arches where it is false; there are none of those yet.
This gives the following structure in atomic.h (rough sketch after the
list):
o basic definitions in terms of uint8/16/32/64_t. But don't define the
8/16/64-bit ones unless they are actually used in MD code, which is
rare. MI code cannot use these.
o assume that u_int is uint32_t and #define all the atomic_foo_int()
interfaces as atomic_foo_32. Similarly for pointers, except use
atomic_foo_64 in some cases.
o do something for long. MI code cannot use atomic ops on longs, but does.
Correctly-sized longs are usually fundamentally non-atomic, since they
are twice the register width; the register width is usually the same as
the bus width, and atomicity for widths wider than the bus is unnatural
and/or inefficient. But no supported arch has correctly-sized longs.
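Roughly, as a sketch only (the names are the usual FreeBSD ones, but the
__sync builtin just stands in for the MD asm):

    #include <stdint.h>

    /* MD layer: define only the widths the hardware supports. */
    static __inline void
    atomic_add_32(volatile uint32_t *p, uint32_t v)
    {
            /* stands in for the lwarx/stwcx. loop on powerpc */
            __sync_fetch_and_add(p, v);
    }

    /* MI layer: hard-code the assumption that u_int is uint32_t. */
    #define atomic_add_int  atomic_add_32

    /* Pointers: 32 bits here; atomic_foo_64 on 64-bit arches. */
    #define atomic_add_ptr  atomic_add_32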
> Modified: projects/largeSMP/sys/powerpc/include/atomic.h
> ==============================================================================
> --- projects/largeSMP/sys/powerpc/include/atomic.h Sat May 7 23:34:14 2011 (r221613)
> +++ projects/largeSMP/sys/powerpc/include/atomic.h Sun May 8 00:39:49 2011 (r221614)
> @@ -48,13 +48,7 @@
> * { *p += v; }
> */
>
> -#define __ATOMIC_ADD_8(p, v, t) \
> - 8-bit atomic_add not implemented
> -
> -#define __ATOMIC_ADD_16(p, v, t) \
> - 16-bit atomic_add not implemented
> -
Already correctly left out except for bogusly #defining it and #undefining
it. Now not bogusly #defined either.
> -#define __ATOMIC_ADD_32(p, v, t) \
> +#define __atomic_add_int(p, v, t) \
> __asm __volatile( \
> "1: lwarx %0, 0, %2\n" \
> " add %0, %3, %0\n" \
> @@ -63,10 +57,10 @@
> : "=&r" (t), "=m" (*p) \
> : "r" (p), "r" (v), "m" (*p) \
> : "cc", "memory") \
> - /* __ATOMIC_ADD_32 */
> + /* __atomic_add_int */
No type problems at this level.
> #ifdef __powerpc64__
> -#define __ATOMIC_ADD_64(p, v, t) \
> +#define __atomic_add_long(p, v, t) \
> __asm __volatile( \
> "1: ldarx %0, 0, %2\n" \
> " add %0, %3, %0\n" \
> @@ -75,69 +69,78 @@
> : "=&r" (t), "=m" (*p) \
> : "r" (p), "r" (v), "m" (*p) \
> : "cc", "memory") \
> - /* __ATOMIC_ADD_64 */
> + /* __atomic_add_long */
> #else
> -#define __ATOMIC_ADD_64(p, v, t) \
> - 64-bit atomic_add not implemented
Now correctly left out when it is not supported.
> +#define __atomic_add_long(p, v, t) \
> + long atomic_add not implemented
> #endif
atomic_add_long() should never be supported, but it is (mis)used in
standard kernel files so not having it will break more than just LINT.
But the above is not atomic_add_long()... hacks are used below to make
that sort of work.
> +#ifdef __powerpc64__
> +_ATOMIC_ADD(long)
> +
> +#define atomic_add_64 atomic_add_long
No problems with longs on 64-bit arches.
> +#define atomic_add_acq_64 atomic_add_acq_long
> +#define atomic_add_rel_64 atomic_add_rel_long
> +
> +#define atomic_add_ptr(p, v) \
> + atomic_add_long((volatile u_long *)(p), (u_long)(v))
> +#define atomic_add_acq_ptr(p, v) \
> + atomic_add_acq_long((volatile u_long *)(p), (u_long)(v))
> +#define atomic_add_rel_ptr(p, v) \
> + atomic_add_rel_long((volatile u_long *)(p), (u_long)(v))
This uses bogus casts to break warnings. amd64 is still missing this bug.
Someone added this bug to i386 after reviewers objected to it. Even i386
only has this bug for accesses to pointers. For char/short/long, it uses
type-safe code with separate functions/#defines like the ones for int.
The casts for pointers break jhb's intentions that:
- atomic accesses to pointers should be rare
- when they are needed, the caller must cast
I don't know why we don't have separate functions/#defines for pointers,
but I like not having the bloat for it.
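A type-safe version would look something like this (a sketch, not
anyone's actual code) -- the cast then happens once, behind a prototype
that the compiler checks:

    static __inline void
    atomic_add_ptr(volatile uintptr_t *p, uintptr_t v)
    {
            /* u_long and uintptr_t have the same representation here */
            atomic_add_long((volatile u_long *)p, (u_long)v);
    }

Callers then have to supply a real volatile uintptr_t * (or cast
explicitly and visibly themselves), instead of the macro silently
casting whatever they pass.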
> +#else
> +#define atomic_add_long(p, v) \
> + atomic_add_int((volatile u_int *)(p), (u_int)(v))
> +#define atomic_add_acq_long(p, v) \
> + atomic_add_acq_int((volatile u_int *)(p), (u_int)(v))
> +#define atomic_add_rel_long(p, v) \
> + atomic_add_rel_int((volatile u_int *)(p), (u_int)(v))
> +
> +#define atomic_add_ptr(p, v) \
> + atomic_add_int((volatile u_int *)(p), (u_int)(v))
> +#define atomic_add_acq_ptr(p, v) \
> + atomic_add_acq_int((volatile u_int *)(p), (u_int)(v))
> +#define atomic_add_rel_ptr(p, v) \
> + atomic_add_rel_int((volatile u_int *)(p), (u_int)(v))
> +#endif
Now you need the bogus casts for the longs, since although atomic
accesses to longs should be rarer than ones to pointers (never in MI code),
it is not intended that the caller must bogusly cast for them, especially
since the bogus casts are MD.
The bogus casts for the pointers should have no effect except to break
warnings in new code, since at least old MI code must already have the
casts in callers else it wouldn't compile on amd64.
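E.g., nonsense like this compiles silently with the casting macro, where
a prototype would have warned (hypothetical caller):

    short s;

    /* &s is short *, not volatile uintptr_t *; the macro's cast hides
       the mismatch, and the add clobbers memory beyond s on LP64 */
    atomic_add_ptr(&s, 1);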
> [... often I snip bits without noting this as here]
> -#if 0
> -_ATOMIC_CLEAR(8, 8, uint8_t)
> -_ATOMIC_CLEAR(8, char, u_char)
> -_ATOMIC_CLEAR(16, 16, uint16_t)
> -_ATOMIC_CLEAR(16, short, u_short)
> -#endif
> -_ATOMIC_CLEAR(32, 32, uint32_t)
> -_ATOMIC_CLEAR(32, int, u_int)
> -#ifdef __powerpc64__
> -_ATOMIC_CLEAR(64, 64, uint64_t)
> -_ATOMIC_CLEAR(64, long, u_long)
> -_ATOMIC_CLEAR(64, ptr, uintptr_t)
> -#else
> -_ATOMIC_CLEAR(32, long, u_long)
> -_ATOMIC_CLEAR(32, ptr, uintptr_t)
> -#endif
Seems a better way, and still used on amd64 and i386, to generate
separate type-safe code for each type supported. Here the only error
relative to amd64 and i386 seems to have been to generate from long
to a 32/64 suffix and #define from that back to a long suffix, instead
of to generate from long to a long suffix and #define from that to a
32/64 suffix.
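I.e., something like this, reusing the old file's
_ATOMIC_CLEAR(width, suffix, type) convention (sketch):

    /* generate type-safe u_long code with a long suffix, then alias
       the fixed-width name to it: */
    _ATOMIC_CLEAR(64, long, u_long)
    #define atomic_clear_64         atomic_clear_long

    /* rather than the reverse: */
    _ATOMIC_CLEAR(64, 64, uint64_t)
    #define atomic_clear_long       atomic_clear_64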
> ...
> @@ -622,39 +638,59 @@ atomic_cmpset_rel_long(volatile u_long *
> return (atomic_cmpset_long(p, cmpval, newval));
> }
>
> -#define atomic_cmpset_acq_int atomic_cmpset_acq_32
> -#define atomic_cmpset_rel_int atomic_cmpset_rel_32
> -
> -#ifdef __powerpc64__
> -#define atomic_cmpset_acq_ptr(dst, old, new) \
> - atomic_cmpset_acq_long((volatile u_long *)(dst), (u_long)(old), (u_long)(new))
> -#define atomic_cmpset_rel_ptr(dst, old, new) \
> - atomic_cmpset_rel_long((volatile u_long *)(dst), (u_long)(old), (u_long)(new))
> -#else
> -#define atomic_cmpset_acq_ptr(dst, old, new) \
> - atomic_cmpset_acq_32((volatile u_int *)(dst), (u_int)(old), (u_int)(new))
> -#define atomic_cmpset_rel_ptr(dst, old, new) \
> - atomic_cmpset_rel_32((volatile u_int *)(dst), (u_int)(old), (u_int)(new))
> +#define atomic_cmpset_64 atomic_cmpset_long
> +#define atomic_cmpset_acq_64 atomic_cmpset_acq_long
> +#define atomic_cmpset_rel_64 atomic_cmpset_rel_long
> +
No bogus casts for these now.
> +#define atomic_cmpset_ptr(dst, old, new) \
> + atomic_cmpset_long((volatile u_long *)(dst), (u_long)(old), \
> + (u_long)(new))
> +#define atomic_cmpset_acq_ptr(dst, old, new) \
> + atomic_cmpset_acq_long((volatile u_long *)(dst), (u_long)(old), \
> + (u_long)(new))
> +#define atomic_cmpset_rel_ptr(dst, old, new) \
> + atomic_cmpset_rel_long((volatile u_long *)(dst), (u_long)(old), \
> + (u_long)(new))
These bogus casts seem more bogus than before -- they seem to do nothing
except break type safety, since you already have functions with the right
suffixes and types (except the suffix says `long' but only u_longs are
supported).
> +#else
> +#define atomic_cmpset_long(dst, old, new) \
> + atomic_cmpset_int((volatile u_int *)(dst), (u_int)(old), \
> + (u_int)(new))
> +#define atomic_cmpset_acq_long(dst, old, new) \
> + atomic_cmpset_acq_int((volatile u_int *)(dst), (u_int)(old), \
> + (u_int)(new))
> +#define atomic_cmpset_rel_long(dst, old, new) \
> + atomic_cmpset_rel_int((volatile u_int *)(dst), (u_int)(old), \
> + (u_int)(new))
Usual bogus casts to type-pun from long to int in the 32-bit case. Even
here, the ones on the non-pointer args have no useful effect.
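The value casts are no-ops because atomic_cmpset_int() has a prototype
(contrived example):

    volatile u_int word;
    long old = 0, new = 1;

    /* these pass exactly the same values; the casts only duplicate
       the implicit conversions done for the prototype */
    atomic_cmpset_int(&word, (u_int)old, (u_int)new);
    atomic_cmpset_int(&word, old, new);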
> +
> +#define atomic_cmpset_ptr(dst, old, new) \
> + atomic_cmpset_int((volatile u_int *)(dst), (u_int)(old), \
> + (u_int)(new))
> +#define atomic_cmpset_acq_ptr(dst, old, new) \
> + atomic_cmpset_acq_int((volatile u_int *)(dst), (u_int)(old), \
> + (u_int)(new))
> +#define atomic_cmpset_rel_ptr(dst, old, new) \
> + atomic_cmpset_rel_int((volatile u_int *)(dst), (u_int)(old), \
> + (u_int)(new))
> #endif
Usual bogus casts for the pointer case.
> #ifdef __powerpc64__
> -static __inline uint64_t
> -atomic_fetchadd_64(volatile uint64_t *p, uint64_t v)
> +static __inline u_long
> +atomic_fetchadd_long(volatile u_long *p, u_long v)
> {
> - uint64_t value;
> + u_long value;
>
> do {
> value = *p;
> @@ -662,10 +698,10 @@ atomic_fetchadd_64(volatile uint64_t *p,
> return (value);
> }
>
> -#define atomic_fetchadd_long atomic_fetchadd_64
> +#define atomic_fetchadd_64 atomic_fetchadd_long
> #else
> -#define atomic_fetchadd_long(p, v) \
> - (u_long)atomic_fetchadd_32((volatile u_int *)(p), (u_int)(v))
> +#define atomic_fetchadd_long(p, v) \
> + (u_long)atomic_fetchadd_int((volatile u_int *)(p), (u_int)(v))
i386 uses a function to type-pun from fetchadd_long to fetchadd_int().
This has minor technical advantages. Why be different?
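I.e., roughly (from memory of i386's atomic.h, so only a sketch):

    static __inline u_long
    atomic_fetchadd_long(volatile u_long *p, u_long v)
    {
            return (atomic_fetchadd_int((volatile u_int *)p, (u_int)v));
    }

The function gives callers a prototype to be checked against, which is
presumably one of the minor technical advantages.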
> #endif
>
> #endif /* ! _MACHINE_ATOMIC_H_ */
This file is disgustingly large. Sizes in a few-months-old sources:
% 483 1887 16123 amd64/include/atomic.h
% 388 1257 10623 arm/include/atomic.h
% 487 1897 16033 i386/include/atomic.h
% 389 1383 12377 ia64/include/atomic.h
% 633 2333 18697 mips/include/atomic.h
% 6 22 154 pc98/include/atomic.h
% 671 2259 17306 powerpc/include/atomic.h
% 299 1346 8953 sparc64/include/atomic.h
% 299 1346 8950 sun4v/include/atomic.h
sparc64 is cleanest. i386 atomic.h is about 150 lines smaller after
removing unused mistakes. The extra code for powerpc64/32 ifdefs
shouldn't take another 200 lines.
I checked for type errors related to fetchadd. Most arches define 3
or 4 fetchadd functions. Only atomic_fetchadd_long is actually used.
All of its uses are in MI code with bogus types:
% ./kern/kern_resource.c: if (atomic_fetchadd_long(&uip->ui_proccnt, (long)diff) + diff > max) {
% ./kern/kern_resource.c: if (atomic_fetchadd_long(&uip->ui_sbsize, (long)diff) + diff > max) {
% ./kern/kern_resource.c: if (atomic_fetchadd_long(&uip->ui_ptscnt, (long)diff) + diff > max) {
The ui_ fields are only long because 4.4BSD had too many longs and
these fields didn't cause any ABI problems so they were left as longs.
The pure counters here never needed to be more than 32 bits, so ints
would have sufficed for them, since we always assumed that ints have
32 bits. ui_sbsize just might want to grow larger than 2**31 on a
64-bit machine, but limiting it to 2**31 is reasonable (the ulimit for
it defaults to unlimited (2**63-1), but serious memory shortages would
develop if you allowed anyone to actually use as much as 2**31 for
just socket buffers). The similar but should-be-less-restrictive limit
on pipekva only grows from 16MB on i386 to 128MB on amd64.
Bruce