svn commit: r242014 - head/sys/kern
Alexander Motin
mav@FreeBSD.org
Wed Oct 24 19:30:45 UTC 2012
On 24.10.2012 22:16, Andre Oppermann wrote:
> On 24.10.2012 20:56, Jim Harris wrote:
>> On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd <adrian@freebsd.org>
>> wrote:
>>> On 24 October 2012 11:36, Jim Harris <jimharris@freebsd.org> wrote:
>>>
>>>> Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.
>>>
>>> Ok, but..
>>>
>>>
>>>>         struct mtx      tdq_lock;               /* run queue lock. */
>>>> +       char            pad[64 - sizeof(struct mtx)];
>>>
>>> .. don't we have an existing compile time macro for the cache line
>>> size, which can be used here?
>>
>> Yes, but I didn't use it for a couple of reasons:
>>
>> 1) struct tdq itself is currently using __aligned(64), so I wanted to
>> keep it consistent.
>> 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
>> NetBurst-based processors having had 128-byte cache sectors a while
>> back. I had planned to start a separate thread on arch@ today about
>> whether this is still appropriate.
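>>
>> For reference, the macro-based variant would look something like
>> this (a sketch only; "tdq_pad" is an illustrative name):
>>
>>         struct mtx      tdq_lock;       /* run queue lock. */
>>         char            tdq_pad[CACHE_LINE_SIZE - sizeof(struct mtx)];
>>
>> With CACHE_LINE_SIZE at 128, the lock plus its pad would span two
>> 64-byte lines instead of one, which is part of why I used 64 above.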
>
> See also the discussion on svn-src-all regarding global struct mtx
> alignment.
>
> Thank you for proving my point. ;)
>
> Let's go back and see how we can do this the sanest way. These are
> the options I see at the moment:
>
> 1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place;
> 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
>    the future possibly change to a different compiler-dependent
>    align attribute;
> 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
>    automatically gets aligned in all cases, even when dynamically
>    allocated (see the sketch below).
>
> Personally I'm undecided between #2 and #3. #1 is ugly. In favor
> of #3 is that there may not be any case where you'd actually want
> the mutex to share a cache line with anything else, even the data
> structure it protects.
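>
> A minimal sketch of #3 (struct mtx fields as in sys/sys/mutex.h;
> only the attribute is new):
>
>         struct mtx {
>                 struct lock_object      lock_object;    /* common lock properties */
>                 volatile uintptr_t      mtx_lock;       /* owner and flags */
>         } __aligned(CACHE_LINE_SIZE);
>
> The attribute also rounds sizeof(struct mtx) up to a whole cache
> line, so no neighboring struct member can land in the lock's line;
> dynamically allocated mutexes still depend on the allocator
> honoring that alignment.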
I'm sorry, could you point me to some theory here? I think I can
agree that cache line sharing can be a problem in the case of spin
locks -- the waiting thread will constantly try to access a line
being modified by another CPU, which I guess will cause cache line
write-backs to RAM. But why is it so bad to share a lock with the
respective data in the case of non-spin locks? Won't the benefit of
getting the right data prefetched for free while grabbing the lock
compensate for the penalties from relatively rare collisions?
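
For concreteness, the layout I'm asking about (a sketch; the field
subset is abridged and the comments are mine):

        struct tdq {
                struct mtx      tdq_lock;       /* dirtied on every lock/unlock */
                char            pad[64 - sizeof(struct mtx)];
                volatile int    tdq_load;       /* read by remote CPUs */
                volatile int    tdq_cpu_idle;   /* read by remote CPUs */
                /* ... more fields ... */
        } __aligned(64);

Without the pad, every acquire/release of tdq_lock would invalidate
the line holding tdq_load and tdq_cpu_idle in remote caches, even
when the lock itself is uncontended.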
--
Alexander Motin