About the memory barrier in BSD libc
John Baldwin
jhb at freebsd.org
Tue Apr 24 15:10:43 UTC 2012
On Tuesday, April 24, 2012 10:03:48 am Konstantin Belousov wrote:
> On Tue, Apr 24, 2012 at 02:43:40PM +0100, Martin Simmons wrote:
> > >>>>> On Mon, 23 Apr 2012 16:03:43 +0300, Konstantin Belousov said:
> > >
> > > On Mon, Apr 23, 2012 at 08:33:05PM +0800, Fengwei yin wrote:
> > > > On Mon, Apr 23, 2012 at 8:07 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> > > > > On Mon, Apr 23, 2012 at 07:44:34PM +0800, Fengwei yin wrote:
> > > > >> On Mon, Apr 23, 2012 at 7:38 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > > > >> > On Mon, Apr 23, 2012 at 07:26:54PM +0800, Fengwei yin wrote:
> > > > >> >
> > > > >> >> On Mon, Apr 23, 2012 at 5:40 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > > > >> >> > On Mon, Apr 23, 2012 at 05:32:24PM +0800, Fengwei yin wrote:
> > > > >> >> >
> > > > >> >> >> On Mon, Apr 23, 2012 at 4:41 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > > > >> >> >> > On Mon, Apr 23, 2012 at 02:56:03PM +0800, Fengwei yin wrote:
> > > > >> >> >> >
> > > > >> >> >> >> Hi list,
> > > > >> >> >> >> If this is not a correct question for the list, please let me
> > > > >> >> >> >> know, and sorry for the noise.
> > > > >> >> >> >>
> > > > >> >> >> >> I have a question regarding the BSD libc on SMP architectures.
> > > > >> >> >> >> I didn't see any memory barriers used in libc.
> > > > >> >> >> >> How can we make sure it's safe on SMP?
> > > > >> >> >> >
> > > > >> >> >> > /usr/include/machine/atomic.h:
> > > > >> >> >> >
> > > > >> >> >> > #define mb()    __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > > > >> >> >> > #define wmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > > > >> >> >> > #define rmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > > > >> >> >> >
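> > > > >> >> >> > A hypothetical sketch of how the pair would be used (made-up
> > > > >> >> >> > names, assuming the macros are usable from userland):
> > > > >> >> >> >
> > > > >> >> >> > #include <machine/atomic.h>
> > > > >> >> >> >
> > > > >> >> >> > static volatile int data, ready;
> > > > >> >> >> >
> > > > >> >> >> > void
> > > > >> >> >> > producer(void)
> > > > >> >> >> > {
> > > > >> >> >> >         data = 42;
> > > > >> >> >> >         wmb();          /* data store visible before flag store */
> > > > >> >> >> >         ready = 1;
> > > > >> >> >> > }
> > > > >> >> >> >
> > > > >> >> >> > int
> > > > >> >> >> > consumer(void)
> > > > >> >> >> > {
> > > > >> >> >> >         while (ready == 0)
> > > > >> >> >> >                 ;       /* spin until producer sets the flag */
> > > > >> >> >> >         rmb();          /* flag load ordered before data load */
> > > > >> >> >> >         return (data);
> > > > >> >> >> > }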
> > > > >> >> >>
> > > > >> >> >> Thanks for the information. But it looks like nobody uses it in
> > > > >> >> >> libc.
> > > > >> >> >
> > > > >> >> > I think nothing in libc needs a memory barrier: libc doesn't work
> > > > >> >> > with peripherals, and different macros are used for atomic
> > > > >> >> > operations.
> > > > >> >>
> > > > >> >> If we check the usage of __sinit(), it is a typical singleton
> > > > >> >> pattern, which needs a memory barrier to avoid potential SMP
> > > > >> >> issues.
> > > > >> >>
> > > > >> >> Or did I miss something here?
> > > > >> >
> > > > >> > Which architecture with cache incoherency does FreeBSD support?
> > > > >>
> > > > >> I suppose it's not related to cache incoherency (I could be wrong).
> > > > >> It's related to the reordering of instructions by the CPU.
> > > > >>
> > > > >> Here is a link explaining why a memory barrier is needed for a
> > > > >> singleton:
> > > > >> http://www.oaklib.org/docs/oak/singleton.html
> > > > >>
> > > > >> x86 has a strict memory model and may not suffer from this kind of
> > > > >> issue, but ARM needs to take care of it IMHO.
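> > > > >>
> > > > >> To illustrate (a hypothetical sketch, not code from libc; lock(),
> > > > >> unlock() and make_state() stand in for whatever locking and
> > > > >> initialization is used), the double-checked singleton needs the
> > > > >> barriers like this:
> > > > >>
> > > > >> struct state *make_state(void);
> > > > >> void lock(void), unlock(void);
> > > > >>
> > > > >> static struct state *instance;
> > > > >>
> > > > >> struct state *
> > > > >> get_instance(void)
> > > > >> {
> > > > >>         struct state *p = instance;
> > > > >>
> > > > >>         if (p == NULL) {
> > > > >>                 lock();
> > > > >>                 p = instance;
> > > > >>                 if (p == NULL) {
> > > > >>                         p = make_state();       /* fill in all fields */
> > > > >>                         wmb();  /* publish the fields before the pointer */
> > > > >>                         instance = p;
> > > > >>                 }
> > > > >>                 unlock();
> > > > >>         } else
> > > > >>                 rmb();  /* order the pointer load before field loads */
> > > > >>         return (p);
> > > > >> }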
> > > > >
> > > > > Please note that __sinit is idempotent, so double-initialization is not
> > > > > an issue there. The only possible problematic case would be another
> > > > > thread executing exit() and not noticing the non-NULL value of __cleanup
> > > > > that the current thread just set.
> > > > >
> > > > > I am not sure how real this race is. Each call to __sinit() is
> > > > > immediately followed by a lock acquire, typically FLOCKFILE(), which
> > > > > enforces full barrier semantics due to the pthread_mutex_lock() call.
> > > > > exit() performs the __cxa_finalize() call before checking the __cleanup
> > > > > value, and __cxa_finalize() itself locks atexit_mutex. So the race
> > > > > window is tiny and probably reachable only by somewhat buggy
> > > > > applications that call exit() while stdio operations are in progress.
> > > > >
> > > > > Also note that some functions assign to __cleanup unconditionally.
> > > > >
> > > > > Do you see any real issue due to non-synchronized access to __cleanup?
> > > >
> > > > No, I didn't see a real issue. I am just reviewing the code.
> > > >
> > > > If you don't think __sinit has an issue, let's check some other code:
> > > > line 68 in libc/stdio/fclose.c and
> > > > line 133 in libc/stdio/findfp.c (function __sfp())
> > > >
> > > > That code tries to free an fp slot by assigning 0 to fp->_flags. But if
> > > > the instructions can be reordered, another CPU could see fp->_flags
> > > > assigned 0 before the cleanup from line 57 to 67.
> > > >
> > > > Let's say another CPU is at line 133 of __sfp(): it could see fp->_flags
> > > > become 0 before it's aware that the cleanup (line 57 to line 67 in
> > > > libc/stdio/fclose.c) has happened.
> > > >
> > > > Note: the mutex of FUNLOCKFILE(fp) at line 69 of libc/stdio/fclose.c can
> > > > only make sure line 70 happens after line 68. It can't prevent the CPU
> > > > from reordering lines 57 ~ 68.
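> > > >
> > > > A minimal sketch of the concern (hypothetical, with simplified fields):
> > > >
> > > >         /* CPU A, in fclose(), program order: */
> > > >         fp->_file = -1;
> > > >         fp->_r = fp->_w = 0;
> > > >         fp->_flags = 0;         /* this store may become visible first */
> > > >
> > > >         /* CPU B, in __sfp(), scanning for a free slot: */
> > > >         if (fp->_flags == 0) {
> > > >                 fp->_flags = 1; /* claims the slot... */
> > > >                 /* ...while possibly still seeing stale _file/_r/_w */
> > > >         }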
> > >
> > > Yes, FUNLOCKFILE() there would have no effect on the potential CPU
reordering
> > > of the writes. But does the order of these writes matter at all ?
> > >
> > > Please note that __sfp() reinitializes all fields written by fclose().
> > > Only if the CPU executing fclose() is allowed to reorder operations so
> > > that the external effect of the _flags = 0 assignment can be observed
> > > before that CPU executes the other operations from fclose() could there
> > > be a problem.
> > >
> > > This is definitely impossible on Intel, and I do not know enough about
> > > other architectures to rule out the possibility. The _flags member is a
> > > short, so atomics cannot be used there. The easier solution, if this is
> > > indeed an issue, is to take thread_lock around the _flags = 0 assignment
> > > in fclose().
> >
> > This can be a problem, even on Intel, because the compiler can reorder the
> > stores. E.g. if I compile the following with gcc -O4 on amd64:
> >
> > struct foo { int x, y; };
> >
> > int bar(void);
> > int baz(void);
> >
> > void foo(struct foo *p)
> > {
> >         int x = bar();
> >         p->y = baz();   /* the compiler is free to reorder these stores */
> >         p->x = x;
> > }
> >
> > then I get the following assembly language, which sets p->x before p->y:
> >
> > movq %rdi, %rbx
> > call bar
> > movl %eax, %ebp
> > xorl %eax, %eax
> > call baz
> > movl %ebp, (%rbx)
> > movl %eax, 4(%rbx)
> >
> > __Martin
> Ok, as I already said, I think that the reordering is safe there.
>
> Anyway, the change below should remove all concerns.
>
> diff --git a/lib/libc/stdio/fclose.c b/lib/libc/stdio/fclose.c
> index f0629e8..383040e 100644
> --- a/lib/libc/stdio/fclose.c
> +++ b/lib/libc/stdio/fclose.c
> @@ -41,9 +41,12 @@ __FBSDID("$FreeBSD$");
> #include <stdio.h>
> #include <stdlib.h>
> #include "un-namespace.h"
> +#include <spinlock.h>
> #include "libc_private.h"
> #include "local.h"
>
> +extern spinlock_t __stdio_thread_lock;
> +
> int
> fclose(FILE *fp)
> {
> @@ -65,7 +68,11 @@ fclose(FILE *fp)
> FREELB(fp);
> fp->_file = -1;
> fp->_r = fp->_w = 0; /* Mess up if reaccessed. */
> + if (__isthreaded)
> + _SPINLOCK(&__stdio_thread_lock);
> fp->_flags = 0; /* Release this FILE for reuse. */
> + if (__isthreaded)
> + _SPINUNLOCK(&__stdio_thread_lock);
> FUNLOCKFILE(fp);
> return (r);
> }
> diff --git a/lib/libc/stdio/findfp.c b/lib/libc/stdio/findfp.c
> index 89c0536..bcd6f62 100644
> --- a/lib/libc/stdio/findfp.c
> +++ b/lib/libc/stdio/findfp.c
> @@ -82,9 +82,9 @@ static struct glue *lastglue = &uglue;
>
> static struct glue * moreglue(int);
>
> -static spinlock_t thread_lock = _SPINLOCK_INITIALIZER;
> -#define THREAD_LOCK() if (__isthreaded) _SPINLOCK(&thread_lock)
> -#define THREAD_UNLOCK() if (__isthreaded) _SPINUNLOCK(&thread_lock)
> +spinlock_t __stdio_thread_lock = _SPINLOCK_INITIALIZER;
> +#define THREAD_LOCK() if (__isthreaded) _SPINLOCK(&__stdio_thread_lock)
> +#define THREAD_UNLOCK() if (__isthreaded) _SPINUNLOCK(&__stdio_thread_lock)
>
> #if NOT_YET
> #define SET_GLUE_PTR(ptr, val) atomic_set_rel_ptr(&(ptr), (uintptr_t)(val))
Could you move the extern and THREAD_LOCK/UNLOCK macros into the stdio private
header file?
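
Something like this minimal sketch, perhaps (assuming lib/libc/stdio/local.h
is the right private header; untested):

        /* lib/libc/stdio/local.h */
        #include <spinlock.h>

        extern spinlock_t __stdio_thread_lock;

        #define THREAD_LOCK()   if (__isthreaded) _SPINLOCK(&__stdio_thread_lock)
        #define THREAD_UNLOCK() if (__isthreaded) _SPINUNLOCK(&__stdio_thread_lock)

Then fclose.c could use THREAD_LOCK()/THREAD_UNLOCK() directly instead of
open-coding the _SPINLOCK() calls.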
--
John Baldwin