TLS bug?
Marius Strobl
marius at alchemy.franken.de
Fri Jun 17 18:07:18 UTC 2011
On Thu, Jun 16, 2011 at 03:53:19AM -0400, Nathaniel W Filardo wrote:
> Atcht; it's late. I forgot to mention that this system is a sparc64 V240
> 2-way SMP machine. It's running a kernel from 9.0-CURRENT r222833+262af52:
> Tue Jun 7 18:47:35 EDT 2011 and a userland from a little later.
>
> Sorry about that.
> --nwf;
>
> On Thu, Jun 16, 2011 at 03:31:38AM -0400, Nathaniel W Filardo wrote:
> > I have a few applications (bonnie++ and mysql, specifically, both from
> > ports) which trip over the assertion in
> > lib/libc/stdlib/malloc.c:/^_malloc_thread_cleanup that
> > > assert(tcache != (void *)(uintptr_t)1);
> >
> > I have patched malloc.c thus:
> >
> > > --- a/lib/libc/stdlib/malloc.c
> > > +++ b/lib/libc/stdlib/malloc.c
> > > @@ -1108,7 +1108,7 @@ static __thread arena_t *arenas_map TLS_MODEL;
> > >
> > > #ifdef MALLOC_TCACHE
> > > /* Map of thread-specific caches. */
> > > -static __thread tcache_t *tcache_tls TLS_MODEL;
> > > +__thread tcache_t *tcache_tls TLS_MODEL;
> > >
> > > /*
> > > * Number of cache slots for each bin in the thread cache, or 0 if tcache
> > > * is
> > > @@ -6184,10 +6184,17 @@ _malloc_thread_cleanup(void)
> > > #ifdef MALLOC_TCACHE
> > > tcache_t *tcache = tcache_tls;
> > >
> > > + fprintf(stderr, "_m_t_c for %d:%lu with %p\n",
> > > + getpid(),
> > > + (unsigned long) _pthread_self(),
> > > + tcache);
> > > +
> > > if (tcache != NULL) {
> > > - assert(tcache != (void *)(uintptr_t)1);
> > > - tcache_destroy(tcache);
> > > - tcache_tls = (void *)(uintptr_t)1;
> > > + /* assert(tcache != (void *)(uintptr_t)1); */
> > > + if((uintptr_t)tcache != (uintptr_t)1) {
> > > + tcache_destroy(tcache);
> > > + tcache_tls = (void *)(uintptr_t)1;
> > > + }
> >
> > and libthr/thread/thr_create.c thus:
> >
> > > --- a/lib/libthr/thread/thr_create.c
> > > +++ b/lib/libthr/thread/thr_create.c
> > > @@ -243,6 +243,8 @@ create_stack(struct pthread_attr *pattr)
> > > return (ret);
> > > }
> > >
> > > +extern __thread void *tcache_tls;
> > > +
> > > static void
> > > thread_start(struct pthread *curthread)
> > > {
> > > @@ -280,6 +282,11 @@ thread_start(struct pthread *curthread)
> > > curthread->attr.stacksize_attr;
> > > #endif
> > >
> > > + fprintf(stderr, "t_s for %d:%lu with %p\n",
> > > + getpid(),
> > > + (unsigned long) _pthread_self(),
> > > + tcache_tls);
> > > +
> > > /* Run the current thread's start routine with argument: */
> > > _pthread_exit(curthread->start_routine(curthread->arg));
> > >
> >
> > to attempt to debug this issue. With those changes in place, bonnie++'s
> > execution looks like this:
> >
> > >[...]
> > > Writing a byte at a time...done
> > > Writing intelligently...done
> > > Rewriting...done
> > > Reading a byte at a time...done
> > > Reading intelligently...done
> > > t_s for 79654:1086343168 with 0x0
> > > t_s for 79654:1086345216 with 0x0
> > > t_s for 79654:1086346240 with 0x0
> > > t_s for 79654:1086347264 with 0x0
> > > t_s for 79654:1086344192 with 0x0
> > > start 'em...done...done...done...done..._m_t_c for 79654:1086344192 with
> > > 0x41404400
> > > _m_t_c for 79654:1086346240 with 0x40d2c400
> > > _m_t_c for 79654:1086343168 with 0x41404200
> > > _m_t_c for 79654:1086345216 with 0x41804200
> > > done...
> > > _m_t_c for 79654:1086347264 with 0x41004200
> > > Create files in sequential order...done.
> > > Stat files in sequential order...done.
> > > Delete files in sequential order...done.
> > > Create files in random order...done.
> > > Stat files in random order...done.
> > > Delete files in random order...done.
> > > 1.96,1.96,hydra.priv.oc.ietfng.org,1,1308217772,10M,,7,81,2644,7,3577,14,34,93,+++++,+++,773.7,61,16,,,
> > > ,,2325,74,13016,99,2342,86,3019,91,11888,99,2184,89,16397ms,1237ms,671ms,2009ms,177us,1305ms,489ms,1029
> > > us,270ms,140ms,53730us,250ms
> > > Writing a byte at a time...done
> > > Writing intelligently...done
> > > Rewriting...done
> > > Reading a byte at a time...done
> > > Reading intelligently...done
> > > t_s for 79654:1086343168 with 0x1
> > > t_s for 79654:1086346240 with 0x1
> > > t_s for 79654:1086345216 with 0x1
> > > t_s for 79654:1086347264 with 0x1
> > > t_s for 79654:1086344192 with 0x1
> > > start 'em...done...done...done...done...done...
> > > _m_t_c for 79654:1086347264 with 0x1
> > > _m_t_c for 79654:1086344192 with 0x1
> > > _m_t_c for 79654:1086343168 with 0x1
> > >[...]
> >
> > So what seems to be happening is that the TLS area is being set up
> > incorrectly, eventually: rather than zeroing the tcache_tls value, it is
> > being set to 1, which means no tcache is ever allocated, so when we get
> > around to exiting, the assert trips.
> >
> > Unfortunately, setting a breakpoint on __libc_allocate_tls seems to do bad
> > things to the kernel (inducing a SIR without any panic message). I am
> > somewhat at a loss; help?
> >
Using bonnie++ I can't reproduce this (didn't try mysql) but I have
some TLS fixes for libthr I forgot about but could be relevant here
(most actually date back to 2008 when the base binutils didn't support
GNUTLS for sparc64 so I couldn't test them easily). Could you please
give a libthr build with the following patch a try?
http://people.freebsd.org/~marius/libthr_sparc64.diff
Marius
More information about the freebsd-current
mailing list