Race condition in mb_free_ext()?
Bosko Milekic
bosko.milekic at gmail.com
Tue Mar 1 23:10:59 GMT 2005
P.S.: Not sure if this is related:
http://www.mail-archive.com/bk-commits-24@vger.kernel.org/msg00136.html
On Tue, 1 Mar 2005 18:04:27 -0500, Bosko Milekic
<bosko.milekic at gmail.com> wrote:
> On Tue, 1 Mar 2005 13:40:18 -0500, John Baldwin <jhb at freebsd.org> wrote:
> > On Monday 28 February 2005 07:04 pm, Kris Kennaway wrote:
> > > I'm seeing an easily-provoked livelock on quad-CPU sparc64 machines
> > > running RELENG_5. It's hard to get a good trace because the processes
> > > running on other CPUs cannot be traced from DDB, but I've been lucky a
> > > few times:
> > >
> > > db> show alllocks
> > > Process 15 (swi1: net) thread 0xfffff8001fb07480 (100008)
> > > exclusive sleep mutex so_snd r = 0 (0xfffff800178432a8) locked @
> > > netinet/tcp_input.c:2189 exclusive sleep mutex inp (tcpinp) r = 0
> > > (0xfffff800155c3b08) locked @ netinet/tcp_input.c:744 exclusive sleep mutex
> > > tcp r = 0 (0xc0bdf788) locked @ netinet/tcp_input.c:617 db> wh 15
> > > Tracing pid 15 tid 100008 td 0xfffff8001fb07480
> > > sab_intr() at sab_intr+0x40
> > > psycho_intr_stub() at psycho_intr_stub+0x8
> > > intr_fast() at intr_fast+0x88
> > > -- interrupt level=0xd pil=0 %o7=0xc01a0040 --
> > > mb_free_ext() at mb_free_ext+0x28
> > > sbdrop_locked() at sbdrop_locked+0x19c
> > > tcp_input() at tcp_input+0x2aa0
> > > ip_input() at ip_input+0x964
> > > netisr_processqueue() at netisr_processqueue+0x7c
> > > swi_net() at swi_net+0x120
> > > ithread_loop() at ithread_loop+0x24c
> > > fork_exit() at fork_exit+0xd4
> > > fork_trampoline() at fork_trampoline+0x8
> > > db>
> > >
> > > That code is here in mb_free_ext():
> > >
> > > /*
> > > * This is tricky. We need to make sure to decrement the
> > > * refcount in a safe way but to also clean up if we're the
> > > * last reference. This method seems to do it without race.
> > > */
> > > while (dofree == 0) {
> > > cnt = *(m->m_ext.ref_cnt);
> > > if (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1)) {
> > > if (cnt == 1)
> > > dofree = 1;
> > > break;
> > > }
> > > }
> >
> > Well, this is obtuse at least. A simpler version would be:
> >
> > do {
> > cnt = *m->m_ext.ref_cnt;
> > } while (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1) == 0);
> > dofree = (cnt == 1);
> >
> > --
> > John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
> > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
>
> Your suggestion will always enter the loop and do the atomic
> regardless of what dofree is set to above that code (not shown in
> Kris' paste):
>
> [...]
> /* Account for lazy ref count assign. */
> if (m->m_ext.ref_cnt == NULL)
> dofree = 1;
> else
> dofree = 0;
>
> /*
> * This is tricky. We need to make sure to decrement the
> * refcount in a safe way but to also clean up if we're the
> * last reference. This method seems to do it without race.
> */
> [...]
>
> The segment could still be reworked, but anyway:
>
> This does not appear to explain the livelock. What's m->m_ext.ref_cnt
> point to? And what is the value at the location pointed to by
> m->m_ext.ref_cnt? Regardless, though, the livelock itself, assuming it
> is due to a long time being spent spinning in the above loop, should
> not be caused by underruns or overruns of the reference count (those
> may only cause leaking of the cluster).
>
> Furthermore, the above code has been around in that form for some time
> now and in fact the loop was probably entered *more* often in the past
> (before the 'dofree' variable was introduced there). Since when are
> you able to cause the livelock to happen, and are you sure it is the
> mb_free_ext() that is looping indefinitely?
>
> I do not know sparc64 well, but what are the semantics of
> atomic_cmpset_int()? I see that it is defined to use the 'casa'
> instruction; does atomic_cmpset_int() behave the same way as it does
> on i386?
>
> -Bosko
>
> --
> Bosko Milekic - If I were a number, I'd be irrational.
> Contact Info: http://bmilekic.unixdaemons.com/contact.txt
>
--
Bosko Milekic - If I were a number, I'd be irrational.
Contact Info: http://bmilekic.unixdaemons.com/contact.txt
More information about the freebsd-net
mailing list