svn commit: r365326 - stable/12/sys/sys
Jessica Clarke
jrtc27 at freebsd.org
Fri Sep 4 13:36:34 UTC 2020
On 4 Sep 2020, at 14:09, Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Fri, Sep 04, 2020 at 11:22:18AM +0000, Marcin Wojtas wrote:
>> Author: mw
>> Date: Fri Sep 4 11:22:18 2020
>> New Revision: 365326
>> URL: https://svnweb.freebsd.org/changeset/base/365326
>>
>> Log:
>> MFC: r346593
>>
>> Add barrier in buf ring peek function to prevent race in ARM and ARM64.
>>
>> Obtained from: Semihalf
>> Sponsored by: Amazon, Inc.
>>
>> Modified:
>> stable/12/sys/sys/buf_ring.h
>> Directory Properties:
>> stable/12/ (props changed)
>>
>> Modified: stable/12/sys/sys/buf_ring.h
>> ==============================================================================
>> --- stable/12/sys/sys/buf_ring.h Fri Sep 4 04:31:56 2020 (r365325)
>> +++ stable/12/sys/sys/buf_ring.h Fri Sep 4 11:22:18 2020 (r365326)
>> @@ -310,14 +310,23 @@ buf_ring_peek_clear_sc(struct buf_ring *br)
>> if (!mtx_owned(br->br_lock))
>> panic("lock not held on single consumer dequeue");
>> #endif
>> - /*
>> - * I believe it is safe to not have a memory barrier
>> - * here because we control cons and tail is worst case
>> - * a lagging indicator so we worst case we might
>> - * return NULL immediately after a buffer has been enqueued
>> - */
>> +
>> if (br->br_cons_head == br->br_prod_tail)
>> return (NULL);
>> +
>> +#if defined(__arm__) || defined(__aarch64__)
>> + /*
>> + * The barrier is required there on ARM and ARM64 to ensure, that
>> + * br->br_ring[br->br_cons_head] will not be fetched before the above
>> + * condition is checked.
>> + * Without the barrier, it is possible, that buffer will be fetched
>> + * before the enqueue will put mbuf into br, then, in the meantime, the
>> + * enqueue will update the array and the br_prod_tail, and the
>> + * conditional check will be true, so we will return previously fetched
>> + * (and invalid) buffer.
>> + */
>> + atomic_thread_fence_acq();
>> +#endif
>
> Putting the semantics of the change aside, why did you add the fence (it is
> a fence, not a barrier as stated in the comment) only to arm*?  If it is
> needed, it is needed for all arches.
Agreed. The code looks fine, though I would have made it an acquire
load of br_prod_tail myself, both to take advantage of load-acquire
instructions where the architecture has them and to better document
what the exact issue is.
I also don't think the comment needs to be quite so extensive
(especially since atomic_load_acq_32 is somewhat self-documenting in
terms of one half of the race); if we had a comment like this for every
fence in the kernel we'd never get anything done.
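Roughly something like this in buf_ring_peek_clear_sc (untested sketch;
assumes a prod_tail local, everything else in the function unchanged):

	uint32_t prod_tail;

	/*
	 * The acquire pairs with the atomic_store_rel_32() of
	 * br_prod_tail in buf_ring_enqueue(): once we observe the new
	 * tail, the producer's store into br_ring[] is visible too, so
	 * the later read of br_ring[br->br_cons_head] cannot return a
	 * stale pointer.  No #ifdef needed; correct on every arch.
	 */
	prod_tail = atomic_load_acq_32(&br->br_prod_tail);
	if (br->br_cons_head == prod_tail)
		return (NULL);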
There's also an ARM-specific fence in buf_ring_dequeue_sc:
> /*
> * This is a workaround to allow using buf_ring on ARM and ARM64.
> * ARM64TODO: Fix buf_ring in a generic way.
> * REMARKS: It is suspected that br_cons_head does not require
> * load_acq operation, but this change was extensively tested
> * and confirmed it's working. To be reviewed once again in
> * FreeBSD-12.
> *
> * Preventing following situation:
>
> * Core(0) - buf_ring_enqueue() Core(1) - buf_ring_dequeue_sc()
> * ----------------------------------------- ----------------------------------------------
> *
> * cons_head = br->br_cons_head;
> * atomic_cmpset_acq_32(&br->br_prod_head, ...));
> * buf = br->br_ring[cons_head]; <see <1>>
> * br->br_ring[prod_head] = buf;
> * atomic_store_rel_32(&br->br_prod_tail, ...);
> * prod_tail = br->br_prod_tail;
> * if (cons_head == prod_tail)
> * return (NULL);
> * <condition is false and code uses invalid(old) buf>`
> *
> * <1> Load (on core 1) from br->br_ring[cons_head] can be reordered (speculative readed) by CPU.
> */
> #if defined(__arm__) || defined(__aarch64__)
> cons_head = atomic_load_acq_32(&br->br_cons_head);
> #else
> cons_head = br->br_cons_head;
> #endif
> prod_tail = atomic_load_acq_32(&br->br_prod_tail);
The comment is completely right to suspect that the ARM-specific
load-acquire of br_cons_head is a waste of time: this is the
single-consumer path, so br_cons_head is only ever written by the
current thread, and acquiring it just synchronises with ourselves. The
load that matters is the acquire of br_prod_tail, which already stops
the reordering described, as has (sort of) been rediscovered in the
peek case being fixed here.
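In other words, the start of buf_ring_dequeue_sc could just be (sketch,
same locals as the existing code):

	/*
	 * Single consumer: br_cons_head is only ever written by this
	 * thread, so a plain load is fine.  The acquire on br_prod_tail
	 * pairs with the producer's atomic_store_rel_32() and is what
	 * keeps the br_ring[] read from being hoisted above the check.
	 */
	cons_head = br->br_cons_head;
	prod_tail = atomic_load_acq_32(&br->br_prod_tail);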
Jess