a question about BUS_DMA_MIN_ALLOC_COMP flag meaning

Svatopluk Kraus onwahe at gmail.com
Tue Nov 24 15:59:11 UTC 2015


On Mon, Nov 23, 2015 at 2:18 PM, Svatopluk Kraus <onwahe at gmail.com> wrote:
> On Fri, Nov 20, 2015 at 3:45 PM, Konstantin Belousov
> <kostikbel at gmail.com> wrote:
>> On Wed, Nov 18, 2015 at 05:00:49PM +0100, Svatopluk Kraus wrote:
>>> Hi,
>>>
>>> I have fallen to some problem with inconsistent use of
>>> BUS_DMA_MIN_ALLOC_COMP flag. This flag was introduced in x86 MD code
>>> very very long ago and so, the problem covers all archs which came out
>>> from it.
>>>
>>> However, it's only about bus_dma_tag_t with BUS_DMA_COULD_BOUNCE flag set.
>>>
>>> (1) When bus_dma_tag_t is being created with BUS_DMA_ALLOCNOW flag
>>> specified, some bounce pages could be allocated in advance and
>>> BUS_DMA_MIN_ALLOC_COMP flag is set to the tag. The bounce pages are
>>> allocated only if the tag's maxsize property is higher than size of
>>> all bounce pages already allocated in a bounce zone.
>>>
>>> (2) When bus_dmamap_t is being created, then if BUS_DMA_MIN_ALLOC_COMP
>>> is not set on associated tag, some bounce pages are ALWAYS allocated
>>> and BUS_DMA_MIN_ALLOC_COMP is set afterwards,
>>>
>>> (3) else some bounce pages could be allocated if there is not enough
>>> pages in a bounce zone and BUS_DMA_MIN_ALLOC_COMP is set afterwards.
>>>
>>> The problem is the following. Due to case (2), the number of pages in
>>> bounce zone can grow infinitely, as bounce pages once allocated are
>>> never freed. It can happen when a big number of bus_dma_tag_t together
>>> with bus_dmamap_t are created, or they are created dynamically either
>>> because of a loadable module or by design.
>>>
>>> The inconsistency is that when bus_dma_tag_t is being created, there
>>> is no limit for how much pages could be allocated. On the other hand,
>>> when bus_dmamap_t is being created, there is MAX_BPAGES limitation.
>>>
>>> I think that fix for case (2) presented as x86 fix is the following:
>>>
>>> diff --git a/sys/x86/x86/busdma_bounce.c b/sys/x86/x86/busdma_bounce.c
>>> index 4826a2b..a15139f 100644
>>> --- a/sys/x86/x86/busdma_bounce.c
>>> +++ b/sys/x86/x86/busdma_bounce.c
>>> @@ -308,7 +308,7 @@ bounce_bus_dmamap_create(bus_dma_tag_t dmat, int
>>> flags, bus_dmamap_t *mapp)
>>>          else
>>>              maxpages = MIN(MAX_BPAGES, Maxmem -
>>>                  atop(dmat->common.lowaddr));
>>> -        if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 ||
>>> +        if ((dmat->bounce_flags & BUS_DMA_MIN_ALLOC_COMP) == 0 &&
>>>              (bz->map_count > 0 && bz->total_bpages < maxpages)) {
>>>              pages = MAX(atop(dmat->common.maxsize), 1);
>>>              pages = MIN(maxpages - bz->total_bpages, pages);
>>>
>>>
>>> IMO, it also fixes logic by making it same as in bus_dma_tag_t case.
>> I think that this patch is correct.
>
> So, with r291142 and r291193 intermezzo, the question is: what is right?
>
> In fact, there were two possibilities:
> (1) to keep BUS_DMA_MIN_ALLOC_COMP flag and make it consistent, so
> bounce pages are allocated only once for a tag, or
> (2) to remove it, so bounce pages are allocated for every created map
> which needs them, up to some sane limit.
>
> It turned out that (1) is not good for some driver. I'm not sure why,
> but only thing I can imagine now is that a tag was created with
> BUS_DMA_ALLOCNOW flag and then, a consumer breaks tag's maxsize
> property. Or, there is another inconsitency in the if statement:
> bz->map_count > 0. Bounce zone map count is incremented later, so when
> bounce zone is without map, the test fails. Not mention that this map
> count is not atomic.
>
> Anyhow, what is right, (1) or (2) ?
>


Well, Michal (mmel) came up with an idea which seems to me most likely
now, as it explains why  bz->total_bpages are tested against
MAX_BPAGES when map is being created, but tag's maxsize is used for
evaluation of how many bounce pages should be allocated.

The idea is the following (considering that BUS_DMA_COULD_BOUNCE is
set on a tag):

If a tag has only one map, bounce pages should be allocated only once.
Either when the tag is being created and BUS_DMA_ALLOCNOW flags is set
or when its map is being created. The tag's maxsize property is used
for evaluation of how many bounce pages are needed.

If the tag has a second (or next) map(s), bz->map_count > 0 check is
valid and more bounce pages could be allocated if bz->total_bpages has
not met yet some sane limit.

Of course, the bz->map_count > 0 check limps a bit as a bounce zone
could be shared by more tags (and so their maps).

Thus, real problem is that
(1) tag's maxsize property is not limited by MAX_BPAGES,
(2) each new tag add some bounce pages to bounce zone without any limit.

So, I suggest
(1) to limit tag's maxsize property,
(2) to limit page count used by all bounce zones.

It seems to me that these two limitations should be right. The limits
should have some defaults with possibility to be changed by config or
in boot time. Any objections?


>>
>>>
>>> The next question is, if case (1) should be limited by MAX_BPAGES as
>>> in case (3) or maybe better if there should be some internal
>>> limitation for bounce zone itself.
>> Could we apply e.g. MAX_BPAGES limit to bounce zones, or should we allow
>> the limit to change based on the tag ?  But I am not sure if there is
>> any reasonable way to formulate the limit.
>
> Neither me.
>
>>
>> MAX_BPAGES looks like some arbitrary sanity limit, e.g. we could have
>> unlimited maxsize, but also have an alignment constraints, and then
>> tag requires bouncing. I am not sure that hard-coded values, esp. the
>> amd64 32MB limit, makes much sense, or that basing the limit on the
>> tag constraints makes more sense. Might be, we should allow some total
>> percentage of the physical memory on machine to be consumed by all
>> bounce zones altogether ?
>
> IMO, some global limit should be there in any case. A tunable sounds
> good, with some warning if a box owner wants too much.


More information about the freebsd-arch mailing list