Partial cacheline flush problems on ARM and MIPS
Warner Losh
imp at bsdimp.com
Mon Aug 27 19:18:55 UTC 2012
On Aug 27, 2012, at 1:00 PM, Ian Lepore wrote:
> On Mon, 2012-08-27 at 09:53 -0700, Adrian Chadd wrote:
>> On 27 August 2012 09:08, Ian Lepore <freebsd at damnhippie.dyndns.org> wrote:
>>
>>> If two DMAs are going on concurrently in the same buffer, one is going
>>> to finish before the other, leading to a POSTxxxx sync op happening for
>>> one DMA operation while the other is still in progress. The unit of
>>> granularity for sync operations is the mapped region, so now you're
>>> syncing access to a region which still has active DMA happening within
>>> it.
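To make the hazard concrete, here is a rough sketch of the pattern Ian
describes, assuming a hypothetical driver softc sc with a single tag/map
covering the whole buffer and an invented start_rx_dma() helper; only the
bus_dmamap_sync() calls are real busdma API:

	/* One map, two concurrent device-to-memory DMAs into it. */
	bus_dmamap_sync(sc->dtag, sc->dmap, BUS_DMASYNC_PREREAD);
	start_rx_dma(sc, 0, 0, len / 2);        /* engine 0: first half  */
	start_rx_dma(sc, 1, len / 2, len / 2);  /* engine 1: second half */

	/* Later, in the interrupt handler, engine 0 finishes first: */
	bus_dmamap_sync(sc->dtag, sc->dmap, BUS_DMASYNC_POSTREAD);
	/*
	 * The POSTREAD sync covers the entire mapped region, so on a
	 * non-coherent ARM or MIPS platform it invalidates (or, in a
	 * bounce-buffer implementation, copies out of) memory that
	 * engine 1 is still DMA'ing into.
	 */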
>>
>> Right. But the enforced idea is "DMA up to this point should be
>> flushed to memory."
>>
>>> While I think it's really an API definition issue, think about it in
>>> terms of a potential implementation... What if the CPU had to access the
>>> memory as part of the sync for the first DMA that completes, while the
>>> second is still running? Now you've got pretty much exactly the same
>>> situation as when a driver subdivides a buffer without knowing about the
>>> cache alignment; you end up with the CPU and DMA touching data in the
>>> same cacheline and no sequence of flush/invalidate can be guaranteed to
>>> preserve all data correctly.
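The subdivided-buffer case the analogy refers to looks roughly like this;
the names, the split offset, and the 32-byte line size are all invented
for illustration:

	/*
	 * A driver carves one allocation into a CPU-owned piece and a
	 * DMA-owned piece with no cacheline padding between them.
	 */
	char *buf = malloc(512, M_DEVBUF, M_NOWAIT);
	char *cpu_part = buf;       /* CPU bookkeeping lives here */
	char *dma_part = buf + 16;  /* device DMAs into here      */

	cpu_part[0] = 1;   /* dirties the line dma_part starts in */
	/* ... device concurrently DMAs into dma_part ... */
	/*
	 * With 32-byte lines, bytes 0..31 share one line.  Writing
	 * that dirty line back clobbers the freshly DMA'd bytes
	 * 16..31; invalidating it instead loses the CPU's store.
	 * No flush/invalidate ordering preserves both.
	 */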
>>
>> Right. So you realise at that point you can't win and you stick each
>> of those pieces in a different cache line.
>>
>
> Actually, I think that even discussing cache lines in this context is a
> mistake (yeah, I'm the one who did so above, in trying to relate an
> abstract API design concept to a real-world hardware example).
>
> Drivers are not supposed to know about interactions between DMA
> transfers and cache lines or other machine-specific constraints; that
> info is supposed to be encapsulated and hidden within busdma. I think a
> driver making the assumption that it can do DMA safely on a buffer as
> long as that buffer is cacheline-granular is just as flawed as assuming
> that it can do DMA safely on any arbitrarily sized and aligned buffer.
>
> So the right way to "stick each of those pieces in a different cache
> line" is to allocate two different buffers, one per concurrent DMA
> transfer. Or, more rigorously, to use two separate busdma mappings,
> since the mapping is the operation at which the constraints come into
> play. Thinking of it that way then drives the
> need to document that if multiple mappings describe the same area of
> physical memory, then concurrent operations on those maps yield
> unpredictable results.
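In driver code that approach looks roughly like the sketch below, with one
buffer and one map per in-flight transfer. The tag parameters, RXBUF_SIZE,
the load callback mydev_load_cb(), and the softc fields are all invented,
and error handling is omitted:

	/* One tag; busdma enforces its alignment/coherency constraints. */
	bus_dma_tag_create(bus_get_dma_tag(dev),
	    1, 0,                      /* alignment, boundary */
	    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR,
	    NULL, NULL,                /* filter, filterarg */
	    RXBUF_SIZE, 1, RXBUF_SIZE, /* maxsize, nsegments, maxsegsz */
	    0, NULL, NULL, &sc->dtag);

	/* One allocation and one map per concurrent transfer. */
	bus_dmamem_alloc(sc->dtag, &sc->buf0, BUS_DMA_NOWAIT, &sc->map0);
	bus_dmamem_alloc(sc->dtag, &sc->buf1, BUS_DMA_NOWAIT, &sc->map1);
	bus_dmamap_load(sc->dtag, sc->map0, sc->buf0, RXBUF_SIZE,
	    mydev_load_cb, &sc->busaddr0, BUS_DMA_NOWAIT);
	bus_dmamap_load(sc->dtag, sc->map1, sc->buf1, RXBUF_SIZE,
	    mydev_load_cb, &sc->busaddr1, BUS_DMA_NOWAIT);

	/* Each transfer syncs only its own map: */
	bus_dmamap_sync(sc->dtag, sc->map0, BUS_DMASYNC_PREREAD);
	/* ... the first engine's DMA completes ... */
	bus_dmamap_sync(sc->dtag, sc->map0, BUS_DMASYNC_POSTREAD);
	/* The second transfer's buffer and map are untouched. */

Because each buffer comes from its own bus_dmamem_alloc(), busdma can keep
the two regions from ever sharing a cache line, and each POSTxxxx sync is
scoped to exactly one in-flight DMA.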
Despite what I said earlier, I think this is sane. busdma should support only one active DMA at a time into a given buffer. If a driver wants to run two at once, it is on its own.
Warner