Partial cacheline flush problems on ARM and MIPS
Warner Losh
imp at bsdimp.com
Mon Aug 27 19:18:55 UTC 2012
On Aug 27, 2012, at 1:00 PM, Ian Lepore wrote:
> On Mon, 2012-08-27 at 09:53 -0700, Adrian Chadd wrote:
>> On 27 August 2012 09:08, Ian Lepore <freebsd at damnhippie.dyndns.org> wrote:
>>
>>> If two DMAs are going on concurrently in the same buffer, one is going
>>> to finish before the other, leading to a POSTxxxx sync op happening for
>>> one DMA operation while the other is still in progress. The unit of
>>> granularity for sync operations is the mapped region, so now you're
>>> syncing access to a region which still has active DMA happening within
>>> it.
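To make the hazard concrete, here is a rough sketch of the pattern Ian
describes, assuming a hypothetical driver softc sc with a single tag/map
covering the whole buffer and an invented start_rx_dma() helper; only the
bus_dmamap_sync() calls are real busdma API:

	/* One map, two concurrent device-to-memory DMAs into it. */
	bus_dmamap_sync(sc->dtag, sc->dmap, BUS_DMASYNC_PREREAD);
	start_rx_dma(sc, 0, 0, len / 2);        /* engine 0: first half  */
	start_rx_dma(sc, 1, len / 2, len / 2);  /* engine 1: second half */

	/* Later, in the interrupt handler, engine 0 finishes first: */
	bus_dmamap_sync(sc->dtag, sc->dmap, BUS_DMASYNC_POSTREAD);
	/*
	 * The POSTREAD sync covers the entire mapped region, so on a
	 * non-coherent ARM or MIPS platform it invalidates (or, in a
	 * bounce-buffer implementation, copies out of) memory that
	 * engine 1 is still DMA'ing into.
	 */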
>>
>> Right. But the enforced idea is "DMA up to this point should be
>> flushed to memory."
>>
>>> While I think it's really an API definition issue, think about it in
>>> terms of a potential implementation... What if the CPU had to access the
>>> memory as part of the sync for the first DMA that completes, while the
>>> second is still running? Now you've got pretty much exactly the same
>>> situation as when a driver subdivides a buffer without knowing about the
>>> cache alignment; you end up with the CPU and DMA touching data in the
>>> same cacheline and no sequence of flush/invalidate can be guaranteed to
>>> preserve all data correctly.
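The subdivided-buffer case the analogy refers to looks roughly like this;
the names, the split offset, and the 32-byte line size are all invented
for illustration:

	/*
	 * A driver carves one allocation into a CPU-owned piece and a
	 * DMA-owned piece with no cacheline padding between them.
	 */
	char *buf = malloc(512, M_DEVBUF, M_NOWAIT);
	char *cpu_part = buf;       /* CPU bookkeeping lives here */
	char *dma_part = buf + 16;  /* device DMAs into here      */

	cpu_part[0] = 1;   /* dirties the line dma_part starts in */
	/* ... device concurrently DMAs into dma_part ... */
	/*
	 * With 32-byte lines, bytes 0..31 share one line.  Writing
	 * that dirty line back clobbers the freshly DMA'd bytes
	 * 16..31; invalidating it instead loses the CPU's store.
	 * No flush/invalidate ordering preserves both.
	 */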
>>
>> Right. So you realise at that point you can't win and you stick each
>> of those pieces in a different cache line.
>>
>
> Actually, I think that even discussing cache lines in this context is a
> mistake (yeah, I'm the one who did so above, in trying to relate an
> abstract API design concept to a real-world hardware example).
>
> Drivers are not supposed to know about interactions between DMA
> transfers and cache lines or other machine-specific constraints; that
> info is supposed to be encapsulated and hidden within busdma. I think a
> driver making the assumption that it can do DMA safely on a buffer as
> long as that buffer is cacheline-granular is just as flawed as assuming
> that it can do DMA safely on any arbitrarily sized and aligned buffer.
>
> So the right way to "stick each of those pieces in a different cache
> line" is to allocate two different buffers, one per concurrent DMA
> transfer. Or, more rigorously, to use two separate busdma mappings,
> since the mapping is the operation at which the constraints come into
> play. Thinking of it that way then drives the
> need to document that if multiple mappings describe the same area of
> physical memory, then concurrent operations on those maps yield
> unpredictable results.
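In driver code that approach looks roughly like the sketch below, with one
buffer and one map per in-flight transfer. The tag parameters, RXBUF_SIZE,
the load callback mydev_load_cb(), and the softc fields are all invented,
and error handling is omitted:

	/* One tag; busdma enforces its alignment/coherency constraints. */
	bus_dma_tag_create(bus_get_dma_tag(dev),
	    1, 0,                      /* alignment, boundary */
	    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR,
	    NULL, NULL,                /* filter, filterarg */
	    RXBUF_SIZE, 1, RXBUF_SIZE, /* maxsize, nsegments, maxsegsz */
	    0, NULL, NULL, &sc->dtag);

	/* One allocation and one map per concurrent transfer. */
	bus_dmamem_alloc(sc->dtag, &sc->buf0, BUS_DMA_NOWAIT, &sc->map0);
	bus_dmamem_alloc(sc->dtag, &sc->buf1, BUS_DMA_NOWAIT, &sc->map1);
	bus_dmamap_load(sc->dtag, sc->map0, sc->buf0, RXBUF_SIZE,
	    mydev_load_cb, &sc->busaddr0, BUS_DMA_NOWAIT);
	bus_dmamap_load(sc->dtag, sc->map1, sc->buf1, RXBUF_SIZE,
	    mydev_load_cb, &sc->busaddr1, BUS_DMA_NOWAIT);

	/* Each transfer syncs only its own map: */
	bus_dmamap_sync(sc->dtag, sc->map0, BUS_DMASYNC_PREREAD);
	/* ... the first engine's DMA completes ... */
	bus_dmamap_sync(sc->dtag, sc->map0, BUS_DMASYNC_POSTREAD);
	/* The second transfer's buffer and map are untouched. */

Because each buffer comes from its own bus_dmamem_alloc(), busdma can keep
the two regions from ever sharing a cache line, and each POSTxxxx sync is
scoped to exactly one in-flight DMA.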
Despite what I said earlier, I think this is sane. busdma should support only one active DMA at a time into a given buffer. If a driver wants to run two at once, it is on its own.
Warner