Partial cacheline flush problems on ARM and MIPS

Ian Lepore freebsd at damnhippie.dyndns.org
Mon Aug 27 19:01:31 UTC 2012


On Mon, 2012-08-27 at 09:53 -0700, Adrian Chadd wrote:
> On 27 August 2012 09:08, Ian Lepore <freebsd at damnhippie.dyndns.org> wrote:
> 
> > If two DMAs are going on concurrently in the same buffer, one is going
> > to finish before the other, leading to a POSTxxxx sync op happening for
> > one DMA operation while the other is still in progress.  The unit of
> > granularity for sync operations is the mapped region, so now you're
> > syncing access to a region which still has active DMA happening within
> > it.
> 
> Right. But the enforced idea is "DMA up to this point should be
> flushed to memory."
> 
> > While I think it's really an API definition issue, think about it in
> > terms of a potential implementation... What if the CPU had to access the
> > memory as part of the sync for the first DMA that completes, while the
> > second is still running?  Now you've got pretty much exactly the same
> > situation as when a driver subdivides a buffer without knowing about the
> > cache alignment; you end up with the CPU and DMA touching data in the
> > same cacheline and no sequence of flush/invalidate can be guaranteed to
> > preserve all data correctly.
> 
> Right. So you realise at that point you can't win and you stick each
> of those pieces in a different cache line.
> 

Actually, I think that even discussing cache lines in this context is a
mistake (yeah, I'm the one who did so above, in trying to relate an
abstract API design concept to a real-world hardware example).
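
For anyone who wants that hardware example spelled out, here is a rough
userland model of it.  Everything in it is made up for illustration (the
8-byte line size, struct toy_line, and all of the function names); it is
not how any real cache controller or busdma implementation works, just a
sketch of why a line shared between CPU data and DMA data can't be
synced safely.

```c
#include <assert.h>
#include <string.h>

#define LINE 8                      /* toy cache line size, in bytes */

/* One line's worth of RAM plus the CPU's cached copy of that line. */
struct toy_line {
    unsigned char ram[LINE];
    unsigned char cache[LINE];
};

/* CPU store: with a write-back cache it lands in the cache only. */
static void cpu_write(struct toy_line *l, int off, unsigned char v)
{
    assert(off >= 0 && off < LINE);
    l->cache[off] = v;
}

/* Device DMA: lands in RAM only, behind the cache's back. */
static void dma_write(struct toy_line *l, int off, unsigned char v)
{
    assert(off >= 0 && off < LINE);
    l->ram[off] = v;
}

/* Write-back (flush): the whole line is copied from cache to RAM. */
static void writeback(struct toy_line *l)
{
    memcpy(l->ram, l->cache, LINE);
}

/* Invalidate: the cached copy is dropped; the refetch is modelled
 * as a copy of the whole line from RAM. */
static void invalidate(struct toy_line *l)
{
    memcpy(l->cache, l->ram, LINE);
}

/* CPU data and DMA data share one line.  Returns 1 only if the CPU
 * can see both bytes after sync_op has run. */
static int shared_line_survives(void (*sync_op)(struct toy_line *))
{
    struct toy_line l;

    memset(&l, 0, sizeof(l));
    cpu_write(&l, 0, 0xAA);         /* dirty CPU data, first half */
    dma_write(&l, 4, 0xBB);         /* DMA data, second half */
    sync_op(&l);
    return l.cache[0] == 0xAA && l.cache[4] == 0xBB;
}

/* Same two bytes, but each piece has a line to itself. */
static int separate_lines_survive(void)
{
    struct toy_line cpu_line, dma_line;

    memset(&cpu_line, 0, sizeof(cpu_line));
    memset(&dma_line, 0, sizeof(dma_line));
    cpu_write(&cpu_line, 0, 0xAA);
    dma_write(&dma_line, 4, 0xBB);
    writeback(&cpu_line);           /* CPU data pushed out to RAM */
    invalidate(&dma_line);          /* DMA data pulled in from RAM */
    return cpu_line.ram[0] == 0xAA && dma_line.cache[4] == 0xBB;
}
```

In this model shared_line_survives(writeback) and
shared_line_survives(invalidate) both return 0: the write-back
overwrites the DMA'd byte in RAM, and the invalidate throws away the
CPU's dirty byte.  separate_lines_survive() returns 1.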

Drivers are not supposed to know about interactions between DMA
transfers and cache lines or other machine-specific constraints; that
info is supposed to be encapsulated and hidden within busdma.  I think a
driver making the assumption that it can do DMA safely on a buffer as
long as that buffer is cacheline-granular is just as flawed as assuming
that it can do DMA safely on any arbitrarily sized and aligned buffer.

So the right way to "stick each of those pieces in a different cache
line" is to allocate two different buffers, one per concurrent DMA
transfer.  Or, to put it more rigorously, to use two separate busdma
mappings, since mapping is the operation at which constraints come into
play.  Thinking of it that way then drives the need to document that if
multiple mappings describe the same area of physical memory, then
concurrent operations on those maps yield unpredictable results.
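
To make that rule concrete, here is a minimal sketch of the kind of
check a debug build could make.  It is purely hypothetical: struct
phys_range, ranges_overlap() and assert_maps_disjoint() are names
invented for this example and are not part of busdma, and a real map
can cover several segments where this assumes just one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the physical memory one mapping covers
 * (assumes a single contiguous segment per mapping). */
struct phys_range {
    uint64_t addr;                  /* physical start address */
    size_t   len;                   /* length in bytes */
};

/* True if the two ranges have at least one byte in common.  Written
 * with subtraction so that addr + len can never wrap around. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
    if (a.len == 0 || b.len == 0)
        return false;
    if (a.addr < b.addr)
        return b.addr - a.addr < a.len;
    return a.addr - b.addr < b.len;
}

/* Debug-only check: two mappings used for concurrent DMA must not
 * describe the same physical memory. */
static void assert_maps_disjoint(struct phys_range a, struct phys_range b)
{
    assert(!ranges_overlap(a, b));
}
```

A check like this only catches the overlap itself; it says nothing about
whatever alignment or bounce-buffer handling the mapping step does
behind the scenes, which is the part that should stay hidden from the
driver anyway.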

-- Ian




More information about the freebsd-arm mailing list