Partial cacheline flush problems on ARM and MIPS
Warner Losh
imp at bsdimp.com
Sun Aug 26 23:13:36 UTC 2012
On Aug 26, 2012, at 12:25 PM, Ian Lepore wrote:
> On Sun, 2012-08-26 at 13:05 -0500, Mark Tinguely wrote:
>> I did a quick look at the drivers last summer.
>>
>> Most drivers do the right thing and use memory allocated from
>> bus_dmamem_alloc(). It is easy for us to give them a cache aligned
>> buffer.
>>
>> Some drivers use mbufs - 256 bytes, which is cache safe.
>>
>> Some drivers directly or indirectly malloc() a buffer and then use it
>> to dma - rather than try to fix them all, I was okay with making the
>> smallest malloc() amount equal to the cache line size. It amounts to
>> getting rid of the 16 byte allocation on some ARM architectures. The
>> power of 2 allocator will then give us cache line safe allocation.
>>
>> A few drivers take a small memory amount from the kernel stack and dma
>> to it <- broken driver.
>>
>> The few drivers that use data from a structure and that memory is not
>> cache aligned <- broken driver.
>>
>
> I disagree about those last two points -- drivers that choose to use
> stack memory or malloc'd memory as IO buffers are not broken.
Stack DMA is bad policy at best and broken at worst. The reason is alignment: there's no way to say that something on the stack is aligned to a given boundary, so you are asking for random stack corruption.
Malloc'd memory is similarly problematic: the allocator knows nothing about cache lines, so you can wind up with an allocation that gets corrupted by cache effects.
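To make "corrupted by cache effects" concrete, here is a rough userland sketch (not kernel code) that models a single cache line shared by a small DMA buffer at offset 0 and an unrelated CPU variable at offset 8. The 32-byte line size, the offsets, and every demo_* name are assumptions made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LINE 32	/* assumed cache line size; real hardware varies */

/* Toy model of one cache line and the RAM behind it. */
struct demo_line {
	uint8_t mem[DEMO_LINE];		/* backing RAM, what the device sees */
	uint8_t cache[DEMO_LINE];	/* the CPU's cached copy */
	bool valid;
	bool dirty;
};

/* Load the line from RAM if the CPU does not hold it yet. */
static void
demo_fill(struct demo_line *l)
{
	if (!l->valid) {
		memcpy(l->cache, l->mem, DEMO_LINE);
		l->valid = true;
		l->dirty = false;
	}
}

static void
demo_cpu_write(struct demo_line *l, int off, uint8_t v)
{
	demo_fill(l);
	l->cache[off] = v;
	l->dirty = true;
}

static uint8_t
demo_cpu_read(struct demo_line *l, int off)
{
	demo_fill(l);
	return (l->cache[off]);
}

/* The device writes straight to RAM, bypassing the cache. */
static void
demo_dma_write(struct demo_line *l, int off, uint8_t v)
{
	l->mem[off] = v;
}

static void
demo_writeback(struct demo_line *l)
{
	if (l->valid && l->dirty) {
		memcpy(l->mem, l->cache, DEMO_LINE);
		l->dirty = false;
	}
}

static void
demo_invalidate(struct demo_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* Invalidate only: returns the neighbor byte as the CPU sees it afterwards. */
static uint8_t
demo_invalidate_only(void)
{
	struct demo_line l;

	memset(&l, 0, sizeof(l));
	demo_cpu_write(&l, 8, 0xAA);	/* CPU dirties the neighbor */
	demo_dma_write(&l, 0, 0x55);	/* device fills the buffer */
	demo_invalidate(&l);		/* the dirty neighbor is thrown away */
	return (demo_cpu_read(&l, 8));
}

/* Writeback, then invalidate: returns the DMA byte as the CPU sees it. */
static uint8_t
demo_writeback_then_invalidate(void)
{
	struct demo_line l;

	memset(&l, 0, sizeof(l));
	demo_cpu_write(&l, 8, 0xAA);
	demo_dma_write(&l, 0, 0x55);
	demo_writeback(&l);		/* stale cached bytes overwrite the DMA data */
	demo_invalidate(&l);
	return (demo_cpu_read(&l, 0));
}
```

In this model the invalidate-only path loses the CPU's 0xAA, and the writeback-then-invalidate path loses the device's 0x55. Either way something in the shared line gets corrupted, which is why a buffer that does not own its cache lines is trouble.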
> Drivers
> can do IO directly to/from userland buffers, do we say that an
> application that calls read(2) and passes the address of a stack
> variable is broken?
Yes, if the buffer is smaller than a cache line and not aligned to one. That's the point of the uio load variant.
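The condition can be written down as a small predicate. This is only a sketch: the 32-byte line size and the name demo_buf_is_line_safe() are assumptions for illustration, not anything that exists in busdma.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 32 and 64 bytes are both common. */
#define DEMO_CACHE_LINE 32u

/*
 * Hypothetical helper: true when [addr, addr + len) starts on a cache
 * line boundary and covers whole lines, so the buffer shares no line
 * with neighboring data.
 */
static bool
demo_buf_is_line_safe(uintptr_t addr, size_t len)
{
	assert(len > 0);
	return ((addr % DEMO_CACHE_LINE) == 0 &&
	    (len % DEMO_CACHE_LINE) == 0);
}
```

For example, a 64-byte buffer at 0x1000 passes, while a 16-byte buffer fails whether it sits at 0x1000 or 0x1010, because the rest of its line belongs to somebody else.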
> In this regard, it's the busdma implementation that's broken, because it
> should bounce those IOs through a DMA-safe buffer. There's absolutely
> no rule that I've ever heard of in FreeBSD that says IO can only take
> place using memory allocated from busdma.
That's partially true. Since busdma grew up in the storage area, the de facto rule has been that you must allocate the memory from busdma, or it must be page aligned. The mbuf and uio variants of load were invented to cope with the common cases of mbufs and user I/O, so that those buffers get flagged properly.
How does busdma know that it is using memory that's not from its allocator?
> The rule is only that the
> proper sequence of busdma operation must be called, and beyond that it's
> up to the busdma implementation to make it work.
No. Bouncing is needed because of alignment restrictions of the underlying device, not because of cache effects.
There's a limited number of things that we support with busdma. Arbitrary data from malloc that might be shared with the CPU isn't on that list.
> Our biggest problem, I think, is that we don't have a sufficient
> definition of "the proper sequence of busdma operations."
I disagree. The sequence has been known for a long time.
> I don't think it will be very hard to make the arm and mips busdma
> implementations work correctly. It won't even be too hard to make them
> fairly efficient at bouncing small IOs (my thinking is that we can make
> small bounces no more expensive than the current partial cacheline flush
> implementation which copies the data multiple times). Bouncing large IO
> will never be efficient, but the inefficiency will be a powerful
> motivator to update drivers that do large IO to work better, such as
> using buffers allocated from busdma.
I don't think the cache line problem can be solved with bounce buffers. Trying to accommodate broken drivers is what led us to this spot. We need to fix the broken drivers. If that's impossible, then the best we can do is have the driver set an 'always bounce' flag in the tag it creates, and use that to always bounce operations through that tag.
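Roughly, the decision would look something like the sketch below. Nothing here is an existing busdma interface: DEMO_TAG_ALWAYS_BOUNCE, struct demo_dma_tag, and demo_must_bounce() are hypothetical names, and the tag is reduced to the two fields the example needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag; not a real BUS_DMA_* value. */
#define DEMO_TAG_ALWAYS_BOUNCE 0x1u

/* Stand-in for a DMA tag, holding only what this sketch uses. */
struct demo_dma_tag {
	unsigned int flags;
	size_t alignment;	/* alignment the device requires, nonzero */
};

/*
 * Decide whether a load through this tag goes via a bounce buffer.
 * A driver that cannot be fixed sets the flag once at tag creation,
 * and every operation through the tag then bounces. Otherwise only
 * buffers that violate the device's alignment bounce.
 */
static bool
demo_must_bounce(const struct demo_dma_tag *tag, uintptr_t addr)
{
	assert(tag->alignment != 0);
	if (tag->flags & DEMO_TAG_ALWAYS_BOUNCE)
		return (true);
	return ((addr % tag->alignment) != 0);
}
```

The point of keying this off the tag is that the cost lands only on the drivers that opt in, and well-behaved drivers keep their zero-copy path.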
Warner