busdma dflt_lock on amd64 > 4 GB

Jacques Caron jc at oxado.com
Wed Oct 26 09:01:53 PDT 2005


Hi Scott,

Thanks for the input. I'm utterly lost in unknown terrain, but I'm 
trying to understand...

At 16:09 26/10/2005, Scott Long wrote:
>So, the panic is doing exactly what it is supposed to do.  It's guarding
>against bugs in the driver.  The workaround for this is to use the 
>NOWAIT flag in all instances of bus_dmamap_load() where deferrals can
>happen.

As pointed out by Soren, this is not documented in man bus_dma :-/ It 
says bus_dmamap_load flags are supposed to be 0, and BUS_DMA_ALLOCNOW 
should be set at tag creation to avoid EINPROGRESS. I'm not sure the 
two would actually be equivalent, either. And from what I understand, 
even a call to bus_dma_tag_create with BUS_DMA_ALLOCNOW can be 
successful but not actually allocate what will be needed later (see below).

>This, however, means that using bounce pages still remains fragile
>and that the driver is still likely to return ENOMEM to the upper
>layers.  C'est la vie, I guess.  At one time I had patches that made
>ATA use the busdma API correctly (it is one of the few remaining that
>does not), but they rotted over time.

So what would be the "correct" way? Move the part that comes after the 
DMA setup into the callback? I suppose there are limitations as to what 
can happen in the callback, though, so it would complicate things quite a bit.

Obviously, a lockfunc would be needed in this situation, right?
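
To make the question concrete, here is a minimal userland sketch of the 
deferral pattern being discussed. It is a toy model, not kernel code: 
every toy_* name and TOY_NOWAIT are hypothetical stand-ins (TOY_NOWAIT 
plays the role of BUS_DMA_NOWAIT), locking is left out entirely, and 
only one deferred request is tracked. The assumption it illustrates is 
that the work following DMA setup lives in the callback, so it runs 
either immediately or later, once bounce pages are freed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NOWAIT 0x1  /* hypothetical stand-in for BUS_DMA_NOWAIT */

typedef void toy_callback_t(void *arg, int error);

/* Toy bounce zone: a page counter plus at most one deferred request. */
struct toy_zone {
	int free_pages;              /* bounce pages currently available */
	int pending;                 /* nonzero while a load is deferred */
	int pending_pages;
	toy_callback_t *pending_cb;
	void *pending_arg;
};

/*
 * Returns 0 if the callback already ran, EINPROGRESS if it was deferred,
 * or ENOMEM if pages are short and the caller refused to wait.
 */
static int
toy_load(struct toy_zone *z, int pages, toy_callback_t *cb, void *arg,
    int flags)
{
	assert(pages > 0 && cb != NULL);
	if (pages <= z->free_pages) {
		z->free_pages -= pages;
		cb(arg, 0);          /* mapping ready: run the callback now */
		return (0);
	}
	if ((flags & TOY_NOWAIT) != 0 || z->pending)
		return (ENOMEM);     /* no deferral allowed (or slot taken) */
	z->pending = 1;
	z->pending_pages = pages;
	z->pending_cb = cb;
	z->pending_arg = arg;
	return (EINPROGRESS);
}

/* Give pages back; a deferred load that now fits gets its callback. */
static void
toy_unload(struct toy_zone *z, int pages)
{
	z->free_pages += pages;
	if (z->pending && z->pending_pages <= z->free_pages) {
		z->pending = 0;
		z->free_pages -= z->pending_pages;
		z->pending_cb(z->pending_arg, 0);
	}
}

/* Driver side: everything that needs the mapping sits in the callback. */
struct toy_request {
	int started;
};

static void
toy_start_io(void *arg, int error)
{
	struct toy_request *r = arg;

	if (error == 0)
		r->started = 1;      /* here the real driver would start I/O */
}
```

In this model a caller that passes TOY_NOWAIT gets ENOMEM as soon as 
the zone is exhausted, while a caller that passes 0 gets EINPROGRESS 
and sees toy_start_io() run from toy_unload() later. In a real driver 
that later invocation happens in a different context, which is where 
the lockfunc question comes in.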

Also, I believe many other drivers just have lots of BUS_DMA_ALLOCNOW 
or BUS_DMA_NOWAIT all over the place. I'm not sure that's the 
"correct" way, is it?

>No.  Some tags specifically should not permit deferrals.

How do they do that? Setting BUS_DMA_ALLOCNOW in the tag, or 
BUS_DMA_NOWAIT in the map_load, or both, or something else? What 
should make one decide when deferrals should not be permitted? It is 
my impression that quite a few drivers happily decide they don't like 
deferrals at all whatever happens...

>Just about every other modern driver honors the API correctly.

Depends what you mean by "correctly". I'm not sure using 
BUS_DMA_NOWAIT is the right way to go as it fails if there is 
contention for bounce buffers.

>Bounce pages cannot be reclaimed to the system, so overallocating just
>wastes memory.

I'm not talking about over-allocating, but rather allocating what is 
needed: I don't understand why bus_dma_tag_create limits the total 
number of bounce pages in a bounce zone to maxsize if 
BUS_DMA_ALLOCNOW is set (which prevents bus_dmamap_create from 
allocating any further bounce pages as long as there's only one map 
per tag, which seems pretty common), while bus_dmamap_create will 
allocate maxsize additional pages if BUS_DMA_ALLOCNOW was not set.

The end result is that the ata driver is limited to 32 bounce pages 
whatever the number of instances (I guess that's channels, or 
disks?), while other drivers get hundreds of bounce pages which they 
hardly use. Maybe this is intended and it's just the way the ata 
driver uses tags and maps that is wrong, maybe it's the busdma logic 
that is wrong, I don't know...
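
As a rough illustration of the accounting described above, here is a 
toy model in plain C. It encodes only the behaviour as described in 
this message, not the actual busdma code: all toy_* names are 
hypothetical, sizes are counted directly in pages, every instance is 
assumed to create one tag and one map against a shared bounce zone, 
and any global cap on bounce pages is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bounce zone shared by all tags with the same constraints. */
struct toy_bounce_zone {
	int total_pages;
};

struct toy_tag {
	struct toy_bounce_zone *zone;
	int maxsize_pages;   /* tag maxsize, expressed in pages */
	bool prealloc_done;  /* set when the tag was created with ALLOCNOW */
	int map_count;
};

/* With ALLOCNOW, only top the zone up to maxsize_pages in total. */
static void
toy_tag_create(struct toy_tag *t, struct toy_bounce_zone *z,
    int maxsize_pages, bool allocnow)
{
	assert(maxsize_pages > 0);
	t->zone = z;
	t->maxsize_pages = maxsize_pages;
	t->prealloc_done = false;
	t->map_count = 0;
	if (allocnow) {
		if (z->total_pages < maxsize_pages)
			z->total_pages = maxsize_pages;
		t->prealloc_done = true;
	}
}

/*
 * The first map of an ALLOCNOW tag adds nothing; otherwise each map
 * adds maxsize_pages more pages to the zone.
 */
static void
toy_map_create(struct toy_tag *t)
{
	if (!t->prealloc_done || t->map_count > 0)
		t->zone->total_pages += t->maxsize_pages;
	t->map_count++;
}

/* Zone size after `instances` drivers each create one tag and one map. */
static int
toy_zone_pages(int instances, int maxsize_pages, bool allocnow)
{
	struct toy_bounce_zone z = { 0 };

	for (int i = 0; i < instances; i++) {
		struct toy_tag t;

		toy_tag_create(&t, &z, maxsize_pages, allocnow);
		toy_map_create(&t);
	}
	return (z.total_pages);
}
```

Under these assumptions, toy_zone_pages(6, 32, true) stays at 32 pages 
no matter how many instances there are, while toy_zone_pages(6, 32, 
false) grows to 192, which is the asymmetry I am asking about.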

>The whole point of the deferral mechanism is to allow
>you to allocate enough pages for a normal load while also being able to
>handle sporadic spikes in load (like when the syncer runs) without
>trapping memory.

In this case 32 bounce pages (out of 8 GB RAM) for 6 disks seems like 
a very tight bottleneck to me.

Jacques.




More information about the freebsd-amd64 mailing list