busdma dflt_lock on amd64 > 4 GB
Jacques Caron
jc at oxado.com
Tue Oct 25 15:10:10 PDT 2005
Hi all,
It seems there is a continuing story about bus_dma (or rather its use
by drivers) and systems with more than 4 GB RAM. I submitted a PR for
this issue:
http://www.freebsd.org/cgi/query-pr.cgi?pr=87977
I know it happens on amd64 machines, though after looking a bit
further and trying to figure out the whole busdma thing, the issue
might be more general (as busdma_machdep.c is exactly the same for
i386 and amd64). But as it has been discussed around here a number of
times, and because there are probably more amd64 systems with more
than 4 GB RAM than other types, I've selected this list. Let me know
if another list would be more suitable.
What I understand (please correct me if I'm wrong) is that:
- busdma will use bounce buffers when needed, and this includes the
use of devices that are limited to 32-bit addressing (most of them, I
would guess?) when there is more than 4 GB RAM
- I'm not 100% sure, but it seems bounce buffers are a limited
resource (that's at least what sysctl -a | grep busdma tells me, and
that really looks like a bottleneck, btw)
- apparently busdma will defer the allocation of bounce buffers when
there aren't enough available (and this can happen pretty quickly in
some situations, though I haven't yet figured out the difference
between the two zones): two simultaneous dd's from two disks with a
large block size (bs=256000) will use up all available bounce buffer
pages in zone1...
- if that happens, busdma_swi will eventually call the lockfunc
associated with the dma tag, and panic if none is defined
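
The sequence described in the list above can be sketched as a small
user-space model. This is only an illustration of my understanding, not
the kernel code: every model_* name is hypothetical, the bounce pool is
reduced to a page counter, and the panic is replaced by a flag so the
program can keep running.

```c
#include <errno.h>
#include <stddef.h>

/* User-space model only: all model_* names are stand-ins, not kernel APIs. */
typedef enum { MODEL_DMA_LOCK, MODEL_DMA_UNLOCK } model_lock_op;
typedef void model_lockfunc(void *arg, model_lock_op op);

struct model_tag {
	model_lockfunc *lockfunc;	/* never NULL after model_tag_init() */
	void *lockfuncarg;
};

struct model_map {
	struct model_tag *tag;
	int pages_needed;		/* bounce pages this load requires */
	int loaded;
	struct model_map *next;		/* link in the deferred queue */
};

int model_free_bounce_pages = 4;	/* the limited resource */
int model_panicked;			/* set where the kernel would panic */
static struct model_map *deferred_head, *deferred_tail;

/* Default lockfunc installed when the driver passes none. */
static void model_dflt_lock(void *arg, model_lock_op op)
{
	(void)arg;
	(void)op;
	model_panicked = 1;
}

void model_tag_init(struct model_tag *t, model_lockfunc *fn, void *arg)
{
	t->lockfunc = (fn != NULL) ? fn : model_dflt_lock;
	t->lockfuncarg = (fn != NULL) ? arg : NULL;
}

/* Load a map: succeed now, or queue it and report a deferral. */
int model_load(struct model_map *m)
{
	if (m->pages_needed > model_free_bounce_pages) {
		m->next = NULL;
		if (deferred_tail != NULL)
			deferred_tail->next = m;
		else
			deferred_head = m;
		deferred_tail = m;
		return EINPROGRESS;
	}
	model_free_bounce_pages -= m->pages_needed;
	m->loaded = 1;
	return 0;
}

void model_unload(struct model_map *m)
{
	if (m->loaded) {
		model_free_bounce_pages += m->pages_needed;
		m->loaded = 0;
	}
}

/* Retry deferred loads; each retry is bracketed by the tag's lockfunc. */
void model_swi(void)
{
	struct model_map *m;

	while ((m = deferred_head) != NULL &&
	    m->pages_needed <= model_free_bounce_pages) {
		deferred_head = m->next;
		if (deferred_head == NULL)
			deferred_tail = NULL;
		m->tag->lockfunc(m->tag->lockfuncarg, MODEL_DMA_LOCK);
		if (model_panicked)
			return;		/* the real kernel stops here */
		model_load(m);
		m->tag->lockfunc(m->tag->lockfuncarg, MODEL_DMA_UNLOCK);
	}
}
```

In this model, a tag created without a lockfunc works fine as long as
every load fits in the pool; the "panic" only shows up once a load has
been deferred and the retry path runs, which would match the behaviour
seen under heavy I/O.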
Now, it seems that many drivers don't provide a lockfunc to
bus_dma_tag_create. The commit log for the lockfunc addition says:
"The only time that NULL, NULL should ever be used is when the driver
ensures that bus_dmamap_load() will not be deferred."
The problem is: what does this mean? How can a driver "ensure that
bus_dmamap_load will not be deferred"? Calls to bus_dma_tag_create
are not consistent in drivers:
- some drivers are apparently cautious: twe will either have
BUS_DMA_ALLOCNOW and no lockfunc, or no flags and use
busdma_lock_mutex and Giant. Is this the right approach?
- other drivers are *very* cautious: fxp will always use
busdma_lock_mutex and Giant.
- other drivers don't care at all: bge and ata never provide a
lockfunc, and in most cases don't use any flags either.
My (humble) opinion and a few questions:
- clarification of the cases when a lockfunc is required or not is
needed. I fear it is always needed unless the created tag is only
used as a "parent" for others, or (maybe?) if BUS_DMA_ALLOCNOW is set.
- an audit of bus_dma_tag_create calls in most drivers is needed, at
least regarding lockfunc args (bge also has weird lowaddr/hiaddr, as
has already been reported)
- maybe the dflt_lock should actually use the Giant mutex by default
rather than panicking
- or maybe the lockfunc call in busdma_swi is not needed? I'm really
not versed in kernelese, so I really have no idea
- is using Giant the best option, or should each driver use a
different mutex, or...?
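
To illustrate the last question, here is a minimal user-space sketch of
a mutex-style lockfunc whose argument selects the lock, so one tag can
pass a shared global lock and another a per-driver one. It is modeled
loosely on the busdma_lock_mutex / Giant pairing mentioned above; the
demo_* names are hypothetical and the "mutex" is just a counter, so it
says nothing about real locking cost or ordering.

```c
/* User-space model only: all demo_* names are stand-ins, not kernel APIs. */
typedef enum { DEMO_DMA_LOCK, DEMO_DMA_UNLOCK } demo_lock_op;
typedef void demo_lockfunc(void *arg, demo_lock_op op);

/* Toy mutex: "held" counts acquisitions instead of blocking. */
struct demo_mtx {
	const char *name;
	int held;
};

/* Stand-in for a single global lock shared by all drivers. */
struct demo_mtx demo_giant = { "Giant", 0 };

/* Generic lockfunc: the argument decides which mutex is taken. */
void demo_lock_mutex(void *arg, demo_lock_op op)
{
	struct demo_mtx *m = arg;

	switch (op) {
	case DEMO_DMA_LOCK:
		m->held++;
		break;
	case DEMO_DMA_UNLOCK:
		m->held--;
		break;
	}
}

struct demo_tag {
	demo_lockfunc *lockfunc;
	void *lockfuncarg;
};

/* Hypothetical per-device state with its own mutex. */
struct demo_softc {
	struct demo_mtx mtx;
	int own_held_in_cb;	/* sc->mtx.held as seen by the callback */
	int giant_held_in_cb;	/* demo_giant.held as seen by the callback */
};

/* Driver callback: records which locks were held when it ran. */
void demo_callback(void *arg)
{
	struct demo_softc *sc = arg;

	sc->own_held_in_cb = sc->mtx.held;
	sc->giant_held_in_cb = demo_giant.held;
}

/* What a deferred retry does: lock, run the callback, unlock. */
void demo_run_deferred(struct demo_tag *t, void (*cb)(void *), void *cb_arg)
{
	t->lockfunc(t->lockfuncarg, DEMO_DMA_LOCK);
	cb(cb_arg);
	t->lockfunc(t->lockfuncarg, DEMO_DMA_UNLOCK);
}
```

With a tag set up as { demo_lock_mutex, &sc.mtx } the callback runs
under the driver's own lock only; with { demo_lock_mutex, &demo_giant }
it runs under the shared one. Which of the two is appropriate for a
given driver presumably depends on which lock already protects the
state its callback touches.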
I will try a kernel with a modified ata driver that passes
busdma_lock_mutex, &Giant where needed tomorrow and report back. I
think that this will actually fix the issue, but I don't know whether
it might cause other issues or degrade performance, or whether there
is a better solution...
Any hints welcome,
Jacques.
More information about the freebsd-amd64
mailing list