On-stack allocation of DMA S/G lists

Tue Aug 7 18:09:41 UTC 2012

On Tuesday, August 07, 2012 12:09:42 pm Ian Lepore wrote:
> On Mon, 2012-08-06 at 10:26 -0400, John Baldwin wrote:
> > On Thursday, July 12, 2012 8:26:05 am John Baldwin wrote:
> > > On Sunday, July 08, 2012 7:05:16 am Peter Jeremy wrote:
> > > > BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've
> > > > noticed they appear to potentially allocate substantial kernel stack
> > > > under some conditions as several bus_dma(9) functions include:
> > > >     bus_dma_segment_t dm_segments[dmat->nsegments];
> > > > What prevents this overflowing the kernel stack?
> > > 
> > > That does seem dubious.  x86 stores the array in the tag instead.
> > 
> > I have an untested patch to change bus-dma on arm and mips to allocate a
> > dynamic S/G list in each DMA tag on first use instead of using on-stack
> > allocation (which I think is rather bogus).  Can folks review and test this 
> > patch please?  Thanks.
> > 
> > http://www.FreeBSD.org/~jhb/patches/arm_mips_dynamic_dma_segs.patch
> > 
> 
> I'm worried about changing a per-mapping-call resource to a per-dma-tag
> resource here.  What prevents the situation where you have two
> bus_dmamap_load() calls in progress at the same time using different
> buffers but the same tag?
> 
> I can't find anything in the docs that indicates you have to provide
> external locking of the tag for map load/unload calls, or that even
> implies the tag can be modified by a mapping operation.  The lockfunc
> stuff related to creating the tag is documented as being used only
> during a deferred callback.

Actually, I do think it is implicit that you won't do concurrent loads
on a DMA tag, though that may not be obvious.  Keep in mind that this
is what x86's bus_dma has always done.  For storage drivers you certainly
can't do this or risk completeing I/O requests out-of-order which can
break an upper-layer assumption in a filesystem.  Note that all other
platforms do this as well, only arm and mips allocate on the stack.

> The existing code seems to go out of its way to avoid modifying the tag
> during a mapping operation.  For example, it decides at tag creation
> time whether any bounce pages might ever be needed for the tag, and if
> so it pre-sets a bounce zone in the tag, then at mapping time the bounce
> zone is protected with its own lock when it gets modified.  To me this
> feels like a way to specifically avoid the need to lock or modify the
> tag during a mapping operation.
> 
> Assuming that all of the foregoing is moot for some reason I've
> overlooked, then on a purely implementation level, could all the
> duplicated code to allocate the array when necessary be moved into
> bus_dmamap_load_buffer(), triggered by a NULL 'segs' pointer?

Nope, bus_dmamap_load() doesn't know which of M_NOWAIT / M_WAITOK is
appropriate to use.

> And just for the record, looking at the problem from an even more
> distant vantage... is there really a problem with stack-allocating the
> segments?  On a 64-bit arch the struct is like 16 bytes.  Typical usage
> is to allocate a tag allowing 1 or just a few segments.  Is anyone
> really going to create a tag specifying hundreds of segments that would
> overflow the stack?  If they try, wouldn't failing the tag create be
> good enough?

I/O devices can allocate tags with several S/G elements.  An mfi(4) tag
on i386 would use a 256 byte segments array (512 on amd64).  That's not
entirely trivial.  It would be worse if you couldn't depend on
dmat->nsegments and had to always allocate the full size.  Presumably
though we require C99 at that point (and it requires that?).

-- 
John Baldwin