cb_dumpdata vs. PowerMac G5's (or at least my SSD context): add involvement of dumpinfo's maxiosize value?

Mark Millard markmi at dsl-only.net
Mon Mar 2 00:47:24 UTC 2015


Unfortunately reporting the trace of finding what is happening for the DMA size panic-dump issue for powerpc64 on PowerMac G5's (and others FreeBSDs?) is rather long...


My test of trying a variant of my example code still ended up trying for a 64K DMA when only 32K is allowed according to the error message it generated.

It would appear the dumperinfo's maxiosize is too big as well. Looking around...

$ find sys -name '*.[hcsm]' -exec grep "\<oversized DMA\>" {} \; -print | more
                      "FAILURE - oversized DMA transfer attempt %d > %d\n",
sys/dev/ata/ata-dma.c
                    "FAILURE - oversized DMA transfer attempt %d > %d\n",
sys/powerpc/powermac/ata_dbdma.c

static int
ata_dmaload(struct ata_request *request, void *addr, int *entries)
{
    struct ata_channel *ch = device_get_softc(request->parent);
...
    if (request->bytecount > ch->dma.max_iosize) {
        device_printf(request->parent,
                      "FAILURE - oversized DMA transfer attempt %d > %d\n",
                      request->bytecount, ch->dma.max_iosize);
        return EIO;
    }
...
}

and

static int
ata_dbdma_load(struct ata_request *request, void *addr, int *entries)
{
        struct ata_channel *ch = device_get_softc(request->parent);
...
        if (request->bytecount > ch->dma.max_iosize) {
                device_printf(request->dev,
                    "FAILURE - oversized DMA transfer attempt %d > %d\n",
                    request->bytecount, ch->dma.max_iosize);
                return EIO;
        }
...
}

These suggest that the dumperinfo's maxiosize might sometimes be legitimately bigger than ata_channel's dma.max_iosize since dma.max_iosize would be very specific to DMA transfers.

So it would appear that either:

A) ata_channel's dma.max_iosize should also be considered in dump_machdep.c .
or
B) dumperinfo's maxiosize value should in part be based on ata_channel's dma.max_iosize when an ata_channel is involved and DMA would be involved.

Then looking...

$ find sys -name '*.[hcSm]' -exec grep "\<maxiosize\>" {} \; -print | more                                                                   
        maxdumpsz = di->maxiosize;
sys/mips/mips/minidump_machdep.c
        maxdumppgs = min(di->maxiosize / PAGE_SIZE, MAXDUMPPGS);
sys/x86/x86/dump_machdep.c
        gkd->di.maxiosize = DFLTPHYS;
sys/geom/raid/g_raid.c
        gkd->di.maxiosize = dp->d_maxsize;
sys/geom/geom_disk.c
        maxdumpsz = min(di->maxiosize, MAXDUMPPGS * PAGE_SIZE);
sys/i386/i386/minidump_machdep.c
        maxdumpsz = min(di->maxiosize, MAXDUMPPGS * PAGE_SIZE);
sys/amd64/amd64/minidump_machdep.c
        u_int   maxiosize;      /* Max size allowed for an individual I/O */
sys/sys/conf.h
        maxdumpsz = di->maxiosize;
sys/arm/arm/minidump_machdep.c
        u_int max_trans_sz = (0==di) ? 0 : (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
                                 * The original code that ignored di->maxiosize had that property
sys/powerpc/powerpc/dump_machdep.c

static void
g_disk_kerneldump(struct bio *bp, struct disk *dp)
{
        struct g_kerneldump *gkd;
        struct g_geom *gp;

        gkd = (struct g_kerneldump*)bp->bio_data;
...
        gkd->di.maxiosize = dp->d_maxsize;
...
}

So that would trace back to disk's d_maxsize. The only assignment to the field that I find is in g_disk_access:

g_disk_access(struct g_provider *pp, int r, int w, int e)
{
        struct disk *dp;
        struct g_disk_softc *sc;
        int error;

        g_trace(G_T_ACCESS, "g_disk_access(%s, %d, %d, %d)",
            pp->name, r, w, e);
        g_topology_assert();
        sc = pp->private;
        if (sc == NULL || (dp = sc->dp) == NULL || dp->d_destroyed) {
...
                return (ENXIO);
        }
...
       if ((pp->acr + pp->acw + pp->ace) == 0 && (r + w + e) > 0) {
...
                if (dp->d_maxsize == 0) {
                        printf("WARNING: Disk drive %s%d has no d_maxsize\n",
                            dp->d_name, dp->d_unit);
                        dp->d_maxsize = DFLTPHYS;
                }
...
}

So it appears that the caller(s) of disk_create need to supply the value to the field. Looking at sys/cam/ata/ata_da.c's calling code...

#ifndef ata_disk_firmware_geom_adjust
#define ata_disk_firmware_geom_adjust(disk)
#endif
...
static cam_status
adaregister(struct cam_periph *periph, void *arg)
{
        struct ada_softc *softc;
        struct ccb_pathinq cpi;
        struct ccb_getdev *cgd;
        char   announce_buf[80], buf1[32];
        struct disk_params *dp;
        caddr_t match;
        u_int maxio;
        int legacy_id, quirks;
...
        bzero(&cpi, sizeof(cpi));
        xpt_setup_ccb(&cpi.ccb_h, periph->path, CAM_PRIORITY_NONE);
        cpi.ccb_h.func_code = XPT_PATH_INQ;
        xpt_action((union ccb *)&cpi);
...
        maxio = cpi.maxio;              /* Honor max I/O size of SIM */
        if (maxio == 0)
                maxio = DFLTPHYS;       /* traditional default */
        else if (maxio > MAXPHYS)
                maxio = MAXPHYS;        /* for safety */
        if (softc->flags & ADA_FLAG_CAN_48BIT)
                maxio = min(maxio, 65536 * softc->params.secsize);
        else                                    /* 28bit ATA command limit */
                maxio = min(maxio, 256 * softc->params.secsize);
        softc->disk->d_maxsize = maxio;
...
        ata_disk_firmware_geom_adjust(softc->disk);
...
}

So far I do not see any hint of a ata_channel's dma.max_iosize being involved in setting d_maxsize. (Only pc98 and sparc64 seem to have ata_disk_firmware_geom_adjust definitions.)

As overall there may not be a global requirement to use DMA this may well make sense.

But it would mean that when DMA is to be used and it is an ata_channel context that the ata_channel's dma.max_iosize should be consulted. (Not that ata_channel need be the only context with such an issue. But it is the one involved in my problem.)

Looking around I find the following assignments:

$ find sys -name '*.[hcSm]' -exec grep "\<max_iosize\>" {} \; -print | more                                                                      
    ch->dma.max_iosize = (ATA_SIIPRB_DMA_ENTRIES - 1) * PAGE_SIZE;
sys/dev/ata/chipsets/ata-siliconimage.c
    ch->dma.max_iosize = 65536;
sys/dev/ata/chipsets/ata-promise.c
    ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-serverworks.c
    ch->dma.max_iosize = (ATA_AHCI_DMA_ENTRIES - 1) * PAGE_SIZE;
sys/dev/ata/chipsets/ata-ahci.c
        ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-cyrix.c
        ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-national.c
        if (ch->dma.max_iosize > 256 * 512)
                ch->dma.max_iosize = 256 * 512;
sys/dev/ata/chipsets/ata-acerlabs.c
        ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-marvell.c
... (no assignment) ...
sys/dev/ata/ata-all.c
... (no assignment) ...
sys/dev/ata/ata-all.h
    if (ch->dma.max_iosize == 0)
        ch->dma.max_iosize = MIN((ATA_DMA_ENTRIES - 1) * PAGE_SIZE, MAXPHYS);
                           NULL, NULL, ch->dma.max_iosize,
                               NULL, NULL, ch->dma.max_iosize,
    if (request->bytecount > ch->dma.max_iosize) {
                      request->bytecount, ch->dma.max_iosize);
sys/dev/ata/ata-dma.c
... (no assignment) ...
sys/arm/mv/mv_sata.c
... (no assignment) ...
sys/powerpc/powermac/ata_dbdma.c
... (no assignment) ...
sys/cam/ctl/ctl_backend_block.c

All but possibly the last being ata contexts for the field name max_iosize .

In sys/sys/param.h I find:

#define DEV_BSHIFT      9               /* log2(DEV_BSIZE) */
#define DEV_BSIZE       (1<<DEV_BSHIFT)

So DEV_BSIZE == 512. That means several of the above assignment values are less than 64*1024 (a.k.a. DFLTPHYS), some being 64*512 (so 32*1024) like the PowerMac G5 is ending up with for my configuration. There is also:

$ find sys -name '*.[h]' -exec grep "\<ATA_.*_ENTRIES\>" {} \; -print | more                                                                     
#define ATA_AHCI_DMA_ENTRIES            129
    struct ata_ahci_dma_prd     prd_tab[ATA_AHCI_DMA_ENTRIES];
#define ATA_DMA_ENTRIES                 256
sys/dev/ata/ata-all.h

and sys/powerpc/include/param.h has:

#define PAGE_SHIFT      12
#define PAGE_SIZE       (1L << PAGE_SHIFT)      /* Page size */
#define PAGE_MASK       (vm_offset_t)(PAGE_SIZE - 1)

So PAGE_SIZE == 4096 for the context. So any ATA_..._ENTRIES * PAGE_SIZE alternatives are not involved in my problem.


Looking at what dumper is assigned...

static void
g_disk_kerneldump(struct bio *bp, struct disk *dp)
{
        struct g_kerneldump *gkd;
        struct g_geom *gp;
...
        if (dp->d_dump == NULL) {
                g_io_deliver(bp, ENODEV);
                return;
        }
        gkd->di.dumper = dp->d_dump;
...
}

where sys/cam/ata/ata_da.c has:

static cam_status
adaregister(struct cam_periph *periph, void *arg)
{
        struct ada_softc *softc;
...
        softc->disk->d_dump = adadump;
...
}

adadump has no logic to limit its I/O requests by the dma.max_iosize when given a length bigger than that. sys/cam/ata/ata_da.c has no text "dma" at all. But adadump does have:

adadump(void *arg, void *virtual, vm_offset_t physical, off_t offset, size_t length)
{
...
                if ((softc->flags & ADA_FLAG_CAN_48BIT) &&
                    (lba + count >= ATA_MAX_28BIT_LBA ||
                    count >= 256)) {
                        ata_48bit_cmd(&ccb.ataio, ATA_WRITE_DMA48,
                            0, lba, count);
                } else {
                        ata_28bit_cmd(&ccb.ataio, ATA_WRITE_DMA,
                            0, lba, count);
                }
...
}

which makes the I/O a DMA based I/O for sure.


I do not see that sys/cam/ata/... explicitly uses or exposes ata_channel's dma.max_iosize in any way.

As a matter of interfacing I would expect that --just like the dumper field gets into the appropriate cam code-- any abstraction spanning dma.max_iosize sorts of issues would also reference something that gets into the cam code that matches up with the dumper cam code.

There seems to be no such interface. One could imagine a new dumpinfo field (look for the dumper_adjusted_maxsize below):

static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
       struct dumperinfo *di = (struct dumperinfo*)arg;
       u_int max_trans_sz = (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
...
       max_trans_sz = di->dumper_adjusted_maxsize(di->priv, max_trans_sz);
...
       printf("  chunk %d: %lu bytes ", seqnr, (u_long)resid);

       while (resid) {
               sz = (resid > max_trans_sz) ? max_trans_sz : resid;
...
               error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}

(Middle layers would need to be set up to allow plugging in the value into dumpinfo's new field and sys/cam/ata/... would need ata_channel's dma.max_iosize access and use of it.)

In many cases dumper_adjusted_maxsize might be an identity operation (no adjustment) or a simple maximum operation but for sys/cam/ata/ata_da.c contexts its main purpose would be to apply DMA size limitations if they are smaller than the parameter passed in.



Out of all this my overall conclusion is that there is a hole in the FreeBSD design and without some design changes the technique is to change the constant values used in cb_dumpdata to values that happen to work.

In this simpler line of things it leaves me wondering if something like DFLTPHYS/2 would be appropriate for dumper contexts in official 10.1-STABLE FreeBSD for now:

static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
       struct dumperinfo *di = (struct dumperinfo*)arg;
       u_int max_trans_sz = (di->maxiosize<DFLTPHYS/2) ? di->maxiosize : DFLTPHYS/2;
...
       printf("  chunk %d: %lu bytes ", seqnr, (u_long)resid);

       while (resid) {
               sz = (resid > max_trans_sz) ? max_trans_sz : resid;
...
               error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}

More of the existing ch->dma.max_iosize assignments would not be exceeded and so more contexts could then generate panic dumps.




Context for the experiment that prompted the above looking around:

$ freebsd-version -ku; uname -a
10.1-RELEASE-p6
10.1-STABLE
FreeBSD FBSDG5M1 10.1-RELEASE-p6 FreeBSD 10.1-RELEASE-p6 #21 r279264M: Sun Mar  1 03:31:59 PST 2015     root at FBSDG5M1:/usr/obj/usr/home/markmi/src_10_1_releng/sys/GENERIC64vtsc  powerpc

$ svnlite st
?       .snap
M       sys/ddb/db_main.c
M       sys/ddb/db_script.c
 M      sys/powerpc/conf
?       sys/powerpc/conf/GENERIC64vtsc
M       sys/powerpc/ofw/ofw_machdep.c
M       sys/powerpc/ofw/ofwcall64.S
M       sys/powerpc/powerpc/dump_machdep.c

$ svnlite info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.us-west.freebsd.org/base/stable/10
Relative URL: ^/stable/10
Repository Root: https://svn0.us-west.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 279201
Node Kind: directory
Schedule: normal
Last Changed Author: pluknet
Last Changed Rev: 279201
Last Changed Date: 2015-02-23 00:45:42 -0800 (Mon, 23 Feb 2015)

$ svnlite diff sys/powerpc/powerpc/dump_machdep.c | more
Index: sys/powerpc/powerpc/dump_machdep.c
===================================================================
--- sys/powerpc/powerpc/dump_machdep.c  (revision 279201)
+++ sys/powerpc/powerpc/dump_machdep.c  (working copy)
@@ -113,6 +113,11 @@
 cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
 {
        struct dumperinfo *di = (struct dumperinfo*)arg;
+       u_int max_trans_sz = (0==di) ? 0 : (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
+                               /* If md->md_size turns out to be 0 the above avoids derefercing di.
+                                * The original code that ignored di->maxiosize had that property
+                                * overall. So this update does too.
+                                */
        vm_offset_t va;
        size_t counter, ofs, resid, sz;
        int c, error, twiddle;
@@ -124,10 +129,10 @@
        ofs = 0;        /* Logical offset within the chunk */
        resid = md->md_size;
 
-       printf("  chunk %d: %lu bytes ", seqnr, (u_long)resid);
+       printf("  chunk %d: %lu bytes max_trans_sz %lu ", seqnr, (u_long)resid, (u_long)max_trans_sz);
 
        while (resid) {
-               sz = (resid > DFLTPHYS) ? DFLTPHYS : resid;
+               sz = (resid > max_trans_sz) ? max_trans_sz : resid;
                va = pmap_dumpsys_map(md, ofs, &sz);
                counter += sz;
                if (counter >> 24) {


===
Mark Millard
markmi at dsl-only.net

On 2015-Mar-1, at 04:14 AM, Mark Millard <markmi at dsl-only.net> wrote:

Context: 10.1-STABLE powerpc64 for PowerMac G5

I'm unable to get panic dumps because the DMA size is rejected, 64K being too big. I'd like to get things to the point where panic dumps are possible for that context.

Looking around I find:

root at FBSDG5M1:/usr/src # grep DFLTPHYS sys/sys/*.h
sys/sys/param.h:#ifndef DFLTPHYS
sys/sys/param.h:#define DFLTPHYS	(64 * 1024)	/* default max raw I/O transfer size */
sys/sys/param.h:#define MAXDUMPPGS	(DFLTPHYS/PAGE_SIZE)

root at FBSDG5M1:/usr/src # grep DFLTPHYS sys/powerpc/powerpc/*
sys/powerpc/powerpc/dump_machdep.c:		sz = (resid > DFLTPHYS) ? DFLTPHYS : resid;

static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
       struct dumperinfo *di = (struct dumperinfo*)arg;
...
       resid = md->md_size;
...
       while (resid) {
               sz = (resid > DFLTPHYS) ? DFLTPHYS : resid;
...
               error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}

Which may well be where the 64K is from. sys/sys/conf.h has:

struct dumperinfo {
       dumper_t *dumper;       /* Dumping function. */
       void    *priv;          /* Private parts. */
       u_int   blocksize;      /* Size of block in bytes. */
       u_int   maxiosize;      /* Max size allowed for an individual I/O */
       off_t   mediaoffset;    /* Initial offset in bytes. */
       off_t   mediasize;      /* Space available in bytes. */
};

So it would appear that maxiosize from dumperinfo was supposed to be involved. But at this level nothing is checking dumperinfo's maxiosize field value to avoid using anything bigger.

It may be that the di->dumper is supposed to deal with allowed sizes being smaller than its size parameter indicates (sz here). But then exposing maxiosize in dumperinfo would seem a bit odd as an interface.

My initial guess would be that cb_dumpdata should be more like:

static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
       struct dumperinfo *di = (struct dumperinfo*)arg;
       u_int max_trans_sz = (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
...
       printf("  chunk %d: %lu bytes ", seqnr, (u_long)resid);

       while (resid) {
               sz = (resid > max_trans_sz) ? max_trans_sz : resid;
...
               error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}

I have assumed that even when 0==resid that 0!=arg: the origin code would not dereference di in that kind of context so a little more conditional logic might be required if that property needs to be preserved.

Anyone know if I seem to be going in a reasonable direction for this?



I've only noted the large transfer size that I found. Many other places use a di->dumper with the size of something much smaller or with a smaller symbolic constant (such as having value 512). Again there is no maxiosize handling. It seems there there is an expected (implicit?) minimum size for maxiosize such that these various things would always fit. From that point of view I'm just objecting to DFLTPHYS being that minimum I guess: I want a smaller minimum so that I can get dumps from my existing configuration.


===
Mark Millard
markmi at dsl-only.net




More information about the freebsd-ppc mailing list