cb_dumpdata vs. PowerMac G5's (or at least my SSD context): add involvement of dumpinfo's maxiosize value?
Mark Millard
markmi at dsl-only.net
Mon Mar 2 03:07:56 UTC 2015
Justing H. wrote:
> This was working in the original powerpc64 dump code. Look at r257941,
> which was the original code that I committed. I guess the DMA size
> clamping didn't get migrated to the new dump code.
So looking around... Using Sticky Revision 257941 shows /head at 257941.
sys/powerpc/aim/mmu_oea64.c is what was changed in 257941: adding moea64_dumpsys_map and moea64_scan_md and the MMUMETHOD entries mmu_scan_md and mmu_dumpsys_map .
For /stable/10/sys/powerpc/aim/mmu_oea64.c the MFC that includes 257941 is revision 260672 and the most recent code is 279920. moea64_dumpsys_map, moea64_scan_md, mmu_scan_md, and mmu_dumpsys_map are still there.
First a reminder of the usage context for the two routines (pmap_scan_md and pmap_dumpsys_map route to the matching moea64_.* routines).
static int
foreach_chunk(callback_t cb, void *arg)
{
struct pmap_md *md;
int error, seqnr;
seqnr = 0;
md = pmap_scan_md(NULL);
while (md != NULL) {
error = (*cb)(md, seqnr++, arg);
if (error)
return (-error);
md = pmap_scan_md(md);
}
return (seqnr);
}
... (The below is from my DFLTPHYS/2 variant of the original.) ...
static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
struct dumperinfo *di = (struct dumperinfo*)arg;
u_int max_trans_sz = (0==di) ? 0 : (di->maxiosize<DFLTPHYS/2) ? di->maxiosize : DFLTPHYS/2;
/* If md->md_size turns out to be 0 the above avoids derefercing di.
* The original code that ignored di->maxiosize had that property
* overall. So this update does too.
*/
...
while (resid) {
sz = (resid > max_trans_sz) ? max_trans_sz : resid;
va = pmap_dumpsys_map(md, ofs, &sz);
...
error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
pmap_dumpsys_unmap(md, ofs, va);
...
}
printf("... %s\n", (error) ? "fail" : "ok");
return (error);
}
Then the routine that the signature/interface gives the potential to adjust the size:
vm_offset_t
moea64_dumpsys_map(mmu_t mmu, struct pmap_md *md, vm_size_t ofs,
vm_size_t *sz)
{
if (md->md_vaddr == ~0UL)
return (md->md_paddr + ofs);
else
return (md->md_vaddr + ofs);
}
This code ignores its sz argument. NULL could have been passed in for sz and nothing would change.
As stands pmap_dumpsys_map/moea64_dumpsys_map only determine the starting address, leaving sz alone.
It is not clear to me that moea64_dumpsys_map would have access to the dma.max_iosize information for the ata_channel that happens to be in use.
Summary of my looking at pmap_scan_md and pmap_dumpsys_map usage: no effect on the DMA size used and it may not be the appropriate context for an ata_channel's dma.max_iosize based sz adjustment.
My change to DFLTPHYS/2 as I listed earlier as a possibility did allow chunks to be written --until chunk 29 got an error in my only test so far. With DFLTPHYS it was no-go.
Anyone with a context that gets a 64*1024 dma.max_iosize would not see the "no-go" problem and for all I know there are such PowerMac G5 contexts. So to know if the problem is being tested requires establishing ata_channel's dma.max_iosize is both involved and is smaller than 64*1024. I've definitely got that context.
===
Mark Millard
markmi at dsl-only.net
On 2015-Mar-1, at 04:47 PM, Mark Millard <markmi at dsl-only.net> wrote:
Unfortunately reporting the trace of finding what is happening for the DMA size panic-dump issue for powerpc64 on PowerMac G5's (and others FreeBSDs?) is rather long...
My test of trying a variant of my example code still ended up trying for a 64K DMA when only 32K is allowed according to the error message it generated.
It would appear the dumperinfo's maxiosize is too big as well. Looking around...
$ find sys -name '*.[hcsm]' -exec grep "\<oversized DMA\>" {} \; -print | more
"FAILURE - oversized DMA transfer attempt %d > %d\n",
sys/dev/ata/ata-dma.c
"FAILURE - oversized DMA transfer attempt %d > %d\n",
sys/powerpc/powermac/ata_dbdma.c
static int
ata_dmaload(struct ata_request *request, void *addr, int *entries)
{
struct ata_channel *ch = device_get_softc(request->parent);
...
if (request->bytecount > ch->dma.max_iosize) {
device_printf(request->parent,
"FAILURE - oversized DMA transfer attempt %d > %d\n",
request->bytecount, ch->dma.max_iosize);
return EIO;
}
...
}
and
static int
ata_dbdma_load(struct ata_request *request, void *addr, int *entries)
{
struct ata_channel *ch = device_get_softc(request->parent);
...
if (request->bytecount > ch->dma.max_iosize) {
device_printf(request->dev,
"FAILURE - oversized DMA transfer attempt %d > %d\n",
request->bytecount, ch->dma.max_iosize);
return EIO;
}
...
}
These suggest that the dumperinfo's maxiosize might sometimes be legitimately bigger than ata_channel's dma.max_iosize since dma.max_iosize would be very specific to DMA transfers.
So it would appear that either:
A) ata_channel's dma.max_iosize should also be considered in dump_machdep.c .
or
B) dumperinfo's maxiosize value should in part be based on ata_channel's dma.max_iosize when an ata_channel is involved and DMA would be involved.
Then looking...
$ find sys -name '*.[hcSm]' -exec grep "\<maxiosize\>" {} \; -print | more
maxdumpsz = di->maxiosize;
sys/mips/mips/minidump_machdep.c
maxdumppgs = min(di->maxiosize / PAGE_SIZE, MAXDUMPPGS);
sys/x86/x86/dump_machdep.c
gkd->di.maxiosize = DFLTPHYS;
sys/geom/raid/g_raid.c
gkd->di.maxiosize = dp->d_maxsize;
sys/geom/geom_disk.c
maxdumpsz = min(di->maxiosize, MAXDUMPPGS * PAGE_SIZE);
sys/i386/i386/minidump_machdep.c
maxdumpsz = min(di->maxiosize, MAXDUMPPGS * PAGE_SIZE);
sys/amd64/amd64/minidump_machdep.c
u_int maxiosize; /* Max size allowed for an individual I/O */
sys/sys/conf.h
maxdumpsz = di->maxiosize;
sys/arm/arm/minidump_machdep.c
u_int max_trans_sz = (0==di) ? 0 : (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
* The original code that ignored di->maxiosize had that property
sys/powerpc/powerpc/dump_machdep.c
static void
g_disk_kerneldump(struct bio *bp, struct disk *dp)
{
struct g_kerneldump *gkd;
struct g_geom *gp;
gkd = (struct g_kerneldump*)bp->bio_data;
...
gkd->di.maxiosize = dp->d_maxsize;
...
}
So that would trace back to disk's d_maxsize. The only assignment to the field that I find is in g_disk_access:
g_disk_access(struct g_provider *pp, int r, int w, int e)
{
struct disk *dp;
struct g_disk_softc *sc;
int error;
g_trace(G_T_ACCESS, "g_disk_access(%s, %d, %d, %d)",
pp->name, r, w, e);
g_topology_assert();
sc = pp->private;
if (sc == NULL || (dp = sc->dp) == NULL || dp->d_destroyed) {
...
return (ENXIO);
}
...
if ((pp->acr + pp->acw + pp->ace) == 0 && (r + w + e) > 0) {
...
if (dp->d_maxsize == 0) {
printf("WARNING: Disk drive %s%d has no d_maxsize\n",
dp->d_name, dp->d_unit);
dp->d_maxsize = DFLTPHYS;
}
...
}
So it appears that the caller(s) of disk_create need to supply the value to the field. Looking at sys/cam/ata/ata_da.c's calling code...
#ifndef ata_disk_firmware_geom_adjust
#define ata_disk_firmware_geom_adjust(disk)
#endif
...
static cam_status
adaregister(struct cam_periph *periph, void *arg)
{
struct ada_softc *softc;
struct ccb_pathinq cpi;
struct ccb_getdev *cgd;
char announce_buf[80], buf1[32];
struct disk_params *dp;
caddr_t match;
u_int maxio;
int legacy_id, quirks;
...
bzero(&cpi, sizeof(cpi));
xpt_setup_ccb(&cpi.ccb_h, periph->path, CAM_PRIORITY_NONE);
cpi.ccb_h.func_code = XPT_PATH_INQ;
xpt_action((union ccb *)&cpi);
...
maxio = cpi.maxio; /* Honor max I/O size of SIM */
if (maxio == 0)
maxio = DFLTPHYS; /* traditional default */
else if (maxio > MAXPHYS)
maxio = MAXPHYS; /* for safety */
if (softc->flags & ADA_FLAG_CAN_48BIT)
maxio = min(maxio, 65536 * softc->params.secsize);
else /* 28bit ATA command limit */
maxio = min(maxio, 256 * softc->params.secsize);
softc->disk->d_maxsize = maxio;
...
ata_disk_firmware_geom_adjust(softc->disk);
...
}
So far I do not see any hint of a ata_channel's dma.max_iosize being involved in setting d_maxsize. (Only pc98 and sparc64 seem to have ata_disk_firmware_geom_adjust definitions.)
As overall there may not be a global requirement to use DMA this may well make sense.
But it would mean that when DMA is to be used and it is an ata_channel context that the ata_channel's dma.max_iosize should be consulted. (Not that ata_channel need be the only context with such an issue. But it is the one involved in my problem.)
Looking around I find the following assignments:
$ find sys -name '*.[hcSm]' -exec grep "\<max_iosize\>" {} \; -print | more
ch->dma.max_iosize = (ATA_SIIPRB_DMA_ENTRIES - 1) * PAGE_SIZE;
sys/dev/ata/chipsets/ata-siliconimage.c
ch->dma.max_iosize = 65536;
sys/dev/ata/chipsets/ata-promise.c
ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-serverworks.c
ch->dma.max_iosize = (ATA_AHCI_DMA_ENTRIES - 1) * PAGE_SIZE;
sys/dev/ata/chipsets/ata-ahci.c
ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-cyrix.c
ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-national.c
if (ch->dma.max_iosize > 256 * 512)
ch->dma.max_iosize = 256 * 512;
sys/dev/ata/chipsets/ata-acerlabs.c
ch->dma.max_iosize = 64 * DEV_BSIZE;
sys/dev/ata/chipsets/ata-marvell.c
... (no assignment) ...
sys/dev/ata/ata-all.c
... (no assignment) ...
sys/dev/ata/ata-all.h
if (ch->dma.max_iosize == 0)
ch->dma.max_iosize = MIN((ATA_DMA_ENTRIES - 1) * PAGE_SIZE, MAXPHYS);
NULL, NULL, ch->dma.max_iosize,
NULL, NULL, ch->dma.max_iosize,
if (request->bytecount > ch->dma.max_iosize) {
request->bytecount, ch->dma.max_iosize);
sys/dev/ata/ata-dma.c
... (no assignment) ...
sys/arm/mv/mv_sata.c
... (no assignment) ...
sys/powerpc/powermac/ata_dbdma.c
... (no assignment) ...
sys/cam/ctl/ctl_backend_block.c
All but possibly the last being ata contexts for the field name max_iosize .
In sys/sys/param.h I find:
#define DEV_BSHIFT 9 /* log2(DEV_BSIZE) */
#define DEV_BSIZE (1<<DEV_BSHIFT)
So DEV_BSIZE == 512. That means several of the above assignment values are less than 64*1024 (a.k.a. DFLTPHYS), some being 64*512 (so 32*1024) like the PowerMac G5 is ending up with for my configuration. There is also:
$ find sys -name '*.[h]' -exec grep "\<ATA_.*_ENTRIES\>" {} \; -print | more
#define ATA_AHCI_DMA_ENTRIES 129
struct ata_ahci_dma_prd prd_tab[ATA_AHCI_DMA_ENTRIES];
#define ATA_DMA_ENTRIES 256
sys/dev/ata/ata-all.h
and sys/powerpc/include/param.h has:
#define PAGE_SHIFT 12
#define PAGE_SIZE (1L << PAGE_SHIFT) /* Page size */
#define PAGE_MASK (vm_offset_t)(PAGE_SIZE - 1)
So PAGE_SIZE == 4096 for the context. So any ATA_..._ENTRIES * PAGE_SIZE alternatives are not involved in my problem.
Looking at what dumper is assigned...
static void
g_disk_kerneldump(struct bio *bp, struct disk *dp)
{
struct g_kerneldump *gkd;
struct g_geom *gp;
...
if (dp->d_dump == NULL) {
g_io_deliver(bp, ENODEV);
return;
}
gkd->di.dumper = dp->d_dump;
...
}
where sys/cam/ata/ata_da.c has:
static cam_status
adaregister(struct cam_periph *periph, void *arg)
{
struct ada_softc *softc;
...
softc->disk->d_dump = adadump;
...
}
adadump has no logic to limit its I/O requests by the dma.max_iosize when given a length bigger than that. sys/cam/ata/ata_da.c has no text "dma" at all. But adadump does have:
adadump(void *arg, void *virtual, vm_offset_t physical, off_t offset, size_t length)
{
...
if ((softc->flags & ADA_FLAG_CAN_48BIT) &&
(lba + count >= ATA_MAX_28BIT_LBA ||
count >= 256)) {
ata_48bit_cmd(&ccb.ataio, ATA_WRITE_DMA48,
0, lba, count);
} else {
ata_28bit_cmd(&ccb.ataio, ATA_WRITE_DMA,
0, lba, count);
}
...
}
which makes the I/O a DMA based I/O for sure.
I do not see that sys/cam/ata/... explicitly uses or exposes ata_channel's dma.max_iosize in any way.
As a matter of interfacing I would expect that --just like the dumper field gets into the appropriate cam code-- any abstraction spanning dma.max_iosize sorts of issues would also reference something that gets into the cam code that matches up with the dumper cam code.
There seems to be no such interface. One could imagine a new dumpinfo field (look for the dumper_adjusted_maxsize below):
static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
struct dumperinfo *di = (struct dumperinfo*)arg;
u_int max_trans_sz = (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
...
max_trans_sz = di->dumper_adjusted_maxsize(di->priv, max_trans_sz);
...
printf(" chunk %d: %lu bytes ", seqnr, (u_long)resid);
while (resid) {
sz = (resid > max_trans_sz) ? max_trans_sz : resid;
...
error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}
(Middle layers would need to be set up to allow plugging in the value into dumpinfo's new field and sys/cam/ata/... would need ata_channel's dma.max_iosize access and use of it.)
In many cases dumper_adjusted_maxsize might be an identity operation (no adjustment) or a simple maximum operation but for sys/cam/ata/ata_da.c contexts its main purpose would be to apply DMA size limitations if they are smaller than the parameter passed in.
Out of all this my overall conclusion is that there is a hole in the FreeBSD design and without some design changes the technique is to change the constant values used in cb_dumpdata to values that happen to work.
In this simpler line of things it leaves me wondering if something like DFLTPHYS/2 would be appropriate for dumper contexts in official 10.1-STABLE FreeBSD for now:
static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
struct dumperinfo *di = (struct dumperinfo*)arg;
u_int max_trans_sz = (di->maxiosize<DFLTPHYS/2) ? di->maxiosize : DFLTPHYS/2;
...
printf(" chunk %d: %lu bytes ", seqnr, (u_long)resid);
while (resid) {
sz = (resid > max_trans_sz) ? max_trans_sz : resid;
...
error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}
More of the existing ch->dma.max_iosize assignments would not be exceeded and so more contexts could then generate panic dumps.
Context for the experiment that prompted the above looking around:
$ freebsd-version -ku; uname -a
10.1-RELEASE-p6
10.1-STABLE
FreeBSD FBSDG5M1 10.1-RELEASE-p6 FreeBSD 10.1-RELEASE-p6 #21 r279264M: Sun Mar 1 03:31:59 PST 2015 root at FBSDG5M1:/usr/obj/usr/home/markmi/src_10_1_releng/sys/GENERIC64vtsc powerpc
$ svnlite st
? .snap
M sys/ddb/db_main.c
M sys/ddb/db_script.c
M sys/powerpc/conf
? sys/powerpc/conf/GENERIC64vtsc
M sys/powerpc/ofw/ofw_machdep.c
M sys/powerpc/ofw/ofwcall64.S
M sys/powerpc/powerpc/dump_machdep.c
$ svnlite info
Path: .
Working Copy Root Path: /usr/src
URL: https://svn0.us-west.freebsd.org/base/stable/10
Relative URL: ^/stable/10
Repository Root: https://svn0.us-west.freebsd.org/base
Repository UUID: ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
Revision: 279201
Node Kind: directory
Schedule: normal
Last Changed Author: pluknet
Last Changed Rev: 279201
Last Changed Date: 2015-02-23 00:45:42 -0800 (Mon, 23 Feb 2015)
$ svnlite diff sys/powerpc/powerpc/dump_machdep.c | more
Index: sys/powerpc/powerpc/dump_machdep.c
===================================================================
--- sys/powerpc/powerpc/dump_machdep.c (revision 279201)
+++ sys/powerpc/powerpc/dump_machdep.c (working copy)
@@ -113,6 +113,11 @@
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
struct dumperinfo *di = (struct dumperinfo*)arg;
+ u_int max_trans_sz = (0==di) ? 0 : (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
+ /* If md->md_size turns out to be 0 the above avoids derefercing di.
+ * The original code that ignored di->maxiosize had that property
+ * overall. So this update does too.
+ */
vm_offset_t va;
size_t counter, ofs, resid, sz;
int c, error, twiddle;
@@ -124,10 +129,10 @@
ofs = 0; /* Logical offset within the chunk */
resid = md->md_size;
- printf(" chunk %d: %lu bytes ", seqnr, (u_long)resid);
+ printf(" chunk %d: %lu bytes max_trans_sz %lu ", seqnr, (u_long)resid, (u_long)max_trans_sz);
while (resid) {
- sz = (resid > DFLTPHYS) ? DFLTPHYS : resid;
+ sz = (resid > max_trans_sz) ? max_trans_sz : resid;
va = pmap_dumpsys_map(md, ofs, &sz);
counter += sz;
if (counter >> 24) {
===
Mark Millard
markmi at dsl-only.net
On 2015-Mar-1, at 04:14 AM, Mark Millard <markmi at dsl-only.net> wrote:
Context: 10.1-STABLE powerpc64 for PowerMac G5
I'm unable to get panic dumps because the DMA size is rejected, 64K being too big. I'd like to get things to the point where panic dumps are possible for that context.
Looking around I find:
root at FBSDG5M1:/usr/src # grep DFLTPHYS sys/sys/*.h
sys/sys/param.h:#ifndef DFLTPHYS
sys/sys/param.h:#define DFLTPHYS (64 * 1024) /* default max raw I/O transfer size */
sys/sys/param.h:#define MAXDUMPPGS (DFLTPHYS/PAGE_SIZE)
root at FBSDG5M1:/usr/src # grep DFLTPHYS sys/powerpc/powerpc/*
sys/powerpc/powerpc/dump_machdep.c: sz = (resid > DFLTPHYS) ? DFLTPHYS : resid;
static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
struct dumperinfo *di = (struct dumperinfo*)arg;
...
resid = md->md_size;
...
while (resid) {
sz = (resid > DFLTPHYS) ? DFLTPHYS : resid;
...
error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}
Which may well be where the 64K is from. sys/sys/conf.h has:
struct dumperinfo {
dumper_t *dumper; /* Dumping function. */
void *priv; /* Private parts. */
u_int blocksize; /* Size of block in bytes. */
u_int maxiosize; /* Max size allowed for an individual I/O */
off_t mediaoffset; /* Initial offset in bytes. */
off_t mediasize; /* Space available in bytes. */
};
So it would appear that maxiosize from dumperinfo was supposed to be involved. But at this level nothing is checking dumperinfo's maxiosize field value to avoid using anything bigger.
It may be that the di->dumper is supposed to deal with allowed sizes being smaller than its size parameter indicates (sz here). But then exposing maxiosize in dumperinfo would seem a bit odd as an interface.
My initial guess would be that cb_dumpdata should be more like:
static int
cb_dumpdata(struct pmap_md *md, int seqnr, void *arg)
{
struct dumperinfo *di = (struct dumperinfo*)arg;
u_int max_trans_sz = (di->maxiosize<DFLTPHYS) ? di->maxiosize : DFLTPHYS;
...
printf(" chunk %d: %lu bytes ", seqnr, (u_long)resid);
while (resid) {
sz = (resid > max_trans_sz) ? max_trans_sz : resid;
...
error = di->dumper(di->priv, (void*)va, 0, dumplo, sz);
...
}
I have assumed that even when 0==resid that 0!=arg: the origin code would not dereference di in that kind of context so a little more conditional logic might be required if that property needs to be preserved.
Anyone know if I seem to be going in a reasonable direction for this?
I've only noted the large transfer size that I found. Many other places use a di->dumper with the size of something much smaller or with a smaller symbolic constant (such as having value 512). Again there is no maxiosize handling. It seems there there is an expected (implicit?) minimum size for maxiosize such that these various things would always fit. From that point of view I'm just objecting to DFLTPHYS being that minimum I guess: I want a smaller minimum so that I can get dumps from my existing configuration.
===
Mark Millard
markmi at dsl-only.net
More information about the freebsd-ppc
mailing list