Tape block size greater than MAXPHYS
Kenneth D. Merry
ken at FreeBSD.ORG
Tue Dec 30 06:58:25 UTC 2014
On Mon, Dec 29, 2014 at 14:52:12 +0530, Shivaram Upadhyayula wrote:
> Hi,
>
> It seems that currently any tape reads/writes greater than MAXPHYS
> will fail. For example
>
> cpi->maxio = 256 * 1024; /* Controller max io size 256K */
>
> root at quadstorvtl # dd if=/dev/zero of=/dev/sa0 bs=256k count=1
> sa0.0: request size=262144 > si_iosize_max=131072; cannot split request
> sa0.0: request size=262144 > MAXPHYS=131072; cannot split request
> dd: /dev/sa0: File too large
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 0.000390 secs (0 bytes/sec)
>
> The first limitation comes from sys/cam/scsi/scsi_sa.c:saregister:
>     /*
>      * If maxio isn't set, we fall back to DFLTPHYS. Otherwise we take
>      * the smaller of cpi.maxio or MAXPHYS.
>      */
>     if (cpi.maxio == 0)
>         softc->maxio = DFLTPHYS;
>     else if (cpi.maxio > MAXPHYS)
>         softc->maxio = MAXPHYS;
>     else
>         softc->maxio = cpi.maxio;
>
> The softc limits maxio to MAXPHYS even if the controller supports a higher
> maxio value. I tried removing that limit, which led me to the actual
> reason for the limitation, in sys/kern/kern_physio.c:physio:
>
>     /*
>      * If the driver does not want I/O to be split, that means that we
>      * need to reject any requests that will not fit into one buffer.
>      */
>     if (dev->si_flags & SI_NOSPLIT &&
>         (uio->uio_resid > dev->si_iosize_max || uio->uio_resid > MAXPHYS ||
>         uio->uio_iovcnt > 1)) {
>
> To maintain consistency of the block numbers SI_NOSPLIT has to be set,
> but then to issue the entire request in a single bio the request size
> will be limited to MAXPHYS.
>
> Would it be correct to assume that the only way to increase the tape
> block size for writes/reads is to increase MAXPHYS and recompile the
> kernel? (As of now, on FreeBSD 10.1.)
Your analysis is correct.
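
For illustration only, the two checks quoted above can be modeled in user
space roughly as follows. This is a sketch, not the kernel code: the
MODEL_* constants and the function names are made up here, 128K is taken
from the MAXPHYS=131072 in the dd output, and 64K is an assumed value for
DFLTPHYS.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DFLTPHYS	(64 * 1024)	/* assumed value for DFLTPHYS */
#define MODEL_MAXPHYS	(128 * 1024)	/* MAXPHYS=131072 in the log above */

/* Models the clamp quoted from saregister(). */
static size_t
model_sa_maxio(size_t cpi_maxio)
{
	if (cpi_maxio == 0)
		return (MODEL_DFLTPHYS);
	if (cpi_maxio > MODEL_MAXPHYS)
		return (MODEL_MAXPHYS);
	return (cpi_maxio);
}

/*
 * Models the SI_NOSPLIT test quoted from physio().
 * Returns true when the request would be rejected.
 */
static bool
model_nosplit_rejects(size_t resid, size_t si_iosize_max, int iovcnt)
{
	return (resid > si_iosize_max || resid > MODEL_MAXPHYS ||
	    iovcnt > 1);
}
```

With these, a controller advertising 256K still ends up with a 128K limit,
and a single 256K write is rejected, which matches the dd output in the
original message.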
The reason I added the SI_NOSPLIT code (and set the flag in the sa(4)
driver) is that the previous situation was bad from the standpoint of a
tape drive user. You could write to a tape with a large blocksize, but
that isn't what would actually make it onto the tape.
You wouldn't know exactly what size blocks were making it onto the tape;
that would depend on the size and alignment of the incoming buffers. Now
at least the application has a clear understanding of what is written to
tape.
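
To make the old behavior concrete, here is a deliberately simplified model
of a request being cut into pieces. It assumes each piece goes out as its
own write and so becomes its own tape block, and it ignores buffer
alignment (which, as noted above, also affected the real sizes). The
function name and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Fill blocks[] with the piece sizes a request of len bytes would be
 * cut into if each piece may be at most max_io bytes.  Returns the
 * number of pieces, or 0 if max_io is 0 or blocks[] is too small.
 */
static size_t
model_split(size_t len, size_t max_io, size_t *blocks, size_t nblocks)
{
	size_t n = 0;

	if (max_io == 0)
		return (0);
	while (len > 0) {
		size_t chunk = len < max_io ? len : max_io;

		if (n == nblocks)
			return (0);
		blocks[n++] = chunk;
		len -= chunk;
	}
	return (n);
}
```

In this model a 200K write against a 128K limit lands on tape as a 128K
block followed by a 72K block, even though the application asked for a
single 200K block.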
One problem that was there before the SI_NOSPLIT changes and is still
present is that we can't by default read tapes with a large blocksize (e.g.
1MB). Increasing MAXPHYS will certainly fix it (assuming your controller
sets the maxio field in the path inquiry CCB to something sufficiently
large). I have considered adding a custom read/write routine to the sa(4)
driver that would essentially take the best available path given the
requested block size and the constraints imposed by the controller and
MAXPHYS.
The logic would be something like:
- If the I/O is <= MAXPHYS (including alignment constraints) and the
  controller supports it, do unmapped I/O.
- Otherwise, allocate buffers from a sa(4)-specific UMA zone and copy in
  and out. This would allow for doing I/O up to the controller's limit,
  without regard for MAXPHYS. On modern machines, this would also usually
  be faster than mapping the memory in and out of the kernel, because you
  avoid the extra TLB shootdowns.
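
The decision described in the two points above might look something like
this. It is only a sketch: the enum, the function, and its parameters are
hypothetical names, "the controller supports it" is read here as a simple
capability flag, and the alignment constraints are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

enum sa_io_path {
	SA_IO_UNMAPPED,		/* hand the user buffer down unmapped */
	SA_IO_BOUNCE,		/* copy through a driver-owned buffer */
	SA_IO_TOO_BIG		/* exceeds what the controller can do */
};

/* Pick an I/O path for one tape block of len bytes. */
static enum sa_io_path
sa_pick_io_path(size_t len, size_t maxphys, size_t ctlr_maxio,
    bool ctlr_unmapped_ok)
{
	if (len > ctlr_maxio)
		return (SA_IO_TOO_BIG);
	if (len <= maxphys && ctlr_unmapped_ok)
		return (SA_IO_UNMAPPED);
	return (SA_IO_BOUNCE);
}
```

Under these assumptions a 1MB block with a 128K MAXPHYS and a controller
limit of 1MB or more would take the copy path instead of being rejected.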
Ideally we'll get a scheme in place to allow doing unmapped S/G lists at
some point. But we don't have that yet.
I have some code with logic similar to the above scenario for the pass(4)
driver asynchronous mode that has been in my queue to upstream for about
a year.
I also have a very large set of tape driver improvements that I've been
working on (off and on) for about a year and a half. I haven't done the
custom read/write routine yet, but I may do it if I have some time.
By the way, the mps(4) and mpr(4) drivers can do I/O larger than 256KB.
That limit is somewhat arbitrary. Perhaps Steve (CCed) can take a look at
what we need to do to calculate the true limit (which would be based on the
page size of the machine and maximum number of S/G lists the controller can
handle) so we can pass back a more accurate number.
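
One common way to derive such a limit is shown below. This is an
assumption for illustration, not what mps(4) or mpr(4) actually do: it
treats the controller limit as a count of S/G entries, assumes each entry
covers at most one page, and reserves one entry because a buffer that is
not page aligned touches one more page than its length suggests. The
function name is made up.

```c
#include <stddef.h>

/*
 * Largest transfer that is guaranteed to fit in max_sg_entries
 * page-sized S/G entries, whatever the buffer alignment.
 */
static size_t
model_maxio_from_sg(size_t max_sg_entries, size_t page_size)
{
	if (max_sg_entries == 0)
		return (0);
	return ((max_sg_entries - 1) * page_size);
}
```

With 4K pages, 33 entries would give 128K and 65 entries would give 256K
under this model.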
The isp(4) driver I/O limit is accurate. If you try to use it with a
modern tape drive, you'll likely run into some FC-Tape related bugs. I
need to upstream those fixes too.
Ken
--
Kenneth Merry
ken at FreeBSD.ORG