sa(4) 9.2->10.1, nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split request
Konstantin Belousov
kostikbel at gmail.com
Sat Oct 25 17:53:57 UTC 2014
On Fri, Oct 24, 2014 at 05:07:26PM -0600, Kenneth D. Merry wrote:
> On Thu, Oct 23, 2014 at 20:53:06 +0200, Harald Schmalzbauer wrote:
> > Hello,
> >
> > I read about the changes in sa(4) regarding large-block-split changes
> > and the transitional 'kern.cam.sa.allow_io_split' workaround.
> >
> > I'm using bacula (7.0.5) and my previously necessary multi-blocking
> > adjustments like "Minimum block size = 2097152" obviously didn't work
> > with FreeBSD 10.1 anymore.
> > Good news is, they are not needed any more!
> > With the default of 126 blocks (64512) I get 60-140MB/s with btape(8)'s
> > speed test on my LTO4 (HH) drive and another quick test showed that
> > using mbuffer(1) for zfs(8) 'send' isn't needed anymore (| dd
> > of=/dev/nsa0 bs=64512 seems to max out LTO4 speed). [with FreeBSD 9 the
> > transfer rates were orders of magnitude lower with these block size settings!]
> >
> > The not so good news is that bacula can't read the tape's label.
> > Labeling a tape (with 'label' at bconsole(8) or btape(8)) is
> > successful, and btape(8)'s 'readlabel' partially displays the correct
> > label, but not the very beginning of the label:
> > Volume Label:
> > Id : **error**VerNo
> > ?rest OK
> >
> > While it should read:
> > Volume Label:
> > Id : Bacula 1.0 immortal
> > VerNo : 11
> > ?
> >
> > When btape(8) starts to read the label, the _subject's error is reported_:
> > *nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split
> > request*
>
> What blocksize are you using with btape(8)?
>
> What kind of controller are you using?
>
> The reason you get that error message is that the sa(4) driver goes through
> physio(9) to get buffers from userland into the kernel. physio(9) relies
> on the vmapbuf()/vunmapbuf() routines to map buffers in and out of the
> kernel.
>
> vmapbuf() operates with a page granularity. The address to be mapped has
> to start on a page boundary. It also uses kernel virtual address segments
> that are MAXPHYS in size. On x86 boxes at least, MAXPHYS is 128KB.
>
> So if you use a blocksize of 128KB, but pass in a pointer that doesn't
> start on a page boundary, vmapbuf() will have to map 33 pages instead of
> 32. In your case, it will have to start at page address 0x803135000, and
> will need 33 4KB pages, which is greater than 128KB.
I want to disable unaligned physio altogether.
See https://reviews.freebsd.org/D888 for yet another case where this bites.
The obvious thing that stops us from doing this is binary compatibility.
I need some form of broad support to make this change.
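
To make the page arithmetic in the quoted paragraph concrete, here is a
minimal sketch. It is only an illustration of the counting, not kernel code;
the helper names pages_spanned() and fits_in_one_mapping() are made up for
this example, and the 4KB page size and 128KB MAXPHYS are the values quoted
above.

```python
PAGE_SIZE = 4096        # 4KB pages, as in the quoted example
MAXPHYS = 128 * 1024    # x86 value quoted above


def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Count the whole pages touched by the byte range [addr, addr + length)."""
    start = addr - (addr % page_size)                    # round down to a page
    end = -(-(addr + length) // page_size) * page_size   # round up to a page
    return (end - start) // page_size


def fits_in_one_mapping(addr, length, maxphys=MAXPHYS, page_size=PAGE_SIZE):
    """True if the spanned pages fit in one MAXPHYS-sized mapping (hypothetical check)."""
    return pages_spanned(addr, length, page_size) * page_size <= maxphys


# The pointer from the error message, with a 128KB request:
print(pages_spanned(0x803135040, 131072))        # 33 pages
print(fits_in_one_mapping(0x803135040, 131072))  # False
# The same request starting on the page boundary:
print(fits_in_one_mapping(0x803135000, 131072))  # True
```

With the pointer from the subject line the range starts 0x40 bytes into a
page, so the last 0x40 bytes of the request spill onto a 33rd page.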
>
> This behavior obviously isn't very user friendly.
>
> If you want to avoid the problem, try setting your blocksize in Bacula to
> 4K less than what is reported in kern.cam.sa.0.maxio. If it's 131072, then
> set the blocksize to 126976.
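
The "one page less than maxio" rule quoted above can be checked with a few
lines of arithmetic. This is a sketch only: largest_safe_blocksize() and
worst_case_pages() are hypothetical helper names, the 4KB page size is
assumed, and the maxio value is expected to be read by hand from
kern.cam.sa.0.maxio.

```python
def largest_safe_blocksize(maxio, page_size=4096):
    """Block size one page below maxio, per the advice quoted above."""
    return maxio - page_size


def worst_case_pages(blocksize, page_size=4096):
    """Pages touched when the buffer starts at the last byte of a page."""
    return -(-(blocksize + page_size - 1) // page_size)


maxio = 131072                       # value reported in the thread
bs = largest_safe_blocksize(maxio)
print(bs)                            # 126976
# Even with the worst possible alignment the block spans 32 pages,
# which is exactly maxio / page_size.
print(worst_case_pages(bs), maxio // 4096)
```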
>
> Another way to avoid the problem is to increase MAXPHYS. Increasing it
> beyond kern.cam.sa.0.cpi_maxio won't help anything. If you increase
> it too much, you can run into other problems.
>
> That said, though, you can probably bump it to 512K without much worry.
> Put this in your kernel config file and recompile/reinstall your kernel:
>
> options MAXPHYS="(512*1024)"
> options DFLTPHYS="(512*1024)"
>
> The same thing applies, though -- you'll want to set your blocksize to 1
> page less than kern.cam.sa.0.maxio, since Bacula isn't using page-aligned
> buffers.
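
For comparison, this is roughly what a page-aligned userland buffer looks
like. It is a sketch under assumptions: it uses an anonymous mmap, which
starts on a page boundary, and only verifies the alignment; it does not
touch the tape device, and it says nothing about how Bacula allocates its
buffers. In a C program posix_memalign(3) would be the usual way to get the
same property.

```python
import ctypes
import mmap

BLOCK_SIZE = 128 * 1024   # one full 128KB block, as discussed above

# An anonymous mapping is page-aligned by construction.
buf = mmap.mmap(-1, BLOCK_SIZE)

# Look up the buffer's address to confirm the alignment.
view = ctypes.c_char.from_buffer(buf)
addr = ctypes.addressof(view)
is_aligned = addr % mmap.PAGESIZE == 0

print(hex(addr), is_aligned)
```

With a buffer like this, a request of exactly MAXPHYS bytes spans 32 pages
rather than 33, so the trimming by one page should not be needed.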
>
> > The same error shows up if I configure bacula to use a fixed block size
> > of kern.cam.sa.0.maxio (131072).
>
> At that (i.e. the physio(9)) level, variable vs. fixed block mode won't
> matter.
>
> > As expected, allowing split (with kern.cam.sa.allow_io_split in
> > loader.conf) works around that problem.
> > But I'd like to understand why I cannot set kern.cam.sa.0.maxio, and
> > why btape(8) doesn't work 100% correctly although blocksize < sa.0.maxio
>
> See above. The unfortunate thing is that with the above setup, I think
> you'll wind up with a bigger block and then a smaller block going onto the
> tape in variable block mode at least.
>
> This is an example of why I/O splitting is bad -- you don't have good
> visibility from userland into exactly how things are getting put on tape.
> The application writes out what it wants, but it doesn't know what size
> blocks are hitting the tape.
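
A simplified model of the splitting described above, to show why the
application loses track of the block sizes. This is not the kernel's code:
split_request() is a hypothetical function, and the rule that an unaligned
chunk is trimmed to MAXPHYS minus one page is an assumption chosen to match
the "bigger block and then a smaller block" description.

```python
def split_request(addr, length, maxphys=128 * 1024, page_size=4096):
    """Return the chunk sizes one write would be cut into (simplified model)."""
    chunks = []
    while length > 0:
        offset = addr % page_size
        chunk = min(length, maxphys)
        # Assumed rule: an unaligned chunk that would need an extra page
        # is trimmed to one page less than maxphys.
        if offset != 0 and chunk + offset > maxphys:
            chunk = maxphys - page_size
        chunks.append(chunk)
        addr += chunk
        length -= chunk
    return chunks


# One 128KB write from the unaligned pointer in the error message:
print(split_request(0x803135040, 131072))   # [126976, 4096]
# The same write from a page-aligned pointer:
print(split_request(0x803135000, 131072))   # [131072]
```

In variable block mode each chunk in the first case would land on tape as
its own block, although the application issued a single write.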
>
> > I don't have enough understanding to check the code myself, whether it's a
> > cam/sa(4) issue in FreeBSD or a problem in btape(8) (and also bacula
> > itself, most likely the tool shares the code with bacula's storage daemon).
> >
> > Any hints highly appreciated!
>
> I have considered implementing a custom read/write routine in the sa(4)
> driver to get around some of these issues, but it will require more than
> just sa(4) driver modifications for everything to work optimally.
>
> With a custom read/write routine, if we copied data into the kernel, we
> could essentially allow any I/O size that the controller and tape drive
> support without altering MAXPHYS. And alignment issues wouldn't matter,
> either.
>
> The drawback is that we wouldn't be able to do unmapped I/O for drivers
> that support it. (Unless the user happened to give us a single buffer that
> we could send down as an unmapped I/O.) The unmapped I/O code doesn't
> currently handle scatter/gather lists of unmapped buffers.
>
> Another drawback to copying is the increased overhead versus unmapped
> I/O. Although on modern hardware, copying is usually more efficient than
> mapping user memory into the kernel's virtual address space, because of the
> TLB shootdowns that happen with the mapping operation.
>
> For tape users with just one tape drive, the overhead wouldn't be a big
> deal. If you have lots of tape drives attached to one machine, though, it
> could have a noticeable effect.
>
> Ken
> --
> Kenneth Merry
> ken at FreeBSD.ORG