Bacula fails on FreeBSD 10.x / "mt fsf" infinitely proceeds

Kenneth D. Merry ken at FreeBSD.ORG
Wed Jul 30 03:52:33 UTC 2014


On Wed, Jul 30, 2014 at 00:44:14 +0200, Joerg Wunsch wrote:
> "Kenneth D. Merry" <ken at FreeBSD.ORG> wrote:
> 
> > Can you put a printf in the above code snippet, and print out the count,
> > fileno, and last_ctl_resid before fileno is set?  That might tell us
> > something.
> 
> Incidently, I had exactly the same idea just before you wrote it. ;-)
> 
> I had to do a full kernel rebuild though, so it took a while to complete.
> My old FreeBSD 8.2 hardware is also still around, and I could reproduce
> the problem on an elderly DLT2000 drive I've got, which could easily be
> swapped between both machines.  In each case, the drive was attached to
> a sym(4) driver (just in case).

Ahh.  Now I know who to send sa(4) driver patches to test.  :)  I've got a 
large set of patches for the sa(4) driver to improve its support of
modern tape hardware.  The challenge is making sure that I haven't broken
older hardware in the process.

The patches aren't quite ready for testing, but I'm hoping to have time to
get them cleaned up soon.

> On the FreeBSD 10 machine, I get the following:
> 
> sym2: <810a> port 0xc000-0xc0ff mem 0xfe920000-0xfe9200ff irq 21 at device 7.0 on pci4
> sym2: No NVRAM, ID 7, Fast-10, SE, parity checking
> sa0 at sym2 bus 0 scbus5 target 3 lun 0
> sa0: <DEC DLT2000 15/30 GB 840B> Removable Sequential Access SCSI-2 device 
> sa0: Serial Number JF74130050
> sa0: 5.000MB/s transfers (5.000MHz, offset 8)
> (sa0:sym2:0:3:0): 10240-byte tape record bigger than supplied buffer
> saspace(): fileno 0, count 32767, resid 0
> 
> (The "bigger than supplied buffer" message is probably from "mt stat".)

Yes, it does a test read to try to figure out what the blocksize of the
tape is.

> On the FreeBSD 8.2 machine, the printout for the same tape was
> 
> saspace(): fileno 0, count 32767, resid 32762
> 
> (The tape has 5 filemarks on it.)
> 
> So we are probably on the right track.

Yes, it looks like we're not getting the residual set properly.  The most
likely cause is that the sym(4) driver is sending back an incorrect sense
residual, which is causing us to ignore some of the sense data.

> > Another possibility here is that the driver is setting the sense residual
> > incorrectly.
> 
> I've seen that CAM debugging can now be turned on the fly.  Maybe that would
> be of some help here?

Well, it isn't CAM-specific, but we can use DTrace to figure some of it
out.

If you have DTrace enabled (kldload dtraceall), you can run the attached D
script and we'll see what the residual is.  If the attachment gets trimmed
by the list software, the script is also available here:

http://people.freebsd.org/~ken/tapetest.d

On my test system with a SAS-attached LTO-6 (using an mpr(4) controller),
here is what happens if I do a 524290 byte read with dd from a tape written
with 524288 byte blocks:

{black-pearl:/usr/home/kenm:!:0} dd if=/dev/nsa0 of=/dev/null bs=524290 count=1
0+1 records in
0+1 records out
524288 bytes transferred in 0.008861028 secs (59167853 bytes/sec)
{black-pearl:/usr/home/kenm:!:0} dd if=/dev/nsa0 of=/dev/null bs=524290 count=1
0+1 records in
0+1 records out
524288 bytes transferred in 0.003310369 secs (158377510 bytes/sec)

{black-pearl:/usr/home/kenm:!:0} dtrace -s tapetest.d
dtrace: script 'tapetest.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    401                    saerror:entry Opcode 08, Status 0xc Data: Len 524290, Resid 2, Sense: Len 252, Resid 0

  0    401                    saerror:entry Opcode 08, Status 0xc Data: Len 524290, Resid 2, Sense: Len 252, Resid 0

If the sense residual is set too high, that will explain why we're not
getting the info field back and therefore why we're setting the
last_ctl_resid value to 0.

> It's already past midnight, so I'm heading for bed now.  Thanks for the
> responses so far!

You're welcome!  Hopefully we can track this down.

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG
-------------- next part --------------
/*-
 * Copyright (c) 2014 Spectra Logic Corporation
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions, and the following disclaimer,
 *    without modification.
 * 2. Redistributions in binary form must reproduce at minimum a disclaimer
 *    substantially similar to the "NO WARRANTY" disclaimer below
 *    ("Disclaimer") and any redistribution must be conditioned upon
 *    including a substantially similar Disclaimer requirement for further
 *    binary redistribution.
 *
 * NO WARRANTY
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
 * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGES.
 *
 * Authors: Ken Merry           (Spectra Logic Corporation)
 *
 * $FreeBSD$
 */

#pragma D option cleanrate=5000hz
#pragma D option dynvarsize=8192000

/*
 * This is a simple DTrace script to print out basic information about
 * calls to saerror().  This was originally written to help diagnose
 * problems with the sense residual, but could be used for other purposes.
 *
 * Note that CAM_STATUS_MASK == 0x3f and CAM_SCSI_STATUS_ERROR == 0x0c.
 * DTrace doesn't seem to decode the enum, so we just use hard-coded values
 * here.
 */
fbt:kernel:saerror:entry
/(args[0]->ccb_h.status & 0x3f) == 0x0c/
{
	printf("Opcode %02x, Status %#x Data: Len %u, Resid %u, Sense: Len %u, Resid %u\n",
	       args[0]->csio.cdb_io.cdb_bytes[0],
	       args[0]->ccb_h.status & 0x3f, args[0]->csio.dxfer_len,
	       args[0]->csio.resid, args[0]->csio.sense_len,
	       args[0]->csio.sense_resid);
}


More information about the freebsd-scsi mailing list