Bacula fails on FreeBSD 10.x / "mt fsf" infinitely proceeds

Kenneth D. Merry ken at FreeBSD.ORG
Tue Jul 29 20:43:57 UTC 2014


On Tue, Jul 29, 2014 at 21:18:29 +0200, Joerg Wunsch wrote:
> As Martin Simmons wrote:
> 
> > Maybe you are now connecting the tape drive via a different SCSI
> > driver?
> 
> No, I forgot to say: the tape drive/library is a Sun L9 which has
> HV-Diff-SCSI, so I have to use the exact same Symbios Logic SCSI
> controller (and driver) as before.
> 
> > It sounds like you are running Bacula with "Fast Forward Space File
> > = yes" in the configuration.
> 
> Yes, that's the case.  However, even without that, I'm afraid the
> Bacula logic would run into an infinite loop, as a single FSF
> operation now always succeeds, and pretends it encountered a new tape
> file.  (Besides, the "Fast Forward Space File" thing did work for many
> years.)
> 
> Looking into saspace() in sys/cam/scsi/scsi_sa.c, I see:
> 
> ====================================================================
>         } else if (code == SS_FILEMARKS && softc->fileno != (daddr_t) -1) {
>                 softc->fileno += (count - softc->last_ctl_resid);
>                 if (softc->fileno < 0)  /* we must of hit BOT */
>                         softc->fileno = 0;
>                 softc->blkno = 0;
> ====================================================================
> 
> That piece of code ought to be responsible when the SPACE command hit
> a filemark.  It hasn't been changed for more than a decade though.
> 
> Now the following SVN log message rang a bell to me:
> 
> ====================================================================
> r225950 | ken | 2011-10-03 22:32:55 +0200 (Mo, 03. Okt 2011) | 146 Zeilen
> 
> Add descriptor sense support to CAM, and honor sense residuals properly in
> CAM.
> ====================================================================
> 
> It went in after my older (working) 8.2 system, it talks about
> residual handling, and the code above uses "softc->last_ctl_resid".
> 
> It wouldn't surprise me if that's somehow related to the issue.

Yes, it could be related.  The descriptor sense changes abstracted out
sense data handling so that fixed and descriptor sense would be handled in
the same way.

The residual got bumped up from 32 to 64 bits to accommodate the increased
size of the descriptor sense fields.

In theory the values should be equivalent, but it is possible that there is
breakage.

Can you put a printf in the above code snippet, and print out the count,
fileno, and last_ctl_resid before fileno is set?  That might tell us
something.
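For illustration, here is a rough user-space model of that branch with the
print added.  This is only a sketch: struct sa_model and
model_space_filemarks() are made-up names, and the 64-bit field widths are an
assumption, not the real softc layout.  In the kernel the same values would
be printed with printf() directly inside saspace().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the softc fields used in the snippet. */
struct sa_model {
	int64_t fileno;		/* current file number, -1 if unknown */
	int64_t blkno;		/* block number within the file */
	int64_t last_ctl_resid;	/* residual of the last control command */
};

/* Mirror the SS_FILEMARKS branch, logging the inputs before fileno changes. */
static void
model_space_filemarks(struct sa_model *sc, int count)
{
	assert(sc != NULL);
	if (sc->fileno == -1)
		return;		/* position unknown: the branch is skipped */

	fprintf(stderr, "saspace: count %d fileno %jd last_ctl_resid %jd\n",
	    count, (intmax_t)sc->fileno, (intmax_t)sc->last_ctl_resid);

	sc->fileno += count - sc->last_ctl_resid;
	if (sc->fileno < 0)	/* clamp at beginning of tape */
		sc->fileno = 0;
	sc->blkno = 0;
}
```

With a sane residual, spacing one filemark that was actually found
(residual 0) moves fileno forward by one, and a residual equal to count
leaves it unchanged.  A residual outside that range would show up
immediately in the printed values.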

The original code in saerror() did this with the residual:

		info = (int32_t) scsi_4btoul(sense->info);
		resid = info;

resid was then assigned to last_ctl_resid.  Everything was a 32-bit value;
info was int32_t and resid was uint32_t.

The new code (in scsi_get_sense_info() in scsi_all.c) effectively does:

	uint32_t info_val;

	info_val = scsi_4btoul(sense->info);

	*info = info_val;
	if (signed_info != NULL)
		*signed_info = (int32_t)info_val;

info and signed_info are uint64_t and int64_t, respectively.

The info value is what makes it into last_ctl_resid.

Another possibility here is that the driver is setting the sense residual
incorrectly.  If that happens, then we would think that the info field
isn't present in the sense, and would report the entire transfer length as
the residual.  (For a space command, I don't think there would be a
transfer.)
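As a rough sketch of why the sense residual matters: in fixed format sense
data the information field occupies bytes 3 through 6, and bit 7 of byte 0
says whether it is valid.  If the driver reports too large a residual, the
number of sense bytes considered valid shrinks and the field looks absent.
The function and macro names below are made up for the example and this is
not the actual CAM check, only the same idea in miniature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIXED_INFO_OFFSET	3	/* information field starts at byte 3 */
#define FIXED_INFO_LEN		4	/* and is 4 bytes long */
#define FIXED_VALID_BIT		0x80	/* VALID bit in byte 0 */

/*
 * Return true if the information field lies inside the sense bytes that
 * were actually returned (sense_len - sense_resid) and is marked valid.
 */
static bool
fixed_info_present(const uint8_t *sense, size_t sense_len, size_t sense_resid)
{
	size_t valid_len;

	assert(sense != NULL);
	if (sense_resid > sense_len)
		return false;	/* bogus residual: nothing can be trusted */
	valid_len = sense_len - sense_resid;
	if (valid_len < FIXED_INFO_OFFSET + FIXED_INFO_LEN)
		return false;	/* field falls outside the returned bytes */
	return (sense[0] & FIXED_VALID_BIT) != 0;
}
```

With an 18-byte sense buffer and a residual of 0 the field is seen; with a
residual of 12 or more only 6 or fewer bytes count as valid and the field is
treated as missing, regardless of what the drive put there.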

The sym(4) driver does set the sense residual, but I'd have to dig into
it a little more to figure out whether it is doing the right thing.

Hopefully a few printfs will give us a better idea of what is going on.  If
the printf in saspace() doesn't show anything suspicious, the next place to
look would be at the sense_len in saerror().

> I'm Cc'ing Ken (as the committer of 225950) for an opinion, just in
> case he doesn't follow the list so closely.

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG

