port of linux patch for hyperv storvsc -- works but how?

Wed Nov 13 16:04:21 UTC 2019

> On Nov 13, 2019, at 8:12 AM, Andriy Gapon <avg at FreeBSD.org> wrote:
> 
> 
> I would like to ask you to review a patch in this review request:
> https://reviews.freebsd.org/D22312
> The patch comes from Linux:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=40630f462824ee24bc00d692865c86c3828094e0
> 
> The change is quite simple, but I have a hard time understanding how it works.
> 
> 	ccb->csio.scsi_status = (vm_srb->scsi_status & 0xFF);
> -	ccb->csio.resid = ccb->csio.dxfer_len - vm_srb->transfer_len;
> +	if (srb_status == SRB_STATUS_SUCCESS ||
> +	    srb_status == SRB_STATUS_DATA_OVERRUN)
> +		ccb->csio.resid = ccb->csio.dxfer_len - vm_srb->transfer_len;
> +	else
> +		ccb->csio.resid = ccb->csio.dxfer_len;
> 
> My motivation for porting that change is a report from a user that they are
> getting a lot of zpool checksum errors while concurrently running zpool scrub
> and iozone -l 1 -s 100G on a pool created on top of storvsc devices.
> The user reports that the change fixes the problem.
> I have not been able to analyze the earlier errors in details, so I am not sure
> if they were getting any SRB_STATUS_DATA_OVERRUN errors.  Also, I don't know
> what kind of sense data was provided for errors.
> I do know that they were getting a  lot of these messages duirng the test when
> booverbose was enabled:
> storvsc scsi_status = 2, srb_status = 4
> scsi_status = 2 is SCSI_STATUS_CHECK_COND
> srb_status = 4 is SRB_STATUS_ERROR
> So, nothing unusual.
> 
> I would like to understand how the change can possibly make a difference.
> As far as I can tell, csio.resid is completely ignored if a ccb fails.

The csio.resid is passed to bp->bio_resid in dadone() in the event of an error.  This
in turn is passed up to callers, and probably informs ZFS on how much data did not
actually make it to the device in the event of an error (I’m not going to dig into that
right now, though).

> Also, as far as I can see, it's ignored for ccb retries as well.
> That is, we never modify dxfer_len.

Disk I/O needs to be atomic and idempotent; if we tried to restart partially failed
transactions, we might race another process that is doing I/O to the same area, and
get merged results.  Thus, we always retry full transactions.

> And if the ccb succeeds then resid would be set the same with the old and new code.

Presumably this will be “0”, which is good.

> The only possibility I could come up with is the ccb's sense data having
> SSD_KEY_RECOVERED_ERROR.  In that case, the ccb would be considered successful.
> But I am not sure even about that case.
> 

Yeah, there’s a lot more logic that is needed, and i think the problem is that the
hyperV model doesn’t provide a useful resid value, forcing us to guess at what it
should be.

I think that SRB_STATUS_DATA_UNDERRUN might also require special handling.

Scott