CAM Shingled Disk support patches available

Scott Long scott4long at yahoo.com
Wed Mar 2 03:13:50 UTC 2016


Hi Ken,

I’m against changing the function signature of scsi_ata_pass_16().  Even
if you manage to get things right with symbol versioning, it still leads to
problems of code compatibility.  Maybe pre-existing binaries will work, but
source code will forever have to include an #if __FreeBSD_version <
xxxxxx bit of nonsense.

I agree that it was incorrect for dxferlen to be declared as a uint16_t.
However, the function already contains a sector count argument pair.  In
theory the sector count multiplied by the sector length, both of which the
application should know in order to arrive at a sensible dxferlen, can
substitute for the dxferlen argument.  If so, then we can just ignore that
argument and declare that sector_count has logical priority.

Really though, I think that scsi_ata_pass_16() is a crummy function.  If its
purpose is to implement SAT-3 12.2.2, it does an incredibly poor job at it:

- By my count, it only covers 12 of the available 13 registers.

- It has no 12 byte, opcode 0xa1 variant.

- It doesn’t make any allowance for providing the response registers to the
caller on completion.  Well, maybe it kinda does through a sense descriptor,
but…. it’s kinda open to vague interpretation.

- Its use of the registers is clunky, assuming for example that you’ll only want
to fill the six LBA registers with a host-ordered 64-bit number.  There are
plenty of commands that re-use sub-parts of the LBA, features, and/or sector
count registers for different things.  

I know you stated that you didn’t want to do this, but I think it’s better to start
over with a better function that has a better signature and a new name.  In fact,
I think it’s better to use the existing ata_cmd and ata_res structures from 
sys/cam/ata/ata_all.h, provide accessors for the multi-byte registers if needed,
provide a 12-byte compatibility, and simply the signature.  Something like this:

void scsi_ata_pass(struct ccb_scsiio *csio, u_int32_t retries,
                      void (*cbfcnp)(struct cam_periph *, union ccb *),
                      u_int32_t flags, u_int8_t tag_action,
                      struct ata_cmd *cmd, struct ata_res *res,
                      u_int8_t *data_ptr, u_int32_t dxfer_len,
                      u_int8_t *data_ptr, u_int16_t dxfer_len,
                      u_int8_t sense_len, u_int32_t timeout);

To differentiate between the 12 and 16 byte variants, you’d look at the
AP_EXTEND flag in the protocol field.  Btw, the handling of that flag is
inconsistent in the implementation of the existing scsi_ata_pass_16().  If
the caller providse an ata_res pointer then it gets filled on completion,
otherwise the caller does its best to look at 12.2.2.6 and extract what it
can from the sense descriptor.

So my proposal is to create a new scsi_ata_pass and deprecate but not remove
scsi_ata_pass_16.  Tell people that if they need to use it, dxfer_len is going to
have lower priority than sector_count/sector_count_exp if the latter multiply to
more than 65535.

Scott



> On Mar 1, 2016, at 3:47 PM, Kenneth D. Merry <ken at FreeBSD.ORG> wrote:
> 
> I have a new set of SMR patches available.  (See the original message below
> for a more detailed description of what these patches do.)
> 
> The primary change is to add library versioning to libcam so that we can
> change the function prototype of scsi_ata_pass_16() in a way that won't
> break existing binaries.
> 
> If someone more familiar with library versioning wants to review this, I'd
> appreciate it.
> 
> The patches are here:
> 
> FreeBSD/head, as of SVN revision 296278
> 
> https://people.freebsd.org/~ken/cam_smr.head.20160301.1.txt
> 
> stable/10, as of SVN revision 296248
> 
> https://people.freebsd.org/~ken/cam_smr.stable10.20160301.1.txt
> 
> (Note that although there is a stable/10 version of the patches, I'm not
> planning to merge them to stable/10 because of the change to struct bio.  I
> can't really figure out a good way to make that backward compatible.  If
> there is consensus that breaking it is fine because it isn't a user API,
> then that may be another story.)
> 
> The problem is that the existing, in-tree version of scsi_ata_pass_16() has
> a dxfer_len argument that is a uint16_t.  That restricts transfer sizes to
> 64KB.  So, we need to update it to allow larger than 64K transfers.  I
> could just create a new function, but I'd rather just retire the broken
> version.
> 
> The intent here is that:
> 
> 1. Binaries built against the old version of libcam, before versioning was
> turned on, will get the old version of the scsi_ata_pass_16() function with
> a uint16_t dxfer_len.
> 
> 2. Binaries built against the new version of libcam will get the new
> version of the scsi_ata_pass_16() function with a uint32_t dxfer_len.
> 
> I've tested this, and it appears to work, but I'm not 100% certain this is
> all correct.  I looked at Dan Eischen's description of symbol versioning
> here:
> 
> https://people.freebsd.org/~deischen/symver/freebsd_versioning.txt
> 
> And it looks like the actual implementation is a little different than what
> is described there.  I looked around the tree, and didn't see anything that
> is obviously exactly like what I'm trying to do here.
> 
> So, what I did is as follows:
> 
> 1. For the kernel, the only change is to switch the dxfer_len argument from
> a uint16_t to a uint32_t.
> 
> 2. For userland, in scsi_all.c, there are now two versions of
> scsi_ata_pass_16 -- _ver1 and _ver2.  _ver1 is aliased to
> scsi_ata_pass_16() for FBSD_1.3 using __sym_compat().  _ver2 is aliased to
> scsi_ata_pass_16() for FBSD_1.4 using __sym_default().
> 
> 3. In lib/libcam/Versions.def, I defined FBSD_1.3 and FBSD_1.4, which
> depends on FBSD_1.3.
> 
> 4. In lib/libcam/Symbol.map, I pulled out all of the functions defined in
> libcam, sorted them, and defined them in FBSD_1.3.  I moved
> scsi_ata_pass_16() to FBSD_1.4.  (According to the freebsd_versioning.txt
> paper linked above, I should have been able to have scsi_ata_pass_16() in
> both FBSD_1.3 and FBSD_1.4, but that isn't the case in practice.)
> 
> In testing an old binary (linked against libcam without symbol versioning)
> against a new libcam (with symbol versioning), the old version of the
> function appears to be used.  With a new binary, the new version of the
> function appears to be used.
> 
> So it looks like things work as intended, but I don't fully trust my
> understanding here.  So, if someone could take a look at the changes, I'd
> appreciate it.
> 
> In particular, I have a few questions:
> 
> 1. If this change to scsi_ata_pass_16() gets merged to stable/10 (apart
> from the larger SMR changes), what should be done with the libcam library
> version?
> 
> 2. Are 1.3 and 1.4 the proper versions to use?
> 
> 3. If we make additional CAM helper function library changes, when do the
> versions get bumped?  i.e., is this an opportunity to look for other
> library functions with issues and make changes if possible?
> 
> 4. When you're going from an unversioned library to a versioned library,
> which version of a function gets linked in to a binary linked to the
> unversioned library when you run it against a versioned library?  In other
> words, what is supposed to happen in the test scenario I tried above, and
> am I really seeing what is supposed to happen?
> 
> Thanks,
> 
> Ken
> 
> On Mon, Jan 18, 2016 at 17:37:04 -0500, Kenneth D. Merry wrote:
>> I have a new set of SMR patches available.  See below for the full
>> explanation.
>> 
>> The primary change here is that I have added SMR support to the ada(4)
>> driver.  I spent some time considering whether to try to make the da(4) and
>> ada(4) probe infrastructure somewhat common, but in the end concluded it
>> would be too involved with not enough code reduction (if any) in the end.
>> 
>> So, although the ideas are similar, the probe logic is separate.
>> 
>> Note that NCQ support for SMR commands (Report Zones, Reset Write Pointer,
>> etc.) for SATA protocol shingled drives isn't active.  For both the da(4)
>> and ada(4) driver this is for lack of a way to plumb the ATA Auxiliary
>> register down to the drive.
>> 
>> In the ada(4) case, we need to add the register to struct ccb_ataio and
>> add support in one or more of the underlying SATA drivers, e.g. ahci(4).
>> 
>> In the da(4) case, it will require an update of the T-10 SAT spec to
>> provide a way to pass the Auxiliary register down via the SCSI ATA
>> PASS-THROUGH command, and then a subsquent update of the SAT layer in
>> various vendors' SAS controller firmware.  At that point, there may be 
>> an official mapping of the SCSI ZBC commands to the ATA ZAC commands, and
>> we may be able to just issue the SCSI version of the commands instead of
>> composing ATA commands in the da(4) driver.  (We'll still need to keep the
>> ATA passthrough version for a while at least to support controllers that
>> don't have the updated translation code.)
>> 
>> FreeBSD/head as of SVN revision 294105:
>> 
>> https://people.freebsd.org/~ken/cam_smr.head.20160118.1.txt
>> 
>> FreeBSD stable/10 as of SVN revision 294100:
>> 
>> https://people.freebsd.org/~ken/cam_smr.stable10.20160118.1.txt
>> 
>> Testing and comments are welcome.
>> 
>> Ken
>> 
>> On Wed, Nov 18, 2015 at 12:13:09 -0500, Kenneth D. Merry wrote:
>>> 
>>> I have work in progress patches to add SMR (Shingled Magnetic Recording)
>>> support to CAM and GEOM here:
>>> 
>>> FreeBSD/head as of SVN revision 290997:
>>> 
>>> https://people.freebsd.org/~ken/cam_smr.head.20151117.1.txt
>>> 
>>> FreeBSD stable/10 as of SVN revision 290995:
>>> 
>>> https://people.freebsd.org/~ken/cam_smr.stable10.20151117.1.txt
>>> 
>>> This includes support for Host Managed, Host Aware and Drive Managed SMR
>>> drives that are either SCSI (ZBC) or ATA (ZAC) attached via a SAS
>>> controller.  This does not include support for SMR ATA drives attched via
>>> an ATA controller.  Also, I have not yet figured out how to properly detect
>>> a Host Managed ATA drive, so this code won't do that.
>>> 
>>> The big drive vendors are moving to SMR for at least some of their drives.
>>> The primary challenge with SMR is that it requires writing a relatively
>>> large zone sequentially starting at the beginning of the zone.  The usual
>>> zone size is 256MB.  It is conceptually almost like having a 256MB sector
>>> size.
>>> 
>>> We (Spectra Logic) are working on ZFS changes that will use this CAM and
>>> GEOM infrastructure to make ZFS play well with SMR drives.  Those changes
>>> aren't yet done.
>>> 
>>> The patches linked above include:
>>> o A new 'camcontrol zone' command that allows displaying and managing
>>>   drive zones via SCSI/ATA passthrough.
>>> o A new zonectl(8) utility that uses the new DIOCZONECMD ioctl to display
>>>   and manage zones via the da(4) (and later ada(4)) driver.
>>> o Changes to diskinfo -v to display the zone mode of a drive.
>>> o A new disk zone API, sys/sys/disk_zone.h.
>>> o A new bio type, BIO_ZONE, and modifications to GEOM to support it.  This
>>>   new bio will allow filesystems to query zone support in a drive and
>>>   manage zoned drives.
>>> o Extensive modifications to the da(4) driver to handle probing SCSI and
>>>   SATA behind SAS SMR drives.
>>> o Additional CAM CDB building functions for zone commands.
>>> 
>>> The current issues that need to be addressed are:
>>> o The da(4) driver now has 6 additional probe states, 5 of which are
>>>   needed for probing ATA drives behind SAS controllers.  I have not yet
>>>   added support for BIO_ZONE bios to ada(4), but it will be very similar
>>>   to the da(4) driver version.  The ATA probe code needs to be pulled
>>>   out of the da(4) driver and changed into a form that will allow it to
>>>   work for either the ada(4) or da(4) driver.  Otherwise we'll have a fair
>>>   amount of code duplication between the two drivers.
>>> 
>>> o There is a reasonable amount of code duplication between 'camcontrol zone'
>>>   and zonectl(8).  This was done for speed / expediency's sake, but it may
>>>   be possible to make more of the code common there.
>>> 
>>> o In order to add the new BIO_ZONE bio command, I had to change the bio_cmd
>>>   field in struct bio from a uint8_t to a uint16_t.  This will cause
>>>   binary compatibility problems with any 3rd party loadable modules.
>>>   Advice on how to handle this would be welcome.
>>> 
>>> o In the process of developing these changes, I discovered that the
>>>   dxfer_len paramter for scsi_ata_pass_16() was too small (uint16_t, and
>>>   it needed to be uint32_t).  I increased it, but that will potentially
>>>   cause a binary incompatibility problem with any existing applications
>>>   that use the current API via libcam.  Advice on how to handle that
>>>   would be welcome.
>>> 
>>> If you look through the code, you'll notice that the disk_zone.h API is
>>> separate from the SCSI and ATA APIs.  The intent is to allow filesystems
>>> and other consumers of the API to just talk to the disk zone API without
>>> dealing with the SCSI and ATA specifics.  Another reason behind all of this
>>> is that even though the SCSI ZBC and ATA ZAC specs were developed in
>>> concert, and are intended to be functionally identical, they are still SCSI
>>> and ATA.  As usual, SCSI is big endian and ATA is little endian.  So to
>>> present a common API to the filesystem, we give all of the zone data back
>>> in native byte order, regardless of the underlying device protocol.
>>> 
>>> Another thing to note is the extensive use of ATA passthrough in the da(4)
>>> driver.  This is necessary because the SCSI SAT (SCSI to ATA Translation)
>>> specification has not yet caught up with translating SCSI zone commands
>>> (ZBC) to ATA zone commands (ZAC).  So, until the spec is updated and LSI
>>> and other vendors update their SCSI to ATA translation layers, we'll have
>>> to use the ATA version of the commands when talking to ATA drives via SAS
>>> controllers.
>>> 
>>> I have only tested the code so far with Seagate SATA Drive Managed and Host
>>> Aware drives.  I would appreciate testing with any drives.  (And testing to
>>> make sure that the patches don't cause problems with existing hardware.)
>>> Right now, all you can really do is manage the zones manually using
>>> camcontrol(8) or zonectl(8).  Automatic management will come with the ZFS
>>> changes.  (Or changes to other filesysems if people want to do it.)
>>> 
>>> If you have a SATA Host Aware drive, in theory camcontrol(8) should allow
>>> you to manage the drive if you have it attached to a SATA controller.
>>> 
>>> Here is an example of some of the commands.
>>> 
>>> Get general zoning information:
>>> 
>>> [root at sm4u-1 ~]# zonectl -c params -d /dev/da21
>>> Zone Mode: Host Aware
>>> Command support: Report Zones, Open, Close, Finish, Reset Write Pointer
>>> Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
>>> Optimal Number of Open Sequential Write Preferred Zones: 128
>>> Optimal Number of Non-Sequentially Written Sequential Write Preferred Zones: 8
>>> Maximum Number of Open Sequential Write Required Zones: Unlimited
>>> 
>>> Look at information from the da(4) driver:
>>> 
>>> [root at sm4u-1 ~]# sysctl kern.cam.da.21
>>> kern.cam.da.21.delete_method: NONE
>>> kern.cam.da.21.delete_max: 1081344
>>> kern.cam.da.21.minimum_cmd_size: 6
>>> kern.cam.da.21.sort_io_queue: -1
>>> kern.cam.da.21.zone_mode: Host Aware
>>> kern.cam.da.21.zone_support: Report Zones, Open, Close, Finish, Reset Write Pointer
>>> kern.cam.da.21.optimal_seq_zones: 128
>>> kern.cam.da.21.optimal_nonseq_zones: 8
>>> kern.cam.da.21.max_seq_zones: 4294967295
>>> kern.cam.da.21.error_inject: 0
>>> 
>>> Display all of the zones with zonectl(8):
>>> 
>>> [root at sm4u-1 ~]# zonectl -d /dev/da21 -c rz
>>> 29809 zones, Maximum LBA 0x3a3812aaf (15628053167)
>>> Zone lengths and types may vary
>>>  Start LBA  Length       WP LBA      Zone Type      Condition      Sequential             Reset
>>>          0, 524288,     0x80000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>    0x80000, 524288,    0x100000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x100000, 524288,    0x180000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x180000, 524288,    0x200000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x200000, 524288,    0x280000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x280000, 524288,    0x300000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x300000, 524288,    0x380000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x380000, 524288,    0x400000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x400000, 524288,    0x480000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x480000, 524288,    0x500000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x500000, 524288,    0x580000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x580000, 524288,    0x600000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x600000, 524288,    0x680000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x680000, 524288,    0x700000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>   0x700000, 524288,    0x780000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>> [ ... ]
>>> 0x1f00000, 524288,   0x1f80000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>  0x1f80000, 524288,   0x2000000,  Conventional,           NWP,     Sequential,  No Reset Needed
>>>  0x2000000, 524288,   0x2080000, Seq Preferred,          Full,     Sequential,  No Reset Needed
>>>  0x2080000, 524288,   0x2100000, Seq Preferred,          Full,     Sequential,  No Reset Needed
>>>  0x2100000, 524288,   0x2180000, Seq Preferred,          Full,     Sequential,  No Reset Needed
>>>  0x2180000, 524288,   0x2200000, Seq Preferred,          Full,     Sequential,  No Reset Needed
>>>  0x2200000, 524288,   0x2280000, Seq Preferred,          Full,     Sequential,  No Reset Needed
>>>  0x2280000, 524288,   0x2300000, Seq Preferred,          Full,     Sequential,  No Reset Needed
>>> [ ... ]
>>> 
>>> Use camcontrol zone to reset the write pointer for the first shingled zone
>>> listed above:
>>> 
>>> [root at sm4u-1 ~]# camcontrol zone da21 -v -c rwp -l 0x2000000
>>> 
>>> Use camcontrol zone to ask the drive to report empty zones:
>>> 
>>> [root at sm4u-1 ~]# camcontrol zone da21 -v -c rz -o empty
>>> 1 zones, Maximum LBA 0x3a3812aaf (15628053167)
>>> Zone lengths and types may vary
>>>  Start LBA  Length       WP LBA      Zone Type      Condition      Sequential             Reset
>>>  0x2000000, 524288,   0x2000000, Seq Preferred,         Empty,     Sequential,  No Reset Needed
>>> 
>>> Get information on a Host Aware drive:
>>> 
>>> root at sm4u-1 ~]# diskinfo -v da21
>>> da21
>>>        512             # sectorsize
>>>        8001563222016   # mediasize in bytes (7.3T)
>>>        15628053168     # mediasize in sectors
>>>        4096            # stripesize
>>>        0               # stripeoffset
>>>        972801          # Cylinders according to firmware.
>>>        255             # Heads according to firmware.
>>>        63              # Sectors according to firmware.
>>>                    Z84008NY    # Disk ident.
>>>        enc at 5003048001f311fd/elmtype at array_device/slot at 22       # Physical path
>>>        Host Aware      # Zone Mode
>>> 
>>> Get information on a drive managed drive:
>>> 
>>> [root at sm4u-1 ~]# diskinfo -v da20
>>> da20
>>>        512             # sectorsize
>>>        8001563222016   # mediasize in bytes (7.3T)
>>>        15628053168     # mediasize in sectors
>>>        4096            # stripesize
>>>        0               # stripeoffset
>>>        972801          # Cylinders according to firmware.
>>>        255             # Heads according to firmware.
>>>        63              # Sectors according to firmware.
>>>                    Z8405938    # Disk ident.
>>>        enc at 5003048001f311fd/elmtype at array_device/slot at 21       # Physical path
>>>        Drive Managed   # Zone Mode
>>> 
>>> Get information on a non-zoned drive:
>>> 
>>> [root at sm4u-1 ~]# diskinfo -v da4
>>> da4
>>>        512             # sectorsize
>>>        100030242816    # mediasize in bytes (93G)
>>>        195371568       # mediasize in sectors
>>>        0               # stripesize
>>>        0               # stripeoffset
>>>        12161           # Cylinders according to firmware.
>>>        255             # Heads according to firmware.
>>>        63              # Sectors according to firmware.
>>>        124903574F36            # Disk ident.
>>>        enc at 5003048001f311fd/elmtype at array_device/slot at 5        # Physical path
>>>        Not Zoned       # Zone Mode
>>> 
>>> Testing and comments are welcome.
>>> 
>>> Thanks,
>>> 
>>> Ken
>>> -- 
>>> Kenneth Merry
>>> ken at FreeBSD.ORG
>> 
>> -- 
>> Kenneth Merry
>> ken at FreeBSD.ORG
> 
> -- 
> Kenneth Merry
> ken at FreeBSD.ORG
> _______________________________________________
> freebsd-scsi at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"



More information about the freebsd-scsi mailing list