[zfs] Updated ashift optimization patch

Prakash Surya surya1 at llnl.gov
Thu Aug 22 15:06:29 UTC 2013


Thanks. The github repo is easier for me to work with, so I pulled from
there. I pushed a pull request to ZFS on Linux:

    https://github.com/zfsonlinux/zfs/pull/1671

The patch applied relatively clean. The main changes I made for the port
were removing the GEOM and SYSCTL diffs. The SYSCTL changes should
probably be moved to a module parameter for Linux, but I haven't done
that yet.

-- 
Cheers, Prakash

On Wed, Aug 21, 2013 at 01:30:16PM -0600, Justin T. Gibbs wrote:
> The revision was included in my previous email, but you can pull directly from the source if you'd like.
> 
> FreeBSD's master repo is in subversion at svn://svn.freebsd.org/base.  You'll want the "head" branch.
> 
> The git mirror repo at git://github.com/freebsd/freebsd.git is unofficial but seems to work.
> 
> --
> Justin
> 
> On Aug 21, 2013, at 1:18 PM, Prakash Surya <surya1 at llnl.gov> wrote:
> 
> > Sure, where is the upstream repo for FreeBSD kept?
> > 
> > -- 
> > Cheers, Prakash
> > 
> > On Wed, Aug 21, 2013 at 01:06:30PM -0600, Justin T. Gibbs wrote:
> >> Since you are porting the changes, please use the final version of the change that I committed to FreeBSD.  I think I fixed a few issues in some comments since the last version I posted.
> >> 
> >> --
> >> Justin
> >> 
> > 
> > 
> >> 
> >> 
> >> On Aug 21, 2013, at 12:55 PM, Justin T. Gibbs <gibbs at freebsd.org> wrote:
> >> 
> >>> Hmm.  That does look a bit messy.  vdev_disk.c is not included in FreeBSD's userland build, so that op vector is left undefined in libzpool.  But, since those ops are never referenced from userland, it works.
> >>> 
> >>> --
> >>> Justin
> >>> 
> >>> On Aug 21, 2013, at 12:41 PM, Richard Yao <ryao at gentoo.org> wrote:
> >>> 
> >>>> What happens to the reference to vdev_disk_ops in
> >>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c on FreeBSD? Is it
> >>>> simply left undefined?
> >>>> 
> >>>> /*
> >>>> * Virtual device management.
> >>>> */
> >>>> 
> >>>> static vdev_ops_t *vdev_ops_table[] = {
> >>>>      &vdev_root_ops,
> >>>>      &vdev_raidz_ops,
> >>>>      &vdev_mirror_ops,
> >>>>      &vdev_replacing_ops,
> >>>>      &vdev_spare_ops,
> >>>> #ifdef _KERNEL
> >>>>      &vdev_geom_ops,
> >>>> #else
> >>>>      &vdev_disk_ops,
> >>>> #endif
> >>>>      &vdev_file_ops,
> >>>>      &vdev_missing_ops,
> >>>>      &vdev_hole_ops,
> >>>>      NULL
> >>>> };
> >>>> 
> >>>> On 08/21/2013 02:23 PM, Justin T. Gibbs wrote:
> >>>>> FreeBSD doesn't use vdev_disk.c, so I haven't updated it.
> >>>>> 
> >>>>> --
> >>>>> Justin
> >>>>> 
> >>>>> On Aug 21, 2013, at 10:51 AM, Prakash Surya <surya1 at llnl.gov> wrote:
> >>>>> 
> >>>>>> I'm looking at porting the attached patch to ZFS, and I have a question.
> >>>>>> 
> >>>>>> The patch modifies the typedef of vdev_open_func_t:
> >>>>>> 
> >>>>>>  typedef int>---vdev_open_func_t(vdev_t *vd, uint64_t *size, uint64_t *max_size,
> >>>>>> -    uint64_t *ashift);
> >>>>>> +    uint64_t *logical_ashift, uint64_t *physical_ashift);
> >>>>>> 
> >>>>>> But it doesn't modify the prototype for vdev_disk_open like it does for
> >>>>>> the other vdev_*_open calls (i.e. vdev_file_open, vdev_mirror_open,
> >>>>>> etc). Is that oversight, or am I misunderstanding something? I see the
> >>>>>> vdev_disk_ops structure in the illumos-gate source code, so the
> >>>>>> structure isn't Linux specific..
> >>>>>> 
> >>>>>> -- 
> >>>>>> Cheers, Prakash
> >>>>>> 
> >>>>>> On Tue, Aug 20, 2013 at 04:10:43PM -0600, Justin T. Gibbs wrote:
> >>>>>>> Unless I hear otherwise, I plan to commit this patch (with the EINVAL change described below) as soon as my testing on FreeBSD/head completes.
> >>>>>>> 
> >>>>>>> --
> >>>>>>> Justin
> >>>>>>> 
> >>>>>>> On Aug 18, 2013, at 8:26 PM, Justin T. Gibbs <gibbs at freebsd.org> wrote:
> >>>>>>> 
> >>>>>>>> On Aug 18, 2013, at 6:23 PM, "Steven Hartland" <killing at multiplay.co.uk> wrote:
> >>>>>>>> 
> >>>>>>>>> Few things I've noticed:-
> >>>>>>>>> 
> >>>>>>>>> 1. in vdev_geom.c is there a reason you use:
> >>>>>>>>> + *physical_ashift = 0;
> >>>>>>>>> + if (pp->stripesize)
> >>>>>>>>> +  *physical_ashift = highbit(pp->stripesize) - 1;
> >>>>>>>>> 
> >>>>>>>>> Instead of:
> >>>>>>>>> + *physical_ashift = highbit(MAX(pp->stripesize, SPA_MINBLOCKSIZE)) - 1;
> >>>>>>>> 
> >>>>>>>> 0 in ashift as in stripesize means "unset" or "don't care".  This gives more information during debugging ("Oh.  The driver for this device must not be setting this.") than if it were clamped.  If we decide it should always be set, I think "MAX(pp->sectorsize, pp->stripesize)" would make more sense.
> >>>>>>>> 
> >>>>>>>>> 2. sysctl_vfs_zfs_max_auto_ashift should also limit the min to >
> >>>>>>>>> SPA_MINBLOCKSHIFT
> >>>>>>>> 
> >>>>>>>> Setting it too low (any value less than the device's logical ashift) just disables the feature.  I don't see any value in adding code to error out or clamp considering the most likely reason for this to occur is an administrator trying to turn it off (e.g. set it to 0).
> >>>>>>>> 
> >>>>>>>>> 3. sysctl_vfs_zfs_max_auto_ashift should error if the value is out
> >>>>>>>>> of range instead of clamping it.
> >>>>>>>> 
> >>>>>>>> Ok.  I now return EINVAL if the value exceeds SPA_MAXASHIFT.
> >>>>>>>> 
> >>>>>>>>> One thing we did here, which would be good to see in this patch, is to add
> >>>>>>>>> an overall min system ashift as thie enables admins to configure pools
> >>>>>>>>> to be compatible with future disks they are likely to use e.g. min ashift
> >>>>>>>>> 12 (4k compatible). This could be left at 9 by default for max compatibility
> >>>>>>>>> but personally I'd suggest 12.
> >>>>>>>> 
> >>>>>>>> it would be nice to hear more consensus come out of the recent zpool ashift discussions before doing something here.  Whatever that is, I agree that the default should be 9 until someone fixes the RAIDZ space penalty for going to 4k on 512N drives.
> >>>>>>>> 
> >>>>>>>> --
> >>>>>>>> Justin
> >>>>>>>> _______________________________________________
> >>>>>>>> zfs-devel at freebsd.org mailing list
> >>>>>>>> http://lists.freebsd.org/mailman/listinfo/zfs-devel
> >>>>>>>> To unsubscribe, send any mail to "zfs-devel-unsubscribe at freebsd.org"
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> 
> >>>>>>> -------------------------------------------
> >>>>>>> illumos-zfs
> >>>>>>> Archives: https://www.listbox.com/member/archive/182191/=now
> >>>>>>> RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
> >>>>>>> Modify Your Subscription: https://www.listbox.com/member/?&
> >>>>>>> Powered by Listbox: http://www.listbox.com
> >>>>>> _______________________________________________
> >>>>>> zfs-devel at freebsd.org mailing list
> >>>>>> http://lists.freebsd.org/mailman/listinfo/zfs-devel
> >>>>>> To unsubscribe, send any mail to "zfs-devel-unsubscribe at freebsd.org"
> >>>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>> 
> >> 
> > 
> > 
> > 
> > 
> > 
> > -------------------------------------------
> > illumos-zfs
> > Archives: https://www.listbox.com/member/archive/182191/=now
> > RSS Feed: https://www.listbox.com/member/archive/rss/182191/23052111-5da6ea43
> > Modify Your Subscription: https://www.listbox.com/member/?&
> > Powered by Listbox: http://www.listbox.com
> > 
> 
> 
> 
> -------------------------------------------
> illumos-zfs
> Archives: https://www.listbox.com/member/archive/182191/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182191/23963346-4bb55813
> Modify Your Subscription: https://www.listbox.com/member/?member_id=23963346&id_secret=23963346-89c22f02
> Powered by Listbox: http://www.listbox.com


More information about the zfs-devel mailing list