svn commit: r256132 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

Tue Nov 5 19:55:21 UTC 2013

Andriy Gapon wrote this message on Tue, Nov 05, 2013 at 12:12 +0200:
> on 08/10/2013 04:38 Xin LI said the following:
> > Author: delphij
> > Date: Tue Oct  8 01:38:24 2013
> > New Revision: 256132
> > URL: http://svnweb.freebsd.org/changeset/base/256132
> > 
> > Log:
> >   Improve lzjb decompress performance by reorganizing the code
> >   to tighten the copy loop.
> >   
> >   Submitted by:	Denis Ahrens <denis h3q com>
> >   MFC after:	2 weeks
> >   Approved by:	re (gjb)
> > 
> > Modified:
> >   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c
> > 
> > Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c
> > ==============================================================================
> > --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c	Mon Oct  7 22:30:03 2013	(r256131)
> > +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c	Tue Oct  8 01:38:24 2013	(r256132)
> > @@ -117,7 +117,9 @@ lzjb_decompress(void *s_start, void *d_s
> >  			src += 2;
> >  			if ((cpy = dst - offset) < (uchar_t *)d_start)
> >  				return (-1);
> > -			while (--mlen >= 0 && dst < d_end)
> > +			if (mlen > (d_end - dst))
> > +				mlen = d_end - dst;
> > +			while (--mlen >= 0)
> >  				*dst++ = *cpy++;
> >  		} else {
> >  			*dst++ = *src++;
> > 
> 
> Intuitively it might seem that this change is indeed an improvement.
> But given how "not dumb" (not sure if that always amounts to smart) the modern
> compilers are, has anyone actually measured if this change indeed improves the
> performance?
> 
> Judging from the conversations on the ZFS mailing lists this change event hurt
> performance in some environments.
> Looks like the upstream is not going to take this change.
> 
> So does it make any sense for us to make the code more obscure and different
> from upstream?  Unless performance benefits could indeed be demonstrated.

The old code compiles to:
   62c13:       49 f7 de                neg    %r14
   62c16:       44 8d 58 ff             lea    -0x1(%rax),%r11d
   62c1a:       4c 89 d3                mov    %r10,%rbx
   62c1d:       0f 1f 00                nopl   (%rax)
   62c20:       42 8a 14 33             mov    (%rbx,%r14,1),%dl
   62c24:       88 13                   mov    %dl,(%rbx)
   62c26:       48 ff c3                inc    %rbx
   62c29:       ff c8                   dec    %eax
   62c2b:       85 c0                   test   %eax,%eax
   62c2d:       7f f1                   jg     62c20 <lzjb_decompress+0xa0>

The load/store line is addresses 62c20-62c29...  So it looks like clang
is already doing the optimization that was added...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."