possible memory corruption prior to r268075 [Was: r268075 (MFV r267565) undone FreeBSD-specific changes]

Andriy Gapon avg at FreeBSD.org
Thu Sep 18 08:44:04 UTC 2014


On 17/09/2014 19:33, Justin T. Gibbs wrote:
> On Sep 17, 2014, at 3:28 AM, Andriy Gapon <avg at FreeBSD.org> wrote:
> 
>>
>> While looking at the changes discussed in the previous email another issue has
>> caught my attention.
>> Both l2arc_compress_buf() and zio_write_bp_init() allocate the buffer for
>> compressed data using the original (logical) buffer size.  The allocation is
>> done with zio_buf_alloc() or zio_data_buf_alloc(), so the requested size is
>> rounded up to one of the standard buffer sizes if needed.  The standard
>> buffer sizes are: 512B - 4KB in steps of 512B, 4KB - 8KB in steps of 1KB,
>> 8KB - 16KB in steps of 2KB, and 16KB - 128KB in steps of 4KB.
>> That size should always be sufficient, because the compressed data is never
>> supposed to be larger than the original data, even after rounding up to a
>> multiple of 512B.  In other words, psize <= lsize.
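
As an aside, here is a minimal sketch of that rounding (my own
illustration, not the actual kernel code; the ranges are the ones
listed above):

#include <stddef.h>

/* Round a requested size up to the nearest standard zio buffer size. */
static size_t
zio_bufsize_sketch(size_t size)
{
        size_t step;

        if (size <= 4096)
                step = 512;             /* 512B - 4KB in 512B steps */
        else if (size <= 8192)
                step = 1024;            /* 4KB - 8KB in 1KB steps */
        else if (size <= 16384)
                step = 2048;            /* 8KB - 16KB in 2KB steps */
        else
                step = 4096;            /* 16KB - 128KB in 4KB steps */

        return ((size + step - 1) / step * step);       /* round up */
}

E.g. a 2000-byte request is served from the 2048-byte cache and a
9000-byte request from the 10240-byte one.
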
>>
>> But in FreeBSD, prior to commit r268075, we used to round up the compressed
>> size to a multiple of the vdev sector size.
> 
> There is no data corruption: the earlier code truncates d_len so that at least a 12.5% compression gain must be achieved within a minblocksize-sized and -aligned buffer:
> 
> 	/* Compress at least 12.5% */
> 	d_len = P2ALIGN(s_len - (s_len >> 3), minblocksize);

When I looked at this line I saw the 12.5% thing but somehow I completely failed
to see P2ALIGN().  Hence my confusion.  Thank you for correcting me.
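
For the record, the arithmetic with made-up numbers (mine, not from the
code), say s_len = 2560 and minblocksize = 512:

        /* 2560 >> 3 == 320; 2560 - 320 == 2240; P2ALIGN -> 2048 */
        d_len = P2ALIGN(2560 - (2560 >> 3), 512);       /* == 2048 */

so compression is attempted only if the result fits into 2048 bytes,
which guarantees at least the 12.5% saving even after alignment.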

> 	if (d_len == 0) {
> 		ZCOMPSTAT_BUMP(zcompstat_skipped_minblocksize);
> 		return (s_len);
> 	}
> 
> d_len is strictly less than s_len, and c_len is tested to be <= d_len, so the roundup that follows later cannot exceed the original buffer size (s_len).
> 
> r268075 lost this optimization, which avoided attempting to compress something that can never achieve even a 12.5% improvement given the minimum ashift.  That was done to support data embedded in block pointers.  However, I have no experience with how likely we are to compress 4K of data down to 112 bytes (a 98% compression ratio).  Perhaps it’s quite likely for metadata, but not so likely for user data?  If so, we should restore the old check for user data blocks to avoid useless consumption of CPU.

Yes.  So now the code that does the rounding up after zio_compress_data()
needs to be careful not to write beyond the end of the buffer.

Xin, I think it will not be sufficient to simply replace SPA_MINBLOCKSIZE with 1
<< l2hdr->b_dev->l2ad_vdev->vdev_ashift in l2arc_compress_buf().
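
To make the danger concrete (numbers made up by me): suppose the
buffer was allocated with len = b_asize = 2560 and the L2ARC vdev has
vdev_ashift = 12.  The mechanically replaced code would then do:

        rounded = P2ROUNDUP(csize, (size_t)1 << 12);    /* 4KB multiple */
        if (rounded > csize) {
                /* e.g. csize = 2000 -> rounded = 4096 > len = 2560 */
                bzero((char *)cdata + csize, rounded - csize);
                csize = rounded;
        }

and the bzero() would write past the end of the 2560-byte buffer.
With SPA_MINBLOCKSIZE (512B) that cannot happen, because every
standard buffer size is a multiple of 512B.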

Perhaps something like the following:
@@ -5276,12 +5286,6 @@ l2arc_compress_buf(l2arc_buf_hdr_t *l2hdr)
        csize = zio_compress_data(ZIO_COMPRESS_LZ4, l2hdr->b_tmp_cdata,
            cdata, l2hdr->b_asize);

-       rounded = P2ROUNDUP(csize, (size_t)SPA_MINBLOCKSIZE);
-       if (rounded > csize) {
-               bzero((char *)cdata + csize, rounded - csize);
-               csize = rounded;
-       }
-
        if (csize == 0) {
                /* zero block, indicate that there's nothing to write */
                zio_data_buf_free(cdata, len);
@@ -5290,11 +5294,19 @@ l2arc_compress_buf(l2arc_buf_hdr_t *l2hdr)
                l2hdr->b_tmp_cdata = NULL;
                ARCSTAT_BUMP(arcstat_l2_compress_zeros);
                return (B_TRUE);
-       } else if (csize > 0 && csize < len) {
+       }
+
+       rounded = P2ROUNDUP(csize,
+           (size_t)1 << l2hdr->b_dev->l2ad_vdev->vdev_ashift);
+       if (rounded < len) {
                /*
                 * Compression succeeded, we'll keep the cdata around for
                 * writing and release it afterwards.
                 */
+               if (rounded > csize) {
+                       bzero((char *)cdata + csize, rounded - csize);
+                       csize = rounded;
+               }
                l2hdr->b_compress = ZIO_COMPRESS_LZ4;
                l2hdr->b_asize = csize;
                l2hdr->b_tmp_cdata = cdata;

The idea is to use the compressed data only if its rounded-up size is smaller
than the original size.  If the two are equal, we save nothing on disk and
would only pay the decompression cost on read.
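
A condensed standalone sketch of that decision (the function name is
mine, not from the patch; boolean_t and P2ROUNDUP as in the kernel
headers):

        /*
         * Keep the compressed copy only when its size, rounded up to
         * the device's allocation unit, still beats the original size.
         * rounded < len also keeps the tail bzero() within the buffer.
         */
        static boolean_t
        l2arc_worth_keeping(size_t csize, size_t len, uint64_t ashift)
        {
                size_t rounded = P2ROUNDUP(csize, (size_t)1 << ashift);

                return (rounded < len);
        }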

>> I think that in general our code had, and may still have, a bit of confusion
>> about psize (the possibly compressed data size) and asize (the actual disk
>> allocation size).  psize should never be greater than lsize, while asize
>> should never be less than psize:
>>  psize <= lsize
>>  psize <= asize
>> Obviously, there is no rule on the relation between lsize and asize.
>> But our code used to force ashift alignment on psize, which made it a kind
>> of asize, or something in between the classic psize and asize.
> 
> Where did we force this?  You mean in the 12.5% compression test?  Or the roundup/zero fill during compression?

I meant the roundup to a multiple of minblocksize that zio_compress_data()
performed when called from zio_write_bp_init().  psize didn't have to be a
multiple of minblocksize there.  But, on the other hand, I think no harm was
done.
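
Spelling out the arithmetic behind "no harm was done" (my own
reasoning, not from the code): in that path lsize is a multiple of
SPA_MINBLOCKSIZE and the compressed result is only used when
psize < lsize, so the roundup was equivalent to:

        psize = P2ROUNDUP(psize, (size_t)SPA_MINBLOCKSIZE);
        /* psize < lsize and lsize % 512 == 0 => new psize <= lsize */

i.e. the rounded-up psize could never exceed lsize, and the zero fill
stayed within the lsize-sized buffer.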

-- 
Andriy Gapon

