possible memory corruption prior to r268075 [Was: r268075 (MFV r267565) undid FreeBSD-specific changes]
Matthew Ahrens
mahrens at delphix.com
Mon Nov 3 00:44:48 UTC 2014
On Wed, Sep 17, 2014 at 9:33 AM, Justin T. Gibbs <gibbs at freebsd.org> wrote:
> On Sep 17, 2014, at 3:28 AM, Andriy Gapon <avg at FreeBSD.org> wrote:
>
> >
> > While looking at the changes discussed in the previous email, another
> > issue has caught my attention.
> > Both l2arc_compress_buf() and zio_write_bp_init() allocate a buffer for
> > compressed data using the original / logical buffer size for the size.
> > The buffer allocation is done using zio_buf_alloc() and
> > zio_data_buf_alloc(), so the size would be increased to one of the
> > standard buffer sizes if needed. The standard buffer sizes are: 512B -
> > 4KB with a step of 512B, 4KB - 8KB with a step of 1KB, 8KB - 16KB with
> > a step of 2KB, and 16KB - 128KB with a step of 4KB.
> > The size for compressed data should always be sufficient, because the
> > compressed data is supposed to never be larger than the original data,
> > even taking into account rounding up to a multiple of 512B. In other
> > words, psize <= lsize.
> >
> > But in FreeBSD, prior to commit r268075, we used to round up the
> > compressed size to a multiple of the vdev sector size.
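
To make those size classes concrete, here is a minimal C sketch of the
rounding Andriy describes (illustration only -- the real zio_buf_alloc() /
zio_data_buf_alloc() pick from pre-created kmem caches, and the helper name
zio_buf_class() is made up for this example):

    /*
     * Illustrative only: round a requested size up to the standard zio
     * buffer size classes described above.  Assumes 512 <= size <= 128K.
     */
    #include <stddef.h>

    #define P2ROUNDUP(x, align)     (-(-(x) & -(align)))

    static size_t
    zio_buf_class(size_t size)
    {
        if (size <= 4096)
            return (P2ROUNDUP(size, (size_t)512));      /* 512B step */
        if (size <= 8192)
            return (P2ROUNDUP(size, (size_t)1024));     /* 1KB step */
        if (size <= 16384)
            return (P2ROUNDUP(size, (size_t)2048));     /* 2KB step */
        return (P2ROUNDUP(size, (size_t)4096));         /* 4KB step */
    }
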
>
> There is no data corruption due to the earlier code that truncates d_len
> such that at least a 12.5% compression can be achieved in a
> minblocksize-sized/aligned buffer:
>
>     /* Compress at least 12.5% */
>     d_len = P2ALIGN(s_len - (s_len >> 3), minblocksize);
>     if (d_len == 0) {
>         ZCOMPSTAT_BUMP(zcompstat_skipped_minblocksize);
>         return (s_len);
>     }
>
> d_len is strictly less than s_len, and c_len is tested to be <= d_len, so
> the roundup that follows later cannot exceed the original buffer size
> (s_len).
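
For concreteness, a small stand-alone example (my numbers, using the usual
P2ALIGN/P2ROUNDUP macro definitions) of why the later roundup stays within
s_len, and of how a large minblocksize skips compression entirely:

    /* Worked example of the quoted d_len check (illustration only). */
    #include <stdio.h>

    #define P2ALIGN(x, align)       ((x) & -(align))
    #define P2ROUNDUP(x, align)     (-(-(x) & -(align)))

    int
    main(void)
    {
        size_t s_len = 4096;

        /* ashift=9 vdev: minblocksize = 512 */
        size_t d_len = P2ALIGN(s_len - (s_len >> 3), (size_t)512);
        printf("minblocksize 512:  d_len = %zu\n", d_len);   /* 3584 */
        /* any c_len <= d_len, rounded up to 512, is still <= 3584 < 4096 */
        printf("worst-case roundup = %zu\n", P2ROUNDUP(d_len, (size_t)512));

        /* ashift=12 vdev: minblocksize = 4096 */
        d_len = P2ALIGN(s_len - (s_len >> 3), (size_t)4096);
        printf("minblocksize 4096: d_len = %zu (compression skipped)\n",
            d_len);
        return (0);
    }
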
>
> r268075 lost this optimization, which avoided attempting to compress
> something that can never achieve even a 12.5% improvement given the
> minimum ashift. This was done to support data embedded in block pointers.
> However, I have no experience with how likely we are to compress 4K of
> data down to 112 bytes (a 98% compression ratio).
Actually, there are several use cases where even 8K blocks can become
embedded. For example, initialized but otherwise unused blocks of Oracle
database files (and probably other databases too): each of these blocks has
a small header and footer but is otherwise zero-filled.
--matt
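
For what it's worth, a quick way to get a feel for the numbers is to
compress such a block with stock liblz4 (the in-kernel lz4 will differ in
exact byte counts, and the 16-byte header/footer sizes below are invented):

    /* Compress a mostly-zero 8K "database block" and compare the result
     * against the 112-byte embedded-payload limit mentioned above.
     * Build with: cc demo.c -llz4 */
    #include <stdio.h>
    #include <string.h>
    #include <lz4.h>

    int
    main(void)
    {
        char src[8192];
        char dst[LZ4_COMPRESSBOUND(sizeof (src))];

        memset(src, 0, sizeof (src));
        memset(src, 0xab, 16);                          /* fake header */
        memset(src + sizeof (src) - 16, 0xcd, 16);      /* fake footer */

        int c_len = LZ4_compress_default(src, dst,
            (int)sizeof (src), (int)sizeof (dst));
        printf("compressed %zu -> %d bytes (embedded limit: 112)\n",
            sizeof (src), c_len);
        return (0);
    }
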
> Perhaps it’s quite likely for metadata, but not so likely for user data?
> If so, we should restore the old check for user data blocks to avoid
> useless consumption of CPU.
>
> > I think that in general our code had, and may still have, a bit of
> > confusion about psize (possibly compressed data size) and asize (actual
> > disk allocation size). I think that psize should never be greater than
> > lsize, while asize should never be less than psize:
> > psize <= lsize
> > psize <= asize
> > Obviously, there is no rule on the relation between lsize and asize.
> > But our code used to force ashift on psize, which made it kind of an
> > asize, or something in between the classic psize and asize.
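
Written out as assertions (my phrasing, purely illustrative; the helper
name is made up), those invariants would be:

    #include <assert.h>
    #include <stdint.h>

    static void
    check_block_sizes(uint64_t lsize, uint64_t psize, uint64_t asize)
    {
        assert(psize <= lsize);     /* compression never grows the data */
        assert(psize <= asize);     /* the allocation covers the payload */
        /* no required ordering between lsize and asize */
    }
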
>
> Where did we force this? You mean in the 12.5% compression test? Or the
> roundup/zero fill during compression?
>
> —
> Justin