svn commit: r244087 - in stable/9: cddl/contrib/opensolaris/cmd/zfs cddl/contrib/opensolaris/cmd/zpool cddl/contrib/opensolaris/cmd/ztest sys/cddl/contrib/opensolaris/common/zfs sys/cddl/contrib/op...

Martin Matuska mm at FreeBSD.org
Mon Dec 10 14:36:53 UTC 2012


Author: mm
Date: Mon Dec 10 14:36:48 2012
New Revision: 244087
URL: http://svnweb.freebsd.org/changeset/base/244087

Log:
  MFC recent ZFS changes from illumos:
  243503, 243524, 243525, 243560, 243561
  
  MFC r243503:
  Illumos 13879:4eac7a87eff2
  3329 spa_sync() spends 10-20% of its time in spa_free_sync_cb()
  3330 space_seg_t should have its own kmem_cache
  3331 deferred frees should happen after sync_pass 1
  3335 make SYNC_PASS_* constants tunable
  
  New loader-only tunables:
  vfs.zfs.sync_pass_deferred_free
  vfs.zfs.sync_pass_dont_compress
  vfs.zfs.sync_pass_rewrite
  
  References:
  https://www.illumos.org/issues/3329
  https://www.illumos.org/issues/3330
  https://www.illumos.org/issues/3331
  https://www.illumos.org/issues/3335
  
  MFC r243524:
  Import the zio nop-write improvement from Illumos. To reduce I/O,
  nop-write omits overwriting data if the checksum (cryptographically
  secure) of new data matches the checksum of existing data.
  It also saves space if snapshots are in use.
  
  It currently works only on datasets with enabled compression, disabled
  deduplication and sha256 checksums.
  
  IllumOS 13887:196932ec9e6a and 13888:7204b3392a58
  3236 zio nop-write
  
  References:
  https://www.illumos.org/issues/3236
  
  MFC r243525:
  Add loader(8) tunable to enable/disable nopwrite functionality:
  vfs.zfs.nopwrite_enabled
  
  MFC r243560:
  Introduce a new dataset aclmode setting "restricted" to protect ACL's
  being destroyed or corrupted by a drive-by chmod.
  
  illumos-gate 13889:a67716f16746
  3254 add support in zfs for aclmode=restricted
  
  MFC r243561:
  Update manpage dates in zfs.8 and zpool.8

Modified:
  stable/9/cddl/contrib/opensolaris/cmd/zfs/zfs.8
  stable/9/cddl/contrib/opensolaris/cmd/zpool/zpool.8
  stable/9/cddl/contrib/opensolaris/cmd/ztest/ztest.c
  stable/9/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
  stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
Directory Properties:
  stable/9/cddl/contrib/opensolaris/   (props changed)
  stable/9/cddl/contrib/opensolaris/cmd/zfs/   (props changed)
  stable/9/sys/   (props changed)
  stable/9/sys/cddl/contrib/opensolaris/   (props changed)

Modified: stable/9/cddl/contrib/opensolaris/cmd/zfs/zfs.8
==============================================================================
--- stable/9/cddl/contrib/opensolaris/cmd/zfs/zfs.8	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/cddl/contrib/opensolaris/cmd/zfs/zfs.8	Mon Dec 10 14:36:48 2012	(r244087)
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd September 5, 2012
+.Dd November 26, 2012
 .Dt ZFS 8
 .Os
 .Sh NAME
@@ -754,7 +754,7 @@ If no inheritable
 .Tn ACE Ns s
 exist that affect the mode, then the mode is set in accordance to the requested
 mode from the application.
-.It Sy aclmode Ns = Ns Cm discard | groupmask | passthrough
+.It Sy aclmode Ns = Ns Cm discard | groupmask | passthrough | restricted
 Controls how an
 .Tn ACL
 is modified during
@@ -784,6 +784,32 @@ indicates that no changes are made to th
 other than creating or updating the necessary
 .Tn ACL
 entries to represent the new mode of the file or directory.
+An
+.Sy aclmode
+property of
+.Cm restricted
+will cause the
+.Xr chmod 2
+operation to return an error when used on any file or directory which has
+a non-trivial
+.Tn ACL
+whose entries can not be represented by a mode.
+.Xr chmod 2
+is required to change the set user ID, set group ID, or sticky bits on a file
+or directory, as they do not have equivalent
+.Tn ACL
+entries.
+In order to use
+.Xr chmod 2
+on a file or directory with a non-trivial
+.Tn ACL
+when
+.Sy aclmode
+is set to
+.Cm restricted ,
+you must first remove all
+.Tn ACL
+entries which do not represent the current mode.
 .It Sy atime Ns = Ns Cm on | off
 Controls whether the access time for files is updated when they are read.
 Turning this property off avoids producing write traffic when reading files and

Modified: stable/9/cddl/contrib/opensolaris/cmd/zpool/zpool.8
==============================================================================
--- stable/9/cddl/contrib/opensolaris/cmd/zpool/zpool.8	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/cddl/contrib/opensolaris/cmd/zpool/zpool.8	Mon Dec 10 14:36:48 2012	(r244087)
@@ -25,7 +25,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd November 28, 2011
+.Dd November 15, 2012
 .Dt ZPOOL 8
 .Os
 .Sh NAME

Modified: stable/9/cddl/contrib/opensolaris/cmd/ztest/ztest.c
==============================================================================
--- stable/9/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/cddl/contrib/opensolaris/cmd/ztest/ztest.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -204,6 +204,7 @@ enum ztest_io_type {
 	ZTEST_IO_WRITE_ZEROES,
 	ZTEST_IO_TRUNCATE,
 	ZTEST_IO_SETATTR,
+	ZTEST_IO_REWRITE,
 	ZTEST_IO_TYPES
 };
 
@@ -1859,6 +1860,12 @@ ztest_get_data(void *arg, lr_write_t *lr
 		    DMU_READ_NO_PREFETCH);
 
 		if (error == 0) {
+			blkptr_t *obp = dmu_buf_get_blkptr(db);
+			if (obp) {
+				ASSERT(BP_IS_HOLE(bp));
+				*bp = *obp;
+			}
+
 			zgd->zgd_db = db;
 			zgd->zgd_bp = bp;
 
@@ -2004,6 +2011,9 @@ ztest_remove(ztest_ds_t *zd, ztest_od_t 
 			continue;
 		}
 
+		/*
+		 * No object was found.
+		 */
 		if (od->od_object == 0)
 			continue;
 
@@ -2119,6 +2129,7 @@ ztest_prealloc(ztest_ds_t *zd, uint64_t 
 static void
 ztest_io(ztest_ds_t *zd, uint64_t object, uint64_t offset)
 {
+	int err;
 	ztest_block_tag_t wbt;
 	dmu_object_info_t doi;
 	enum ztest_io_type io_type;
@@ -2171,6 +2182,25 @@ ztest_io(ztest_ds_t *zd, uint64_t object
 	case ZTEST_IO_SETATTR:
 		(void) ztest_setattr(zd, object);
 		break;
+
+	case ZTEST_IO_REWRITE:
+		(void) rw_rdlock(&ztest_name_lock);
+		err = ztest_dsl_prop_set_uint64(zd->zd_name,
+		    ZFS_PROP_CHECKSUM, spa_dedup_checksum(ztest_spa),
+		    B_FALSE);
+		VERIFY(err == 0 || err == ENOSPC);
+		err = ztest_dsl_prop_set_uint64(zd->zd_name,
+		    ZFS_PROP_COMPRESSION,
+		    ztest_random_dsl_prop(ZFS_PROP_COMPRESSION),
+		    B_FALSE);
+		VERIFY(err == 0 || err == ENOSPC);
+		(void) rw_unlock(&ztest_name_lock);
+
+		VERIFY0(dmu_read(zd->zd_os, object, offset, blocksize, data,
+		    DMU_READ_NO_PREFETCH));
+
+		(void) ztest_write(zd, object, offset, blocksize, data);
+		break;
 	}
 
 	(void) rw_unlock(&zd->zd_zilog_lock);
@@ -2258,7 +2288,12 @@ ztest_zil_remount(ztest_ds_t *zd, uint64
 {
 	objset_t *os = zd->zd_os;
 
-	VERIFY(mutex_lock(&zd->zd_dirobj_lock) == 0);
+	/*
+	 * We grab the zd_dirobj_lock to ensure that no other thread is
+	 * updating the zil (i.e. adding in-memory log records) and the
+	 * zd_zilog_lock to block any I/O.
+	 */
+	VERIFY0(mutex_lock(&zd->zd_dirobj_lock));
 	(void) rw_wrlock(&zd->zd_zilog_lock);
 
 	/* zfsvfs_teardown() */
@@ -4906,8 +4941,8 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_
 	 */
 	for (int i = 0; i < copies; i++) {
 		uint64_t offset = i * blocksize;
-		VERIFY(dmu_buf_hold(os, object, offset, FTAG, &db,
-		    DMU_READ_NO_PREFETCH) == 0);
+		VERIFY0(dmu_buf_hold(os, object, offset, FTAG, &db,
+		    DMU_READ_NO_PREFETCH));
 		ASSERT(db->db_offset == offset);
 		ASSERT(db->db_size == blocksize);
 		ASSERT(ztest_pattern_match(db->db_data, db->db_size, pattern) ||
@@ -4923,8 +4958,8 @@ ztest_ddt_repair(ztest_ds_t *zd, uint64_
 	/*
 	 * Find out what block we got.
 	 */
-	VERIFY(dmu_buf_hold(os, object, 0, FTAG, &db,
-	    DMU_READ_NO_PREFETCH) == 0);
+	VERIFY0(dmu_buf_hold(os, object, 0, FTAG, &db,
+	    DMU_READ_NO_PREFETCH));
 	blk = *((dmu_buf_impl_t *)db)->db_blkptr;
 	dmu_buf_rele(db, FTAG);
 
@@ -5602,6 +5637,8 @@ ztest_freeze(void)
 	kernel_init(FREAD | FWRITE);
 	VERIFY3U(0, ==, spa_open(ztest_opts.zo_pool, &spa, FTAG));
 	VERIFY3U(0, ==, ztest_dataset_open(0));
+	spa->spa_debug = B_TRUE;
+	ztest_spa = spa;
 
 	/*
 	 * Force the first log block to be transactionally allocated.

Modified: stable/9/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/common/zfs/zfs_prop.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -109,6 +109,7 @@ zfs_prop_init(void)
 		{ "discard",	ZFS_ACL_DISCARD },
 		{ "groupmask",	ZFS_ACL_GROUPMASK },
 		{ "passthrough", ZFS_ACL_PASSTHROUGH },
+		{ "restricted", ZFS_ACL_RESTRICTED },
 		{ NULL }
 	};
 
@@ -217,7 +218,8 @@ zfs_prop_init(void)
 	    "hidden | visible", "SNAPDIR", snapdir_table);
 	zprop_register_index(ZFS_PROP_ACLMODE, "aclmode", ZFS_ACL_DISCARD,
 	    PROP_INHERIT, ZFS_TYPE_FILESYSTEM,
-	    "discard | groupmask | passthrough", "ACLMODE", acl_mode_table);
+	    "discard | groupmask | passthrough | restricted", "ACLMODE",
+	    acl_mode_table);
 	zprop_register_index(ZFS_PROP_ACLINHERIT, "aclinherit",
 	    ZFS_ACL_RESTRICTED, PROP_INHERIT, ZFS_TYPE_FILESYSTEM,
 	    "discard | noallow | restricted | passthrough | passthrough-x",

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -3606,6 +3606,12 @@ arc_write_done(zio_t *zio)
 				arc_hdr_destroy(exists);
 				exists = buf_hash_insert(hdr, &hash_lock);
 				ASSERT3P(exists, ==, NULL);
+			} else if (zio->io_flags & ZIO_FLAG_NOPWRITE) {
+				/* nopwrite */
+				ASSERT(zio->io_prop.zp_nopwrite);
+				if (!BP_EQUAL(&zio->io_bp_orig, zio->io_bp))
+					panic("bad nopwrite, hdr=%p exists=%p",
+					    (void *)hdr, (void *)exists);
 			} else {
 				/* Dedup */
 				ASSERT(hdr->b_datacnt == 1);

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -768,13 +768,15 @@ dbuf_unoverride(dbuf_dirty_record_t *dr)
 	ASSERT(db->db_data_pending != dr);
 
 	/* free this block */
-	if (!BP_IS_HOLE(bp)) {
+	if (!BP_IS_HOLE(bp) && !dr->dt.dl.dr_nopwrite) {
 		spa_t *spa;
 
 		DB_GET_SPA(&spa, db);
 		zio_free(spa, txg, bp);
 	}
 	dr->dt.dl.dr_override_state = DR_NOT_OVERRIDDEN;
+	dr->dt.dl.dr_nopwrite = B_FALSE;
+
 	/*
 	 * Release the already-written buffer, so we leave it in
 	 * a consistent dirty state.  Note that all callers are
@@ -2172,6 +2174,13 @@ dmu_buf_freeable(dmu_buf_t *dbuf)
 	return (res);
 }
 
+blkptr_t *
+dmu_buf_get_blkptr(dmu_buf_t *db)
+{
+	dmu_buf_impl_t *dbi = (dmu_buf_impl_t *)db;
+	return (dbi->db_blkptr);
+}
+
 static void
 dbuf_check_blkptr(dnode_t *dn, dmu_buf_impl_t *db)
 {
@@ -2514,7 +2523,11 @@ dbuf_write_done(zio_t *zio, arc_buf_t *b
 	ASSERT0(zio->io_error);
 	ASSERT(db->db_blkptr == bp);
 
-	if (zio->io_flags & ZIO_FLAG_IO_REWRITE) {
+	/*
+	 * For nopwrites and rewrites we ensure that the bp matches our
+	 * original and bypass all the accounting.
+	 */
+	if (zio->io_flags & (ZIO_FLAG_IO_REWRITE | ZIO_FLAG_NOPWRITE)) {
 		ASSERT(BP_EQUAL(bp, bp_orig));
 	} else {
 		objset_t *os;
@@ -2705,7 +2718,7 @@ dbuf_write(dbuf_dirty_record_t *dr, arc_
 		mutex_enter(&db->db_mtx);
 		dr->dt.dl.dr_override_state = DR_NOT_OVERRIDDEN;
 		zio_write_override(dr->dr_zio, &dr->dt.dl.dr_overridden_by,
-		    dr->dt.dl.dr_copies);
+		    dr->dt.dl.dr_copies, dr->dt.dl.dr_nopwrite);
 		mutex_exit(&db->db_mtx);
 	} else if (db->db_state == DB_NOFILL) {
 		ASSERT(zp.zp_checksum == ZIO_CHECKSUM_OFF);

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -40,11 +40,21 @@
 #include <sys/zfs_ioctl.h>
 #include <sys/zap.h>
 #include <sys/zio_checksum.h>
+#include <sys/zio_compress.h>
 #include <sys/sa.h>
 #ifdef _KERNEL
 #include <sys/zfs_znode.h>
 #endif
 
+/*
+ * Enable/disable nopwrite feature.
+ */
+int zfs_nopwrite_enabled = 1;
+SYSCTL_DECL(_vfs_zfs);
+TUNABLE_INT("vfs.zfs.nopwrite_enabled", &zfs_nopwrite_enabled);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, nopwrite_enabled, CTLFLAG_RDTUN,
+    &zfs_nopwrite_enabled, 0, "Enable nopwrite feature");
+
 const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES] = {
 	{	DMU_BSWAP_UINT8,	TRUE,	"unallocated"		},
 	{	DMU_BSWAP_ZAP,		TRUE,	"object directory"	},
@@ -1287,6 +1297,16 @@ dmu_sync_done(zio_t *zio, arc_buf_t *buf
 	mutex_enter(&db->db_mtx);
 	ASSERT(dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC);
 	if (zio->io_error == 0) {
+		dr->dt.dl.dr_nopwrite = !!(zio->io_flags & ZIO_FLAG_NOPWRITE);
+		if (dr->dt.dl.dr_nopwrite) {
+			blkptr_t *bp = zio->io_bp;
+			blkptr_t *bp_orig = &zio->io_bp_orig;
+			uint8_t chksum = BP_GET_CHECKSUM(bp_orig);
+
+			ASSERT(BP_EQUAL(bp, bp_orig));
+			ASSERT(zio->io_prop.zp_compress != ZIO_COMPRESS_OFF);
+			ASSERT(zio_checksum_table[chksum].ci_dedup);
+		}
 		dr->dt.dl.dr_overridden_by = *zio->io_bp;
 		dr->dt.dl.dr_override_state = DR_OVERRIDDEN;
 		dr->dt.dl.dr_copies = zio->io_prop.zp_copies;
@@ -1308,11 +1328,22 @@ dmu_sync_late_arrival_done(zio_t *zio)
 {
 	blkptr_t *bp = zio->io_bp;
 	dmu_sync_arg_t *dsa = zio->io_private;
+	blkptr_t *bp_orig = &zio->io_bp_orig;
 
 	if (zio->io_error == 0 && !BP_IS_HOLE(bp)) {
-		ASSERT(zio->io_bp->blk_birth == zio->io_txg);
-		ASSERT(zio->io_txg > spa_syncing_txg(zio->io_spa));
-		zio_free(zio->io_spa, zio->io_txg, zio->io_bp);
+		/*
+		 * If we didn't allocate a new block (i.e. ZIO_FLAG_NOPWRITE)
+		 * then there is nothing to do here. Otherwise, free the
+		 * newly allocated block in this txg.
+		 */
+		if (zio->io_flags & ZIO_FLAG_NOPWRITE) {
+			ASSERT(BP_EQUAL(bp, bp_orig));
+		} else {
+			ASSERT(BP_IS_HOLE(bp_orig) || !BP_EQUAL(bp, bp_orig));
+			ASSERT(zio->io_bp->blk_birth == zio->io_txg);
+			ASSERT(zio->io_txg > spa_syncing_txg(zio->io_spa));
+			zio_free(zio->io_spa, zio->io_txg, zio->io_bp);
+		}
 	}
 
 	dmu_tx_commit(dsa->dsa_tx);
@@ -1357,7 +1388,7 @@ dmu_sync_late_arrival(zio_t *pio, objset
  *
  * Return values:
  *
- *	EEXIST: this txg has already been synced, so there's nothing to to.
+ *	EEXIST: this txg has already been synced, so there's nothing to do.
  *		The caller should not log the write.
  *
  *	ENOENT: the block was dbuf_free_range()'d, so there's nothing to do.
@@ -1389,7 +1420,6 @@ dmu_sync(zio_t *pio, uint64_t txg, dmu_s
 	dnode_t *dn;
 
 	ASSERT(pio != NULL);
-	ASSERT(BP_IS_HOLE(bp));
 	ASSERT(txg != 0);
 
 	SET_BOOKMARK(&zb, ds->ds_object,
@@ -1444,6 +1474,23 @@ dmu_sync(zio_t *pio, uint64_t txg, dmu_s
 		return (ENOENT);
 	}
 
+	ASSERT(dr->dr_next == NULL || dr->dr_next->dr_txg < txg);
+
+	/*
+	 * Assume the on-disk data is X, the current syncing data is Y,
+	 * and the current in-memory data is Z (currently in dmu_sync).
+	 * X and Z are identical but Y is has been modified. Normally,
+	 * when X and Z are the same we will perform a nopwrite but if Y
+	 * is different we must disable nopwrite since the resulting write
+	 * of Y to disk can free the block containing X. If we allowed a
+	 * nopwrite to occur the block pointing to Z would reference a freed
+	 * block. Since this is a rare case we simplify this by disabling
+	 * nopwrite if the current dmu_sync-ing dbuf has been modified in
+	 * a previous transaction.
+	 */
+	if (dr->dr_next)
+		zp.zp_nopwrite = B_FALSE;
+
 	ASSERT(dr->dr_txg == txg);
 	if (dr->dt.dl.dr_override_state == DR_IN_DMU_SYNC ||
 	    dr->dt.dl.dr_override_state == DR_OVERRIDDEN) {
@@ -1519,7 +1566,6 @@ dmu_object_set_compress(objset_t *os, ui
 
 int zfs_mdcomp_disable = 0;
 TUNABLE_INT("vfs.zfs.mdcomp_disable", &zfs_mdcomp_disable);
-SYSCTL_DECL(_vfs_zfs);
 SYSCTL_INT(_vfs_zfs, OID_AUTO, mdcomp_disable, CTLFLAG_RW,
     &zfs_mdcomp_disable, 0, "Disable metadata compression");
 
@@ -1532,15 +1578,27 @@ dmu_write_policy(objset_t *os, dnode_t *
 	enum zio_checksum checksum = os->os_checksum;
 	enum zio_compress compress = os->os_compress;
 	enum zio_checksum dedup_checksum = os->os_dedup_checksum;
-	boolean_t dedup;
+	boolean_t dedup = B_FALSE;
+	boolean_t nopwrite = B_FALSE;
 	boolean_t dedup_verify = os->os_dedup_verify;
 	int copies = os->os_copies;
 
 	/*
-	 * Determine checksum setting.
+	 * We maintain different write policies for each of the following
+	 * types of data:
+	 *	 1. metadata
+	 *	 2. preallocated blocks (i.e. level-0 blocks of a dump device)
+	 *	 3. all other level 0 blocks
 	 */
 	if (ismd) {
 		/*
+		 * XXX -- we should design a compression algorithm
+		 * that specializes in arrays of bps.
+		 */
+		compress = zfs_mdcomp_disable ? ZIO_COMPRESS_EMPTY :
+		    ZIO_COMPRESS_LZJB;
+
+		/*
 		 * Metadata always gets checksummed.  If the data
 		 * checksum is multi-bit correctable, and it's not a
 		 * ZBT-style checksum, then it's suitable for metadata
@@ -1550,45 +1608,47 @@ dmu_write_policy(objset_t *os, dnode_t *
 		if (zio_checksum_table[checksum].ci_correctable < 1 ||
 		    zio_checksum_table[checksum].ci_eck)
 			checksum = ZIO_CHECKSUM_FLETCHER_4;
-	} else {
-		checksum = zio_checksum_select(dn->dn_checksum, checksum);
-	}
+	} else if (wp & WP_NOFILL) {
+		ASSERT(level == 0);
 
-	/*
-	 * Determine compression setting.
-	 */
-	if (ismd) {
 		/*
-		 * XXX -- we should design a compression algorithm
-		 * that specializes in arrays of bps.
+		 * If we're writing preallocated blocks, we aren't actually
+		 * writing them so don't set any policy properties.  These
+		 * blocks are currently only used by an external subsystem
+		 * outside of zfs (i.e. dump) and not written by the zio
+		 * pipeline.
 		 */
-		compress = zfs_mdcomp_disable ? ZIO_COMPRESS_EMPTY :
-		    ZIO_COMPRESS_LZJB;
+		compress = ZIO_COMPRESS_OFF;
+		checksum = ZIO_CHECKSUM_OFF;
 	} else {
 		compress = zio_compress_select(dn->dn_compress, compress);
-	}
 
-	/*
-	 * Determine dedup setting.  If we are in dmu_sync(), we won't
-	 * actually dedup now because that's all done in syncing context;
-	 * but we do want to use the dedup checkum.  If the checksum is not
-	 * strong enough to ensure unique signatures, force dedup_verify.
-	 */
-	dedup = (!ismd && dedup_checksum != ZIO_CHECKSUM_OFF);
-	if (dedup) {
-		checksum = dedup_checksum;
-		if (!zio_checksum_table[checksum].ci_dedup)
-			dedup_verify = 1;
-	}
+		checksum = (dedup_checksum == ZIO_CHECKSUM_OFF) ?
+		    zio_checksum_select(dn->dn_checksum, checksum) :
+		    dedup_checksum;
 
-	if (wp & WP_DMU_SYNC)
-		dedup = 0;
+		/*
+		 * Determine dedup setting.  If we are in dmu_sync(),
+		 * we won't actually dedup now because that's all
+		 * done in syncing context; but we do want to use the
+		 * dedup checkum.  If the checksum is not strong
+		 * enough to ensure unique signatures, force
+		 * dedup_verify.
+		 */
+		if (dedup_checksum != ZIO_CHECKSUM_OFF) {
+			dedup = (wp & WP_DMU_SYNC) ? B_FALSE : B_TRUE;
+			if (!zio_checksum_table[checksum].ci_dedup)
+				dedup_verify = B_TRUE;
+		}
 
-	if (wp & WP_NOFILL) {
-		ASSERT(!ismd && level == 0);
-		checksum = ZIO_CHECKSUM_OFF;
-		compress = ZIO_COMPRESS_OFF;
-		dedup = B_FALSE;
+		/*
+		 * Enable nopwrite if we have a cryptographically secure
+		 * checksum that has no known collisions (i.e. SHA-256)
+		 * and compression is enabled.  We don't enable nopwrite if
+		 * dedup is enabled as the two features are mutually exclusive.
+		 */
+		nopwrite = (!dedup && zio_checksum_table[checksum].ci_dedup &&
+		    compress != ZIO_COMPRESS_OFF && zfs_nopwrite_enabled);
 	}
 
 	zp->zp_checksum = checksum;
@@ -1598,6 +1658,7 @@ dmu_write_policy(objset_t *os, dnode_t *
 	zp->zp_copies = MIN(copies + ismd, spa_max_replication(os->os_spa));
 	zp->zp_dedup = dedup;
 	zp->zp_dedup_verify = dedup && dedup_verify;
+	zp->zp_nopwrite = nopwrite;
 }
 
 int

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -440,7 +440,6 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t t
 	 * clean up our in-memory structures accumulated while syncing:
 	 *
 	 *  - move dead blocks from the pending deadlist to the on-disk deadlist
-	 *  - clean up zil records
 	 *  - release hold from dsl_dataset_dirty()
 	 */
 	while (ds = list_remove_head(&synced_datasets)) {

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -881,8 +881,9 @@ metaslab_activate(metaslab_t *msp, uint6
 	if ((msp->ms_weight & METASLAB_ACTIVE_MASK) == 0) {
 		space_map_load_wait(sm);
 		if (!sm->sm_loaded) {
-			int error = space_map_load(sm, sm_ops, SM_FREE,
-			    &msp->ms_smo,
+			space_map_obj_t *smo = &msp->ms_smo;
+
+			int error = space_map_load(sm, sm_ops, SM_FREE, smo,
 			    spa_meta_objset(msp->ms_group->mg_vd->vdev_spa));
 			if (error)  {
 				metaslab_group_sort(msp->ms_group, msp, 0);

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -138,6 +138,7 @@ boolean_t	zio_taskq_sysdc = B_TRUE;	/* u
 uint_t		zio_taskq_basedc = 80;		/* base duty cycle */
 
 boolean_t	spa_create_process = B_TRUE;	/* no process ==> no sysdc */
+extern int	zfs_sync_pass_deferred_free;
 
 /*
  * This (illegal) pool name is used when temporarily importing a spa_t in order
@@ -6227,7 +6228,7 @@ spa_sync(spa_t *spa, uint64_t txg)
 		spa_errlog_sync(spa, txg);
 		dsl_pool_sync(dp, txg);
 
-		if (pass <= SYNC_PASS_DEFERRED_FREE) {
+		if (pass < zfs_sync_pass_deferred_free) {
 			zio_t *zio = zio_root(spa, NULL, NULL, 0);
 			bplist_iterate(free_bpl, spa_free_sync_cb,
 			    zio, tx);

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -1619,6 +1619,7 @@ spa_init(int mode)
 #endif /* illumos */
 	refcount_sysinit();
 	unique_init();
+	space_map_init();
 	zio_init();
 	dmu_init();
 	zil_init();
@@ -1641,6 +1642,7 @@ spa_fini(void)
 	zil_fini();
 	dmu_fini();
 	zio_fini();
+	space_map_fini();
 	unique_fini();
 	refcount_fini();
 

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -32,6 +32,23 @@
 #include <sys/zio.h>
 #include <sys/space_map.h>
 
+static kmem_cache_t *space_seg_cache;
+
+void
+space_map_init(void)
+{
+	ASSERT(space_seg_cache == NULL);
+	space_seg_cache = kmem_cache_create("space_seg_cache",
+	    sizeof (space_seg_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
+}
+
+void
+space_map_fini(void)
+{
+	kmem_cache_destroy(space_seg_cache);
+	space_seg_cache = NULL;
+}
+
 /*
  * Space map routines.
  * NOTE: caller is responsible for all locking.
@@ -124,7 +141,7 @@ space_map_add(space_map_t *sm, uint64_t 
 			avl_remove(sm->sm_pp_root, ss_after);
 		}
 		ss_after->ss_start = ss_before->ss_start;
-		kmem_free(ss_before, sizeof (*ss_before));
+		kmem_cache_free(space_seg_cache, ss_before);
 		ss = ss_after;
 	} else if (merge_before) {
 		ss_before->ss_end = end;
@@ -137,7 +154,7 @@ space_map_add(space_map_t *sm, uint64_t 
 			avl_remove(sm->sm_pp_root, ss_after);
 		ss = ss_after;
 	} else {
-		ss = kmem_alloc(sizeof (*ss), KM_SLEEP);
+		ss = kmem_cache_alloc(space_seg_cache, KM_SLEEP);
 		ss->ss_start = start;
 		ss->ss_end = end;
 		avl_insert(&sm->sm_root, ss, where);
@@ -183,7 +200,7 @@ space_map_remove(space_map_t *sm, uint64
 		avl_remove(sm->sm_pp_root, ss);
 
 	if (left_over && right_over) {
-		newseg = kmem_alloc(sizeof (*newseg), KM_SLEEP);
+		newseg = kmem_cache_alloc(space_seg_cache, KM_SLEEP);
 		newseg->ss_start = end;
 		newseg->ss_end = ss->ss_end;
 		ss->ss_end = start;
@@ -196,7 +213,7 @@ space_map_remove(space_map_t *sm, uint64
 		ss->ss_start = end;
 	} else {
 		avl_remove(&sm->sm_root, ss);
-		kmem_free(ss, sizeof (*ss));
+		kmem_cache_free(space_seg_cache, ss);
 		ss = NULL;
 	}
 
@@ -236,7 +253,7 @@ space_map_vacate(space_map_t *sm, space_
 	while ((ss = avl_destroy_nodes(&sm->sm_root, &cookie)) != NULL) {
 		if (func != NULL)
 			func(mdest, ss->ss_start, ss->ss_end - ss->ss_start);
-		kmem_free(ss, sizeof (*ss));
+		kmem_cache_free(space_seg_cache, ss);
 	}
 	sm->sm_space = 0;
 }
@@ -408,7 +425,7 @@ space_map_sync(space_map_t *sm, uint8_t 
 	spa_t *spa = dmu_objset_spa(os);
 	void *cookie = NULL;
 	space_seg_t *ss;
-	uint64_t bufsize, start, size, run_len;
+	uint64_t bufsize, start, size, run_len, delta, sm_space;
 	uint64_t *entry, *entry_map, *entry_map_end;
 
 	ASSERT(MUTEX_HELD(sm->sm_lock));
@@ -437,11 +454,13 @@ space_map_sync(space_map_t *sm, uint8_t 
 	    SM_DEBUG_SYNCPASS_ENCODE(spa_sync_pass(spa)) |
 	    SM_DEBUG_TXG_ENCODE(dmu_tx_get_txg(tx));
 
+	delta = 0;
+	sm_space = sm->sm_space;
 	while ((ss = avl_destroy_nodes(&sm->sm_root, &cookie)) != NULL) {
 		size = ss->ss_end - ss->ss_start;
 		start = (ss->ss_start - sm->sm_start) >> sm->sm_shift;
 
-		sm->sm_space -= size;
+		delta += size;
 		size >>= sm->sm_shift;
 
 		while (size) {
@@ -463,7 +482,7 @@ space_map_sync(space_map_t *sm, uint8_t 
 			start += run_len;
 			size -= run_len;
 		}
-		kmem_free(ss, sizeof (*ss));
+		kmem_cache_free(space_seg_cache, ss);
 	}
 
 	if (entry != entry_map) {
@@ -475,8 +494,15 @@ space_map_sync(space_map_t *sm, uint8_t 
 		smo->smo_objsize += size;
 	}
 
+	/*
+	 * Ensure that the space_map's accounting wasn't changed
+	 * while we were in the middle of writing it out.
+	 */
+	VERIFY3U(sm->sm_space, ==, sm_space);
+
 	zio_buf_free(entry_map, bufsize);
 
+	sm->sm_space -= delta;
 	VERIFY0(sm->sm_space);
 }
 

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dbuf.h	Mon Dec 10 14:36:48 2012	(r244087)
@@ -20,6 +20,7 @@
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  */
 
 #ifndef	_SYS_DBUF_H
@@ -130,6 +131,7 @@ typedef struct dbuf_dirty_record {
 			blkptr_t dr_overridden_by;
 			override_states_t dr_override_state;
 			uint8_t dr_copies;
+			boolean_t dr_nopwrite;
 		} dl;
 	} dt;
 } dbuf_dirty_record_t;

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h	Mon Dec 10 14:36:48 2012	(r244087)
@@ -505,6 +505,11 @@ void dmu_evict_user(objset_t *os, dmu_bu
 void *dmu_buf_get_user(dmu_buf_t *db);
 
 /*
+ * Returns the blkptr associated with this dbuf, or NULL if not set.
+ */
+struct blkptr *dmu_buf_get_blkptr(dmu_buf_t *db);
+
+/*
  * Indicate that you are going to modify the buffer's data (db_data).
  *
  * The transaction (tx) must be assigned to a txg (ie. you've called

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h	Mon Dec 10 14:36:48 2012	(r244087)
@@ -21,7 +21,10 @@
 /*
  * Copyright 2009 Sun Microsystems, Inc.  All rights reserved.
  * Use is subject to license terms.
- * Copyright (c) 2011 by Delphix. All rights reserved.
+ */
+
+/*
+ * Copyright (c) 2012 by Delphix. All rights reserved.
  */
 
 #ifndef _SYS_METASLAB_IMPL_H

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa.h	Mon Dec 10 14:36:48 2012	(r244087)
@@ -489,14 +489,6 @@ extern int spa_scan_stop(spa_t *spa);
 extern void spa_sync(spa_t *spa, uint64_t txg); /* only for DMU use */
 extern void spa_sync_allpools(void);
 
-/*
- * DEFERRED_FREE must be large enough that regular blocks are not
- * deferred.  XXX so can't we change it back to 1?
- */
-#define	SYNC_PASS_DEFERRED_FREE	2	/* defer frees after this pass */
-#define	SYNC_PASS_DONT_COMPRESS	4	/* don't compress after this pass */
-#define	SYNC_PASS_REWRITE	1	/* rewrite new bps after this pass */
-
 /* spa namespace global mutex */
 extern kmutex_t spa_namespace_lock;
 

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/space_map.h	Mon Dec 10 14:36:48 2012	(r244087)
@@ -23,6 +23,10 @@
  * Use is subject to license terms.
  */
 
+/*
+ * Copyright (c) 2012 by Delphix. All rights reserved.
+ */
+
 #ifndef _SYS_SPACE_MAP_H
 #define	_SYS_SPACE_MAP_H
 
@@ -136,6 +140,8 @@ struct space_map_ops {
 
 typedef void space_map_func_t(space_map_t *sm, uint64_t start, uint64_t size);
 
+extern void space_map_init(void);
+extern void space_map_fini(void);
 extern void space_map_create(space_map_t *sm, uint64_t start, uint64_t size,
     uint8_t shift, kmutex_t *lp);
 extern void space_map_destroy(space_map_t *sm);

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h	Mon Dec 10 14:36:48 2012	(r244087)
@@ -186,7 +186,9 @@ enum zio_flag {
 	ZIO_FLAG_RAW		= 1 << 21,
 	ZIO_FLAG_GANG_CHILD	= 1 << 22,
 	ZIO_FLAG_DDT_CHILD	= 1 << 23,
-	ZIO_FLAG_GODFATHER	= 1 << 24
+	ZIO_FLAG_GODFATHER	= 1 << 24,
+	ZIO_FLAG_NOPWRITE	= 1 << 25,
+	ZIO_FLAG_REEXECUTED	= 1 << 26,
 };
 
 #define	ZIO_FLAG_MUSTSUCCEED		0
@@ -285,8 +287,9 @@ typedef struct zio_prop {
 	dmu_object_type_t	zp_type;
 	uint8_t			zp_level;
 	uint8_t			zp_copies;
-	uint8_t			zp_dedup;
-	uint8_t			zp_dedup_verify;
+	boolean_t		zp_dedup;
+	boolean_t		zp_dedup_verify;
+	boolean_t		zp_nopwrite;
 } zio_prop_t;
 
 typedef struct zio_cksum_report zio_cksum_report_t;
@@ -454,7 +457,8 @@ extern zio_t *zio_rewrite(zio_t *pio, sp
     void *data, uint64_t size, zio_done_func_t *done, void *priv,
     int priority, enum zio_flag flags, zbookmark_t *zb);
 
-extern void zio_write_override(zio_t *zio, blkptr_t *bp, int copies);
+extern void zio_write_override(zio_t *zio, blkptr_t *bp, int copies,
+    boolean_t nopwrite);
 
 extern void zio_free(spa_t *spa, uint64_t txg, const blkptr_t *bp);
 

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h	Mon Dec 10 14:36:48 2012	(r244087)
@@ -23,6 +23,10 @@
  * Use is subject to license terms.
  */
 
+/*
+ * Copyright (c) 2012 by Delphix. All rights reserved.
+ */
+
 #ifndef _ZIO_IMPL_H
 #define	_ZIO_IMPL_H
 
@@ -34,6 +38,70 @@ extern "C" {
 #endif
 
 /*
+ * XXX -- Describe ZFS I/O pipleine here. Fill in as needed.
+ *
+ * The ZFS I/O pipeline is comprised of various stages which are defined
+ * in the zio_stage enum below. The individual stages are used to construct
+ * these basic I/O operations: Read, Write, Free, Claim, and Ioctl.
+ *
+ * I/O operations: (XXX - provide detail for each of the operations)
+ *
+ * Read:
+ * Write:
+ * Free:
+ * Claim:
+ * Ioctl:
+ *
+ * Although the most common pipeline are used by the basic I/O operations
+ * above, there are some helper pipelines (one could consider them
+ * sub-pipelines) which are used internally by the ZIO module and are
+ * explained below:
+ *
+ * Interlock Pipeline:
+ * The interlock pipeline is the most basic pipeline and is used by all
+ * of the I/O operations. The interlock pipeline does not perform any I/O
+ * and is used to coordinate the dependencies between I/Os that are being
+ * issued (i.e. the parent/child relationship).
+ *
+ * Vdev child Pipeline:
+ * The vdev child pipeline is responsible for performing the physical I/O.
+ * It is in this pipeline where the I/O are queued and possibly cached.
+ *
+ * In addition to performing I/O, the pipeline is also responsible for
+ * data transformations. The transformations performed are based on the
+ * specific properties that user may have selected and modify the
+ * behavior of the pipeline. Examples of supported transformations are
+ * compression, dedup, and nop writes. Transformations will either modify
+ * the data or the pipeline. This list below further describes each of
+ * the supported transformations:
+ *
+ * Compression:
+ * ZFS supports three different flavors of compression -- gzip, lzjb, and
+ * zle. Compression occurs as part of the write pipeline and is performed
+ * in the ZIO_STAGE_WRITE_BP_INIT stage.
+ *
+ * Dedup:
+ * Dedup reads are handled by the ZIO_STAGE_DDT_READ_START and
+ * ZIO_STAGE_DDT_READ_DONE stages. These stages are added to an existing
+ * read pipeline if the dedup bit is set on the block pointer.
+ * Writing a dedup block is performed by the ZIO_STAGE_DDT_WRITE stage
+ * and added to a write pipeline if a user has enabled dedup on that
+ * particular dataset.
+ *
+ * NOP Write:
+ * The NOP write feature is performed by the ZIO_STAGE_NOP_WRITE stage
+ * and is added to an existing write pipeline if a crypographically
+ * secure checksum (i.e. SHA256) is enabled and compression is turned on.
+ * The NOP write stage will compare the checksums of the current data
+ * on-disk (level-0 blocks only) and the data that is currently being written.
+ * If the checksum values are identical then the pipeline is converted to
+ * an interlock pipeline skipping block allocation and bypassing the
+ * physical I/O.  The nop write feature can handle writes in either
+ * syncing or open context (i.e. zil writes) and as a result is mutually
+ * exclusive with dedup.
+ */
+
+/*
  * zio pipeline stage definitions
  */
 enum zio_stage {
@@ -46,27 +114,29 @@ enum zio_stage {
 
 	ZIO_STAGE_CHECKSUM_GENERATE	= 1 << 5,	/* -W--- */
 
-	ZIO_STAGE_DDT_READ_START	= 1 << 6,	/* R---- */
-	ZIO_STAGE_DDT_READ_DONE		= 1 << 7,	/* R---- */
-	ZIO_STAGE_DDT_WRITE		= 1 << 8,	/* -W--- */
-	ZIO_STAGE_DDT_FREE		= 1 << 9,	/* --F-- */
+	ZIO_STAGE_NOP_WRITE		= 1 << 6,	/* -W--- */
 
-	ZIO_STAGE_GANG_ASSEMBLE		= 1 << 10,	/* RWFC- */
-	ZIO_STAGE_GANG_ISSUE		= 1 << 11,	/* RWFC- */
+	ZIO_STAGE_DDT_READ_START	= 1 << 7,	/* R---- */
+	ZIO_STAGE_DDT_READ_DONE		= 1 << 8,	/* R---- */
+	ZIO_STAGE_DDT_WRITE		= 1 << 9,	/* -W--- */
+	ZIO_STAGE_DDT_FREE		= 1 << 10,	/* --F-- */
 
-	ZIO_STAGE_DVA_ALLOCATE		= 1 << 12,	/* -W--- */
-	ZIO_STAGE_DVA_FREE		= 1 << 13,	/* --F-- */
-	ZIO_STAGE_DVA_CLAIM		= 1 << 14,	/* ---C- */
+	ZIO_STAGE_GANG_ASSEMBLE		= 1 << 11,	/* RWFC- */
+	ZIO_STAGE_GANG_ISSUE		= 1 << 12,	/* RWFC- */
 
-	ZIO_STAGE_READY			= 1 << 15,	/* RWFCI */
+	ZIO_STAGE_DVA_ALLOCATE		= 1 << 13,	/* -W--- */
+	ZIO_STAGE_DVA_FREE		= 1 << 14,	/* --F-- */
+	ZIO_STAGE_DVA_CLAIM		= 1 << 15,	/* ---C- */
 
-	ZIO_STAGE_VDEV_IO_START		= 1 << 16,	/* RW--I */
-	ZIO_STAGE_VDEV_IO_DONE		= 1 << 17,	/* RW--- */
-	ZIO_STAGE_VDEV_IO_ASSESS	= 1 << 18,	/* RW--I */
+	ZIO_STAGE_READY			= 1 << 16,	/* RWFCI */
 
-	ZIO_STAGE_CHECKSUM_VERIFY	= 1 << 19,	/* R---- */
+	ZIO_STAGE_VDEV_IO_START		= 1 << 17,	/* RW--I */
+	ZIO_STAGE_VDEV_IO_DONE		= 1 << 18,	/* RW--- */
+	ZIO_STAGE_VDEV_IO_ASSESS	= 1 << 19,	/* RW--I */
 
-	ZIO_STAGE_DONE			= 1 << 20	/* RWFCI */
+	ZIO_STAGE_CHECKSUM_VERIFY	= 1 << 20,	/* R---- */
+
+	ZIO_STAGE_DONE			= 1 << 21	/* RWFCI */
 };
 
 #define	ZIO_INTERLOCK_STAGES			\
@@ -143,6 +213,7 @@ enum zio_stage {
 #define	ZIO_FREE_PIPELINE			\
 	(ZIO_INTERLOCK_STAGES |			\
 	ZIO_STAGE_FREE_BP_INIT |		\
+	ZIO_STAGE_ISSUE_ASYNC |			\
 	ZIO_STAGE_DVA_FREE)
 
 #define	ZIO_DDT_FREE_PIPELINE			\

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -1206,6 +1206,12 @@ zfs_get_data(void *arg, lr_write_t *lr, 
 			    DMU_READ_NO_PREFETCH);
 
 		if (error == 0) {
+			blkptr_t *obp = dmu_buf_get_blkptr(db);
+			if (obp) {
+				ASSERT(BP_IS_HOLE(bp));
+				*bp = *obp;
+			}
+
 			zgd->zgd_db = db;
 			zgd->zgd_bp = bp;
 
@@ -3254,6 +3260,12 @@ top:
 		uint64_t acl_obj;
 		new_mode = (pmode & S_IFMT) | (vap->va_mode & ~S_IFMT);
 
+		if (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_RESTRICTED &&
+		    !(zp->z_pflags & ZFS_ACL_TRIVIAL)) {
+			err = EPERM;
+			goto out;
+		}
+
 		if (err = zfs_acl_chmod_setattr(zp, &aclp, new_mode))
 			goto out;
 

Modified: stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
==============================================================================
--- stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	Mon Dec 10 14:13:44 2012	(r244086)
+++ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c	Mon Dec 10 14:36:48 2012	(r244087)
@@ -89,6 +89,31 @@ extern vmem_t *zio_alloc_arena;
 extern int zfs_mg_alloc_failures;
 
 /*
+ * The following actions directly effect the spa's sync-to-convergence logic.
+ * The values below define the sync pass when we start performing the action.
+ * Care should be taken when changing these values as they directly impact
+ * spa_sync() performance. Tuning these values may introduce subtle performance
+ * pathologies and should only be done in the context of performance analysis.
+ * These tunables will eventually be removed and replaced with #defines once
+ * enough analysis has been done to determine optimal values.
+ *
+ * The 'zfs_sync_pass_deferred_free' pass must be greater than 1 to ensure that
+ * regular blocks are not deferred.
+ */
+int zfs_sync_pass_deferred_free = 2; /* defer frees starting in this pass */
+TUNABLE_INT("vfs.zfs.sync_pass_deferred_free", &zfs_sync_pass_deferred_free);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, sync_pass_deferred_free, CTLFLAG_RDTUN,
+    &zfs_sync_pass_deferred_free, 0, "defer frees starting in this pass");
+int zfs_sync_pass_dont_compress = 5; /* don't compress starting in this pass */
+TUNABLE_INT("vfs.zfs.sync_pass_dont_compress", &zfs_sync_pass_dont_compress);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, sync_pass_dont_compress, CTLFLAG_RDTUN,
+    &zfs_sync_pass_dont_compress, 0, "don't compress starting in this pass");
+int zfs_sync_pass_rewrite = 2; /* rewrite new bps starting in this pass */
+TUNABLE_INT("vfs.zfs.sync_pass_rewrite", &zfs_sync_pass_rewrite);
+SYSCTL_INT(_vfs_zfs, OID_AUTO, sync_pass_rewrite, CTLFLAG_RDTUN,
+    &zfs_sync_pass_rewrite, 0, "rewrite new bps starting in this pass");
+
+/*
  * An allocating zio is one that either currently has the DVA allocate
  * stage set or will have it later in its lifetime.
  */
@@ -650,9 +675,7 @@ zio_write(zio_t *pio, spa_t *spa, uint64

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***


More information about the svn-src-stable-9 mailing list