svn commit: r354954 - vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/fs/zfs/sys vendor/illumos/dist/cmd/zinject

Andriy Gapon avg at FreeBSD.org
Thu Nov 21 14:00:13 UTC 2019


Author: avg
Date: Thu Nov 21 14:00:09 2019
New Revision: 354954
URL: https://svnweb.freebsd.org/changeset/base/354954

Log:
  10566 Multiple DVA Scrubbing Fix
  
  illumos/illumos-gate at 12a8814c13fbb1d6d58616cf090ea5815dc107f9
  https://github.com/illumos/illumos-gate/commit/12a8814c13fbb1d6d58616cf090ea5815dc107f9
  
  https://www.illumos.org/issues/10566
    ZoL PR_8453
    Author: Tom Caputi <tcaputi at datto.com>
    Date:   Fri Mar 15 17:14:31 2019 -0400
  
        Multiple DVA Scrubbing Fix
  
        Currently, there is an issue in the sequential scrub code which
        prevents self healing from working in some cases. The scrub code
        will split up all DVA copies of a bp and issue each of them
        separately. The problem is that, since each of the DVAs is no
        longer associated with the others, the self healing code doesn't
        have the opportunity to repair problems that show up in one of the
        DVAs with the data from the others.
  
        This patch fixes this issue by ensuring that all IOs issued by the
        sequential scrub code include all DVAs. Initially, only the first
        DVA of each is attempted. If an issue arises, the IO is retried
        with all available copies, giving the self healing code a chance
        to correct the issue.
  
        To test this change, this patch also adds the ability for zinject
        to specify individual DVAs to inject read errors into. We then
        add a new test case that utilizes this functionality to ensure
        scrubs and self-healing reads can handle and transparently fix
        issues with individual copies of blocks.
  
    This update is followup on #10405
    While attempting to port this update, the following ZoL updates are included:
  
        551905dd4 vdev_mirror: kstat observables for preferred vdev
        d6c6590c5 vdev_mirror: load balancing fixes
        9f500936c FreeBSD r256956: Improve ZFS N-way mirror read performance by
  
  Portions contributed by: Toomas Soome <tsoome at me.com>
  Portions contributed by: Jerry Jelinek <jerry.jelinek at joyent.com>
  Author: Tom Caputi <tcaputi at datto.com>

Modified:
  vendor/illumos/dist/cmd/zinject/zinject.c

Changes in other areas also in this revision:
Modified:
  vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_scan.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/spa_misc.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa.h
  vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev.h
  vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev_impl.h
  vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zfs_ioctl.h
  vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_disk.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_file.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_mirror.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_queue.c
  vendor-sys/illumos/dist/uts/common/fs/zfs/zio_inject.c

Modified: vendor/illumos/dist/cmd/zinject/zinject.c
==============================================================================
--- vendor/illumos/dist/cmd/zinject/zinject.c	Thu Nov 21 13:59:06 2019	(r354953)
+++ vendor/illumos/dist/cmd/zinject/zinject.c	Thu Nov 21 14:00:09 2019	(r354954)
@@ -47,48 +47,48 @@
  *
  * This form of the command looks like:
  *
- * 	zinject -d device [-e errno] [-L <uber | nvlist | pad1 | pad2>] pool
+ *	zinject -d device [-e errno] [-L <uber | nvlist | pad1 | pad2>] pool
  *
  *
  * DATA FAULTS
  *
  * We begin with a tuple of the form:
  *
- * 	<type,level,range,object>
+ *	<type,level,range,object>
  *
- * 	type	A string describing the type of data to target.  Each type
- * 		implicitly describes how to interpret 'object'. Currently,
- * 		the following values are supported:
+ *	type	A string describing the type of data to target.  Each type
+ *		implicitly describes how to interpret 'object'. Currently,
+ *		the following values are supported:
  *
- * 		data		User data for a file
- * 		dnode		Dnode for a file or directory
+ *		data		User data for a file
+ *		dnode		Dnode for a file or directory
  *
  *		The following MOS objects are special.  Instead of injecting
  *		errors on a particular object or blkid, we inject errors across
  *		all objects of the given type.
  *
- * 		mos		Any data in the MOS
- * 		mosdir		object directory
- * 		config		pool configuration
- * 		bpobj		blkptr list
- * 		spacemap	spacemap
- * 		metaslab	metaslab
- * 		errlog		persistent error log
+ *		mos		Any data in the MOS
+ *		mosdir		object directory
+ *		config		pool configuration
+ *		bpobj		blkptr list
+ *		spacemap	spacemap
+ *		metaslab	metaslab
+ *		errlog		persistent error log
  *
- * 	level	Object level.  Defaults to '0', not applicable to all types.  If
- * 		a range is given, this corresponds to the indirect block
- * 		corresponding to the specific range.
+ *	level	Object level.  Defaults to '0', not applicable to all types.  If
+ *		a range is given, this corresponds to the indirect block
+ *		corresponding to the specific range.
  *
  *	range	A numerical range [start,end) within the object.  Defaults to
  *		the full size of the file.
  *
- * 	object	A string describing the logical location of the object.  For
- * 		files and directories (currently the only supported types),
- * 		this is the path of the object on disk.
+ *	object	A string describing the logical location of the object.  For
+ *		files and directories (currently the only supported types),
+ *		this is the path of the object on disk.
  *
  * This is translated, via libzpool, into the following internal representation:
  *
- * 	<type,objset,object,level,range>
+ *	<type,objset,object,level,range>
  *
  * These types should be self-explanatory.  This tuple is then passed to the
  * kernel via a special ioctl() to initiate fault injection for the given
@@ -98,12 +98,12 @@
  *
  * The command itself takes one of the forms:
  *
- * 	zinject
- * 	zinject <-a | -u pool>
- * 	zinject -c <id|all>
- * 	zinject [-q] <-t type> [-f freq] [-u] [-a] [-m] [-e errno] [-l level]
+ *	zinject
+ *	zinject <-a | -u pool>
+ *	zinject -c <id|all>
+ *	zinject [-q] <-t type> [-f freq] [-u] [-a] [-m] [-e errno] [-l level]
  *	    [-r range] <object>
- * 	zinject [-f freq] [-a] [-m] [-u] -b objset:object:level:start:end pool
+ *	zinject [-f freq] [-a] [-m] [-u] -b objset:object:level:start:end pool
  *
  * With no arguments, the command prints all currently registered injection
  * handlers, with their numeric identifiers.
@@ -288,8 +288,8 @@ usage(void)
 	    "\t\tspecified by the remaining tuple.  Each number is in\n"
 	    "\t\thexidecimal, and only one block can be specified.\n"
 	    "\n"
-	    "\tzinject [-q] <-t type> [-e errno] [-l level] [-r range]\n"
-	    "\t    [-a] [-m] [-u] [-f freq] <object>\n"
+	    "\tzinject [-q] <-t type> [-C dvas] [-e errno] [-l level]\n"
+	    "\t\t[-r range] [-a] [-m] [-u] [-f freq] <object>\n"
 	    "\n"
 	    "\t\tInject an error into the object specified by the '-t' option\n"
 	    "\t\tand the object descriptor.  The 'object' parameter is\n"
@@ -297,7 +297,10 @@ usage(void)
 	    "\n"
 	    "\t\t-q\tQuiet mode.  Only print out the handler number added.\n"
 	    "\t\t-e\tInject a specific error.  Must be either 'io' or\n"
-	    "\t\t\t'checksum'.  Default is 'io'.\n"
+	    "\t\t\t'checksum', or 'decompress'.  Default is 'io'.\n"
+	    "\t\t-C\tInject the given error only into specific DVAs. The\n"
+	    "\t\t\tDVAs should be specified as a list of 0-indexed DVAs\n"
+	    "\t\t\tseparated by commas (ex. '0,2').\n"
 	    "\t\t-l\tInject error at a particular block level. Default is "
 	    "0.\n"
 	    "\t\t-m\tAutomatically remount underlying filesystem.\n"
@@ -358,17 +361,19 @@ print_data_handler(int id, const char *pool, zinject_r
 		return (0);
 
 	if (*count == 0) {
-		(void) printf("%3s  %-15s  %-6s  %-6s  %-8s  %3s  %-15s\n",
-		    "ID", "POOL", "OBJSET", "OBJECT", "TYPE", "LVL",  "RANGE");
+		(void) printf("%3s  %-15s  %-6s  %-6s  %-8s  %3s  %-4s  ",
+		    "%-15s\n", "ID", "POOL", "OBJSET", "OBJECT", "TYPE",
+		    "LVL", "DVAs", "RANGE");
 		(void) printf("---  ---------------  ------  "
-		    "------  --------  ---  ---------------\n");
+		    "------  --------  ---  ---- ----------------\n");
 	}
 
 	*count += 1;
 
-	(void) printf("%3d  %-15s  %-6llu  %-6llu  %-8s  %3d  ", id, pool,
-	    (u_longlong_t)record->zi_objset, (u_longlong_t)record->zi_object,
-	    type_to_name(record->zi_type), record->zi_level);
+	(void) printf("%3d  %-15s  %-6llu  %-6llu  %-8s  %-3d  0x%02x  ",
+	    id, pool, (u_longlong_t)record->zi_objset,
+	    (u_longlong_t)record->zi_object, type_to_name(record->zi_type),
+	    record->zi_level, record->zi_dvas);
 
 	if (record->zi_start == 0 &&
 	    record->zi_end == -1ULL)
@@ -598,6 +603,7 @@ register_handler(const char *pool, int flags, zinject_
 				(void) printf(" range: [%llu, %llu)\n",
 				    (u_longlong_t)record->zi_start,
 				    (u_longlong_t)record->zi_end);
+			(void) printf("  dvas: 0x%x\n", record->zi_dvas);
 		}
 	}
 
@@ -649,6 +655,59 @@ parse_delay(char *str, uint64_t *delay, uint64_t *nlan
 	return (0);
 }
 
+/*
+ * This function converts a string specifier for DVAs into a bit mask.
+ * The dva's provided by the user should be 0 indexed and separated by
+ * a comma. For example:
+ *     "1"     -> 0b0010  (0x2)
+ *     "0,1"   -> 0b0011  (0x3)
+ *     "0,1,2" -> 0b0111  (0x7)
+ */
+static int
+parse_dvas(const char *str, uint32_t *dvas_out)
+{
+	const char *c = str;
+	uint32_t mask = 0;
+	boolean_t need_delim = B_FALSE;
+
+	/* max string length is 5 ("0,1,2") */
+	if (strlen(str) > 5 || strlen(str) == 0)
+		return (EINVAL);
+
+	while (*c != '\0') {
+		switch (*c) {
+		case '0':
+		case '1':
+		case '2':
+			/* check for pipe between DVAs */
+			if (need_delim)
+				return (EINVAL);
+
+			/* check if this DVA has been set already */
+			if (mask & (1 << ((*c) - '0')))
+				return (EINVAL);
+
+			mask |= (1 << ((*c) - '0'));
+			need_delim = B_TRUE;
+			break;
+		case ',':
+			need_delim = B_FALSE;
+			break;
+		default:
+			/* check for invalid character */
+			return (EINVAL);
+		}
+		c++;
+	}
+
+	/* check for dangling delimiter */
+	if (!need_delim)
+		return (EINVAL);
+
+	*dvas_out = mask;
+	return (0);
+}
+
 int
 main(int argc, char **argv)
 {
@@ -675,6 +734,7 @@ main(int argc, char **argv)
 	int dur_secs = 0;
 	int ret;
 	int flags = 0;
+	uint32_t dvas = 0;
 
 	if ((g_zfs = libzfs_init()) == NULL) {
 		(void) fprintf(stderr, "internal error: failed to "
@@ -705,7 +765,7 @@ main(int argc, char **argv)
 	}
 
 	while ((c = getopt(argc, argv,
-	    ":aA:b:d:D:f:Fg:qhIc:t:T:l:mr:s:e:uL:p:")) != -1) {
+	    ":aA:b:C:d:D:f:Fg:qhIc:t:T:l:mr:s:e:uL:p:")) != -1) {
 		switch (c) {
 		case 'a':
 			flags |= ZINJECT_FLUSH_ARC;
@@ -728,6 +788,17 @@ main(int argc, char **argv)
 		case 'c':
 			cancel = optarg;
 			break;
+		case 'C':
+			ret = parse_dvas(optarg, &dvas);
+			if (ret != 0) {
+				(void) fprintf(stderr, "invalid DVA list '%s': "
+				    "DVAs should be 0 indexed and separated by "
+				    "commas.\n", optarg);
+				usage();
+				libzfs_fini(g_zfs);
+				return (1);
+			}
+			break;
 		case 'd':
 			device = optarg;
 			break;
@@ -887,7 +958,8 @@ main(int argc, char **argv)
 		 * '-c' is invalid with any other options.
 		 */
 		if (raw != NULL || range != NULL || type != TYPE_INVAL ||
-		    level != 0 || record.zi_cmd != ZINJECT_UNINITIALIZED) {
+		    level != 0 || record.zi_cmd != ZINJECT_UNINITIALIZED ||
+		    record.zi_freq > 0 || dvas != 0) {
 			(void) fprintf(stderr, "cancel (-c) incompatible with "
 			    "any other options\n");
 			usage();
@@ -919,7 +991,8 @@ main(int argc, char **argv)
 		 * for doing injection, so handle it separately here.
 		 */
 		if (raw != NULL || range != NULL || type != TYPE_INVAL ||
-		    level != 0 || record.zi_cmd != ZINJECT_UNINITIALIZED) {
+		    level != 0 || record.zi_cmd != ZINJECT_UNINITIALIZED ||
+		    dvas != 0) {
 			(void) fprintf(stderr, "device (-d) incompatible with "
 			    "data error injection\n");
 			usage();
@@ -953,7 +1026,8 @@ main(int argc, char **argv)
 
 	} else if (raw != NULL) {
 		if (range != NULL || type != TYPE_INVAL || level != 0 ||
-		    record.zi_cmd != ZINJECT_UNINITIALIZED) {
+		    record.zi_cmd != ZINJECT_UNINITIALIZED ||
+		    record.zi_freq > 0 || dvas != 0) {
 			(void) fprintf(stderr, "raw (-b) format with "
 			    "any other options\n");
 			usage();
@@ -983,7 +1057,8 @@ main(int argc, char **argv)
 			error = EIO;
 	} else if (record.zi_cmd == ZINJECT_PANIC) {
 		if (raw != NULL || range != NULL || type != TYPE_INVAL ||
-		    level != 0 || device != NULL) {
+		    level != 0 || device != NULL || record.zi_freq > 0 ||
+		    dvas != 0) {
 			(void) fprintf(stderr, "panic (-p) incompatible with "
 			    "other options\n");
 			usage();
@@ -1002,6 +1077,15 @@ main(int argc, char **argv)
 			record.zi_type = atoi(argv[1]);
 		dataset[0] = '\0';
 	} else if (record.zi_cmd == ZINJECT_IGNORED_WRITES) {
+		if (raw != NULL || range != NULL || type != TYPE_INVAL ||
+		    level != 0 || record.zi_freq > 0 || dvas != 0) {
+			(void) fprintf(stderr, "hardware failure (-I) "
+			    "incompatible with other options\n");
+			usage();
+			libzfs_fini(g_zfs);
+			return (2);
+		}
+
 		if (nowrites == 0) {
 			(void) fprintf(stderr, "-s or -g meaningless "
 			    "without -I (ignore writes)\n");
@@ -1053,6 +1137,18 @@ main(int argc, char **argv)
 			(void) fprintf(stderr, "data error type must be "
 			    "'checksum' or 'io'\n");
 			return (1);
+		}
+
+		if (dvas != 0) {
+			if (error == EACCES || error == EINVAL) {
+				(void) fprintf(stderr, "the '-C' option may "
+				    "not be used with logical data errors "
+				    "'decrypt' and 'decompress'\n");
+				libzfs_fini(g_zfs);
+				return (1);
+			}
+
+			record.zi_dvas = dvas;
 		}
 
 		record.zi_cmd = ZINJECT_DATA_FAULT;


More information about the svn-src-all mailing list