From nobody Wed Sep 06 21:49:21 2023
Date: Wed, 6 Sep 2023 21:49:21 GMT
Message-Id: <202309062149.386LnLXD005693@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Alan Somers
Subject: git: f85da5f88efc - stable/13 - Multiple fixes to the zfsd test suite
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type:
 text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: asomers
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: f85da5f88efce6177d0e5be2b74c48599e5c471a
Auto-Submitted: auto-generated

The branch stable/13 has been updated by asomers:

URL: https://cgit.FreeBSD.org/src/commit/?id=f85da5f88efce6177d0e5be2b74c48599e5c471a

commit f85da5f88efce6177d0e5be2b74c48599e5c471a
Author:     Alan Somers
AuthorDate: 2023-04-03 21:43:17 +0000
Commit:     Alan Somers
CommitDate: 2023-09-06 21:49:06 +0000

    Multiple fixes to the zfsd test suite

    * Wait for gnop devices to disappear after "gnop destroy".

      Apparently that process is asynchronous now, or maybe it's just slower
      than it used to be.  Also, after removing a gnop wait for its pool to
      be degraded.  That isn't instant.

    * The zfsd tests no longer require camcontrol.

      This was a harmless oversight from
      11ed0a95bfa76791dc6428eb2d47a986c0c6f8a3

    * Fix the zfsd_degrade_001_pos test for recent zfs versions.

      ZFS now rate limits checksum errors to about 20 per second.  But
      zfsd's threshold for degrading a disk is 50 per minute.  So we must
      alternately corrupt and scrub the pool to ensure that checksum errors
      are generated in multiple 1-second windows, so that zfsd will see
      enough of them.

    * Fix the zfsd_fault_001_pos test in VMs

      And, for that matter, when using NVME or SATA disks.  As originally
      written, the test used the da driver to inject errors.  Rewrite it to
      use gnop vdevs.  gnop can also inject errors.  It works on top of any
      disk device, and it's also faster than using da.
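
Two of the points above can be illustrated outside the test suite. The sketch below is not part of this commit: wait_until_fails and min_windows are hypothetical helper names chosen for illustration. The first shows the general pattern of polling until a status command stops succeeding, with a growing delay between checks. The second works out the arithmetic behind the zfsd_degrade_001_pos fix, assuming the figures quoted above (about 20 reported checksum errors per second against a threshold of 50).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not from the FreeBSD tree.

# Poll a command until it fails (i.e. the resource it reports on is gone).
# Sleeps 1, 2, 3, ... seconds between checks.
# Returns 0 once the command fails, 1 if it still succeeds after
# max_tries checks.
wait_until_fails() {
    max_tries=$1
    shift
    i=1
    while [ "$i" -le "$max_tries" ]; do
        "$@" >/dev/null 2>&1 || return 0
        sleep "$i"
        i=$((i + 1))
    done
    return 1
}

# Smallest number of one-second windows needed to report `threshold`
# errors when at most `rate` errors are reported per window
# (integer ceiling division).
min_windows() {
    threshold=$1
    rate=$2
    echo $(( (threshold + rate - 1) / rate ))
}

min_windows 50 20   # prints 3
```

With those figures the errors have to span at least three one-second windows, which is consistent with the message's point that a single corrupt-and-scrub pass may not produce enough of them.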
    Sponsored by:   Axcient
    Differential Revision: https://reviews.freebsd.org/D39437

    (cherry picked from commit dba2e89ea7a13469ee2e47a2a1d627ca28bb94c2)
---
 tests/sys/cddl/zfs/include/libgnop.kshlib          |  6 ++
 tests/sys/cddl/zfs/tests/zfsd/zfsd.kshlib          | 29 ++++++---
 .../zfs/tests/zfsd/zfsd_autoreplace_002_pos.ksh    |  1 +
 .../zfs/tests/zfsd/zfsd_autoreplace_003_pos.ksh    |  1 +
 .../sys/cddl/zfs/tests/zfsd/zfsd_fault_001_pos.ksh | 69 ++++------------------
 .../cddl/zfs/tests/zfsd/zfsd_replace_001_pos.ksh   |  1 +
 tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh         | 18 +++---
 7 files changed, 53 insertions(+), 72 deletions(-)

diff --git a/tests/sys/cddl/zfs/include/libgnop.kshlib b/tests/sys/cddl/zfs/include/libgnop.kshlib
index 44809385c075..f4f742fe6929 100644
--- a/tests/sys/cddl/zfs/include/libgnop.kshlib
+++ b/tests/sys/cddl/zfs/include/libgnop.kshlib
@@ -84,6 +84,12 @@ function destroy_gnop
 
 	# Use "-f" so we can destroy a gnop with a consumer (like ZFS)
 	gnop destroy -f ${disk}.nop
+
+	# Wait for it to disappear
+	for i in `seq 5`; do
+		gnop status ${disk}.nop >/dev/null 2>/dev/null || break
+		sleep $i
+	done
 }
 
 # Destroy multiple gnop devices.  Attempt to destroy them all, ignoring errors
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd.kshlib b/tests/sys/cddl/zfs/tests/zfsd/zfsd.kshlib
index e9ea036fbbab..8456c5450d2b 100644
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd.kshlib
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd.kshlib
@@ -65,19 +65,32 @@ function corrupt_pool_vdev
 	typeset pool=$1
 	typeset vdev=$2
 	typeset file=$3
+	typeset -li start=0
+	typeset -li now=0
+	typeset -li timeout=60
 
 	# do some IO on the pool
 	log_must $DD if=/dev/zero of=$file bs=1024k count=64
 	$FSYNC $file
 
-	# scribble on the underlying file to corrupt the vdev
-	log_must $DD if=/dev/urandom of=$vdev bs=1024k count=64 conv=notrunc
+	# ZFS rate limits checksum errors to about 20 per second.  So in order
+	# to ensure that we reach zfsd's threshold, we must alternately
+	# scribble and scrub.
+	while (( "$now" - "$start" < "$timeout" )); do
+		# scribble on the underlying file to corrupt the vdev
+		log_must $DD if=/dev/urandom of=$vdev bs=1024k count=64 conv=notrunc
 
-	# Scrub the pool to detect the corruption
-	log_must $ZPOOL scrub $pool
-	wait_until_scrubbed $pool
+		# Scrub the pool to detect and repair the corruption
+		log_must $ZPOOL scrub $pool
+		wait_until_scrubbed $pool
+		now=`date +%s`
+		if [ "$start" -eq 0 ]; then
+			start=`date +%s`
+		fi
+		check_state "$pool" "$vdev" DEGRADED && return
+		$SLEEP 1
+	done
 
-	# ZFSD can take up to 60 seconds to degrade an array in response to
-	# errors (though it's usually faster).
-	wait_for_pool_dev_state_change 60 $vdev DEGRADED
+	log_must $ZPOOL status "$pool"
+	log_fail "ERROR: Disk $vdev not marked as DEGRADED in $pool"
 }
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_002_pos.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_002_pos.ksh
index 2d50c73844a5..6d009a9a8b56 100644
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_002_pos.ksh
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_002_pos.ksh
@@ -81,6 +81,7 @@ for keyword in "${MY_KEYWORDS[@]}" ; do
 	log_must $ZPOOL set autoreplace=on $TESTPOOL
 
 	log_must destroy_gnop $REMOVAL_DISK
+	log_must wait_for_pool_removal 20
 	log_must create_gnop $NEW_DISK $PHYSPATH
 	verify_assertion
 	destroy_pool "$TESTPOOL"
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_003_pos.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_003_pos.ksh
index e2af801558e0..4eb04d60809e 100644
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_003_pos.ksh
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_autoreplace_003_pos.ksh
@@ -91,6 +91,7 @@ for keyword in "${MY_KEYWORDS[@]}" ; do
 	log_must $ZPOOL set autoreplace=on $TESTPOOL
 
 	log_must destroy_gnop $REMOVAL_DISK
+	log_must wait_for_pool_removal 20
 	log_must create_gnop $NEW_DISK $PHYSPATH
 	verify_assertion
 	destroy_pool "$TESTPOOL"
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_001_pos.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_001_pos.ksh
index 3e1340b22e56..3456a328e7f9 100644
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_001_pos.ksh
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_fault_001_pos.ksh
@@ -26,6 +26,7 @@
 #
 
 . $STF_SUITE/include/libtest.kshlib
+. $STF_SUITE/include/libgnop.kshlib
 
 ################################################################################
 #
@@ -38,8 +39,7 @@
 #
 #
 # STRATEGY:
-# 1. Create a storage pool.  Only use the da driver (FreeBSD's SCSI disk
-#    driver) because it has a special interface for simulating IO errors.
+# 1. Create a storage pool.  Use gnop vdevs so we can inject I/O errors.
 # 2. Inject IO errors while doing IO to the pool.
 # 3. Verify that the vdev becomes FAULTED.
 # 4. ONLINE it and verify that it resilvers and joins the pool.
@@ -56,65 +56,28 @@
 
 verify_runnable "global"
 
-function cleanup
-{
-	# Disable error injection, if still active
-	sysctl kern.cam.da.$TMPDISKNUM.error_inject=0 > /dev/null
-
-	if poolexists $TESTPOOL; then
-		# We should not get here if the test passed.  Print the output
-		# of zpool status to assist in debugging.
-		$ZPOOL status
-		# Clear out artificially generated errors and destroy the pool
-		$ZPOOL clear $TESTPOOL
-		destroy_pool $TESTPOOL
-	fi
-}
-
 log_assert "ZFS will fault a vdev that produces IO errors"
-log_onexit cleanup
 
 ensure_zfsd_running
 
-# Make sure that at least one of the disks is using the da driver, and use
-# that disk for inject errors
-typeset TMPDISK=""
-for d in $DISKS
-do
-	b=`basename $d`
-	if test ${b%%[0-9]*} == da
-	then
-		TMPDISK=$b
-		TMPDISKNUM=${b##da}
-		break
-	fi
-done
-if test -z $TMPDISK
-then
-	log_unsupported "This test requires at least one disk to use the da driver"
-fi
+DISK0_NOP=${DISK0}.nop
+DISK1_NOP=${DISK1}.nop
 
+log_must create_gnops $DISK0 $DISK1
 
 for type in "raidz" "mirror"; do
 	log_note "Testing raid type $type"
 
 	# Create a pool on the supplied disks
-	create_pool $TESTPOOL $type $DISKS
+	create_pool $TESTPOOL $type "$DISK0_NOP" "$DISK1_NOP"
 	log_must $ZFS create $TESTPOOL/$TESTFS
 
 	# Cause some IO errors writing to the pool
 	while true; do
-		# Running zpool status after every dd operation is too slow.
-		# So we will run several dd's in a row before checking zpool
-		# status.  sync between dd operations to ensure that the disk
-		# gets IO
-		for ((i=0; $i<64; i=$i+1)); do
-			sysctl kern.cam.da.$TMPDISKNUM.error_inject=1 > \
-				/dev/null
-			$DD if=/dev/zero bs=128k count=1 >> \
-				/$TESTPOOL/$TESTFS/$TESTFILE 2> /dev/null
-			$FSYNC /$TESTPOOL/$TESTFS/$TESTFILE
-		done
+		log_must gnop configure -e 5 -w 100 "$DISK1_NOP"
+		$DD if=/dev/zero bs=128k count=1 >> \
+			/$TESTPOOL/$TESTFS/$TESTFILE 2> /dev/null
+		$FSYNC /$TESTPOOL/$TESTFS/$TESTFILE
 		# Check to see if the pool is faulted yet
 		$ZPOOL status $TESTPOOL | grep -q 'state: DEGRADED'
 		if [ $? == 0 ]
@@ -126,15 +89,9 @@ for type in "raidz" "mirror"; do
 
 	log_must check_state $TESTPOOL $TMPDISK "FAULTED"
 
-	#find the failed disk guid
-	typeset FAILED_VDEV=`$ZPOOL status $TESTPOOL |
-		awk "/^[[:space:]]*$TMPDISK[[:space:]]*FAULTED/ {print \\$1}"`
-
-	# Reattach the failed disk
-	$ZPOOL online $TESTPOOL $FAILED_VDEV > /dev/null
-	if [ $? != 0 ]; then
-		log_fail "Could not reattach $FAILED_VDEV"
-	fi
+	# Heal and reattach the failed disk
+	log_must gnop configure -w 0 "$DISK1_NOP"
+	log_must $ZPOOL online $TESTPOOL "$DISK1_NOP"
 
 	# Verify that the pool resilvers and goes to the ONLINE state
 	for (( retries=60; $retries>0; retries=$retries+1 ))
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_replace_001_pos.ksh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_replace_001_pos.ksh
index dd39d90fd694..a94a3fb7ac42 100644
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_replace_001_pos.ksh
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_replace_001_pos.ksh
@@ -57,6 +57,7 @@ for type in "raidz" "mirror"; do
 
 	# Disable the first disk.
 	log_must destroy_gnop $REMOVAL_DISK
+	log_must wait_for_pool_removal 20
 
 	# Write out data to make sure we can do I/O after the disk failure
 	log_must $DD if=/dev/zero of=$TESTDIR/$TESTFILE bs=1m count=1
diff --git a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
index 7f066a3cff21..b6dcfe97dd7b 100755
--- a/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
+++ b/tests/sys/cddl/zfs/tests/zfsd/zfsd_test.sh
@@ -28,12 +28,14 @@ atf_test_case zfsd_fault_001_pos cleanup
 zfsd_fault_001_pos_head()
 {
 	atf_set "descr" "ZFS will fault a vdev that produces IO errors"
-	atf_set "require.progs" "ksh93 zfs zpool zfsd"
+	atf_set "require.progs" "ksh93 gnop zfs zpool zfsd"
 	atf_set "timeout" 300
 }
 zfsd_fault_001_pos_body()
 {
 	. $(atf_get_srcdir)/../../include/default.cfg
+	. $(atf_get_srcdir)/../hotspare/hotspare.kshlib
+	. $(atf_get_srcdir)/../hotspare/hotspare.cfg
 	. $(atf_get_srcdir)/zfsd.cfg
 
 	verify_disk_count "$DISKS" 2
@@ -210,7 +212,7 @@ atf_test_case zfsd_hotspare_004_pos cleanup
 zfsd_hotspare_004_pos_head()
 {
 	atf_set "descr" "Removing a disk from a pool results in the spare activating"
-	atf_set "require.progs" "ksh93 gnop zpool camcontrol zfsd"
+	atf_set "require.progs" "ksh93 gnop zpool"
 	atf_set "timeout" 3600
 }
 zfsd_hotspare_004_pos_body()
@@ -301,7 +303,7 @@ atf_test_case zfsd_hotspare_007_pos cleanup
 zfsd_hotspare_007_pos_head()
 {
 	atf_set "descr" "zfsd will swap failed drives at startup"
-	atf_set "require.progs" "ksh93 gnop zpool camcontrol zfsd"
+	atf_set "require.progs" "ksh93 gnop zpool"
 	atf_set "timeout" 3600
 }
 zfsd_hotspare_007_pos_body()
@@ -362,7 +364,7 @@ atf_test_case zfsd_autoreplace_001_neg cleanup
 zfsd_autoreplace_001_neg_head()
 {
 	atf_set "descr" "A pool without autoreplace set will not replace by physical path"
-	atf_set "require.progs" "ksh93 zpool camcontrol zfsd gnop"
+	atf_set "require.progs" "ksh93 zpool gnop"
 	atf_set "timeout" 3600
 }
 zfsd_autoreplace_001_neg_body()
@@ -423,7 +425,7 @@ atf_test_case zfsd_autoreplace_003_pos cleanup
 zfsd_autoreplace_003_pos_head()
 {
 	atf_set "descr" "A pool with autoreplace set will replace by physical path even if a spare is active"
-	atf_set "require.progs" "ksh93 zpool camcontrol zfsd gnop"
+	atf_set "require.progs" "ksh93 zpool gnop"
 	atf_set "timeout" 3600
 }
 zfsd_autoreplace_003_pos_body()
@@ -454,7 +456,7 @@ atf_test_case zfsd_replace_001_pos cleanup
 zfsd_replace_001_pos_head()
 {
 	atf_set "descr" "ZFSD will automatically replace a SAS disk that disappears and reappears in the same location, with the same devname"
-	atf_set "require.progs" "ksh93 zpool camcontrol zfsd zfs gnop"
+	atf_set "require.progs" "ksh93 zpool zfs gnop"
 }
 zfsd_replace_001_pos_body()
 {
@@ -483,7 +485,7 @@ atf_test_case zfsd_replace_002_pos cleanup
 zfsd_replace_002_pos_head()
 {
 	atf_set "descr" "zfsd will reactivate a pool after all disks are failed and reappeared"
-	atf_set "require.progs" "ksh93 zpool camcontrol zfsd zfs"
+	atf_set "require.progs" "ksh93 zpool zfs"
 }
 zfsd_replace_002_pos_body()
 {
@@ -512,7 +514,7 @@ atf_test_case zfsd_replace_003_pos cleanup
 zfsd_replace_003_pos_head()
 {
 	atf_set "descr" "ZFSD will correctly replace disks that dissapear and reappear with different devnames"
-	atf_set "require.progs" "ksh93 zpool camcontrol zfsd zfs gnop"
+	atf_set "require.progs" "ksh93 zpool zfs gnop"
 }
 zfsd_replace_003_pos_body()
 {