[Bug 267533] Zfs Multi-Modifier Protection trigger an activity check even if it is disabled
Date: Wed, 02 Nov 2022 18:46:16 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267533

            Bug ID: 267533
           Summary: Zfs Multi-Modifier Protection trigger an activity check
                    even if it is disabled
           Product: Base System
           Version: 12.3-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: mathieu.schmitt57@gmail.com

Created attachment 237826
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=237826&action=edit
git patch for spa.c

We run storage clusters in production. Each cluster is composed of a primary
server and a secondary server. We have launched a campaign to upgrade these
clusters from FreeBSD 12.1 to 12.3 (amd64).

The process of upgrading a whole cluster works as follows:
* ensure zpools are live on the primary server (12.1)
* upgrade the secondary server from 12.1 to 12.3
* import zpools on the secondary server
* upgrade the primary server from 12.1 to 12.3

The issue happens when we import zpools on the secondary server, which has
been freshly upgraded to 12.3. Of course, for data integrity reasons, we
perform a zpool dry run on 12.3 (secondary server). The dry run reports a lot
of "state: UNAVAIL" errors on zpools.

   pool: ssd-00XXXXX
     id: 10784552113000492957
  state: UNAVAIL
 status: The pool is currently imported by another system.
 action: The pool must be exported from XXXXX.XXXXXXX (hostid=9f7c3a3d)
         before it can be safely imported.
    see: http://illumos.org/msg/ZFS-8000-EY
 config:

        ssd-00XXXX    UNAVAIL  currently in use
          mirror-0    ONLINE
            da20      ONLINE
            da1       ONLINE

Some logs during a zpool import:

spa_load($import, config untrusted): using uberblock with txg=3144
pool last imported on non-MMP aware host using import_delay=20000000000 multihost_interval=1000000000 import_intervals=20
disk vdev '/dev/da1': best uberblock found for spa $import. txg 3144
[...]
multihost activity detected txg 3144 ub_txg 3145 timestamp 1666367872 ub_timestamp 1666367903 mmp_config 0 ub_mmp_config 0
[...]
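As a reading aid for the log lines above, here is a minimal userland sketch in C. It is not the kernel code: the struct and function names are hypothetical, and the relation import_delay = multihost_interval * import_intervals is an assumption that merely matches the numbers printed in the log. The second helper models the idea that the importing host reads the uberblock twice and treats any change in txg or timestamp as activity from another host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two uberblock fields printed in the log. */
struct ub_snapshot {
	uint64_t txg;
	uint64_t timestamp;	/* seconds since the epoch */
};

/*
 * Assumed relation (consistent with the logged values, not taken from
 * the source): the importer waits for import_intervals multihost
 * intervals. All times are in nanoseconds.
 */
static uint64_t
import_delay_ns(uint64_t multihost_interval_ns, uint64_t import_intervals)
{
	return (multihost_interval_ns * import_intervals);
}

/*
 * Model of the detection step: if either field changed between the
 * first read and a later read, some other host is writing to the pool.
 */
static bool
activity_detected(const struct ub_snapshot *before,
    const struct ub_snapshot *after)
{
	return (before->txg != after->txg ||
	    before->timestamp != after->timestamp);
}
```

With the logged values, import_delay_ns(1000000000, 20) gives 20000000000 ns (20 seconds), and the two reads (txg 3144 then 3145, timestamp 1666367872 then 1666367903) differ in both fields, which is expected here because the pool is still live on the primary server.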
The pool was last imported on a non-MMP-aware host, and right after that an
activity check is performed. We believe that the root cause is MMP
(Multi-Modifier Protection). According to man zpool (see:
https://www.freebsd.org/cgi/man.cgi?query=zpool&apropos=0&sektion=8&manpath=FreeBSD+12.3-RELEASE&arch=default&format=html),
multihost is set to off by default, but the log above shows the opposite.

In the ZFS storage pool allocator (spa.c), some checks are performed in the
function spa_activity_check_required. One of them tests whether MMP is disabled
(https://cgit.freebsd.org/src/tree/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c?h=releng/12.3#n2520),
and if so, the activity check is not performed. In our case that test does not
match, so the function spa_ld_select_uberblock, which initiates the activity
check, returns with an error (EREMOTEIO): it puts the vdev in error with
spa_vdev_err(rvd, VDEV_AUX_ACTIVE, EREMOTEIO).

In order to get rid of this error during (our) upgrade from 12.1 to 12.3, we
think that this patch can fix the issue:

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
index 1b452501819..fa0af4636a6 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
@@ -2552,7 +2552,15 @@ spa_activity_check_required(spa_t *spa, uberblock_t *ub, nvlist_t *label,
 	if (state != POOL_STATE_ACTIVE)
 		return (B_FALSE);
 
-	return (B_TRUE);
+	/*
+	 * Skip the activity check when a pool is imported on a
+	 * non-aware MMP host and the pool was previously imported
+	 * with force option.
+	 */
+	if (!MMP_VALID(ub))
+		return (B_FALSE);
+
+	return (B_TRUE);
 }

The definition of the uberblock
(https://cgit.freebsd.org/src/tree/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/uberblock_impl.h?h=releng/12.1)
changed between 12.1 and 12.3, so when 12.3 reads an uberblock written by
12.1, some macros are not defined there, especially MMP_MAGIC.
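To make the effect of the proposed check easier to follow, here is a small userland model in C. It is a sketch under stated assumptions, not the kernel function: struct model_uberblock and all model_* names are hypothetical, the shape of MMP_VALID (both magic numbers must match) and of the existing "MMP disabled" test (MMP magic present and delay equal to zero) is recalled from the ZFS headers rather than quoted from this report, and only the tail of the decision is modelled. It also assumes that an uberblock written by 12.1 carries an mmp_magic of zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* MMP magic as quoted in this report; uberblock magic is an assumption. */
#define MODEL_UBERBLOCK_MAGIC	0x00bab10cULL
#define MODEL_MMP_MAGIC		0xa11cea11ULL

/* Hypothetical, reduced stand-in for uberblock_t. */
struct model_uberblock {
	uint64_t ub_magic;
	uint64_t ub_mmp_magic;
	uint64_t ub_mmp_delay;
};

/* Assumed shape of MMP_VALID(): both magic numbers must match. */
static bool
model_mmp_valid(const struct model_uberblock *ub)
{
	return (ub->ub_magic == MODEL_UBERBLOCK_MAGIC &&
	    ub->ub_mmp_magic == MODEL_MMP_MAGIC);
}

/* Assumed shape of the existing "MMP is disabled" test. */
static bool
model_mmp_disabled(const struct model_uberblock *ub)
{
	return (ub->ub_mmp_magic == MODEL_MMP_MAGIC &&
	    ub->ub_mmp_delay == 0);
}

/*
 * Tail of the decision only. With patched == false the function ends
 * like the original code; with patched == true it adds the extra
 * !MMP_VALID early return from the patch.
 */
static bool
model_check_required(const struct model_uberblock *ub, bool pool_active,
    bool patched)
{
	if (model_mmp_disabled(ub))
		return (false);
	if (!pool_active)
		return (false);
	if (patched && !model_mmp_valid(ub))
		return (false);
	return (true);
}
```

In this model, an uberblock with ub_mmp_magic equal to zero never satisfies the "MMP disabled" test, so the unpatched path still requires the activity check for an active pool, while the patched path skips it. An uberblock carrying the MMP magic and a non-zero delay still requires the check in both cases.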
MMP_MAGIC is defined (as a macro) in 12.3 with the value 0xa11cea11, but is
not defined at all in 12.1 (so the value is set to 0000000000000000), as we
can see in the output below:

[-> SLAVE root@XXXX.b 03:59 PM:~]#zdb -luuuu /dev/da14 | grep mmp
        mmp_magic = 0000000000000000
        mmp_magic = 0000000000000000
[...]

This variable is (probably) not properly initialised, so the check that is
intended to determine whether MMP is disabled fails. That is why we add the
MMP_VALID check (defined in
https://cgit.freebsd.org/src/tree/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/uberblock_impl.h?h=releng/12.3#n53)
at the end of the function spa_activity_check_required.

Math.

-- 
You are receiving this mail because:
You are the assignee for the bug.