From nobody Wed Feb 07 20:35:23 2024
Message-ID: <61dbd87f-2be5-4515-8a93-8656b114cd8e@nomadlogic.org>
Date: Wed, 7 Feb 2024 12:35:23 -0800
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
List-Help: <mailto:freebsd-fs+help@freebsd.org>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Subscribe: <mailto:freebsd-fs+subscribe@freebsd.org>
List-Unsubscribe: <mailto:freebsd-fs+unsubscribe@freebsd.org>
Sender: owner-freebsd-fs@freebsd.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: ZFS on a shared iSCSI
Content-Language: en-US
To: freebsd-fs@freebsd.org
References: <CAGqQmRPssfrb9S6r0H4SaQLEfOTp5Qx7-HSvHtYkxMnpeqKWjQ@mail.gmail.com>
From: Pete Wright <pete@nomadlogic.org>
In-Reply-To: <CAGqQmRPssfrb9S6r0H4SaQLEfOTp5Qx7-HSvHtYkxMnpeqKWjQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit


On 2/7/24 02:55, Andrea Brancatelli wrote:
> Hello guys, I'm not 100% this is the correct list to ask this 
> question, if not please feel free to point me in the right direction.
>
> I was wondering what could be the best recipe to have an HA cluster 
> sharing an external ZFS storage.
>
> Let's say I have two servers running a bunch of Jails and, thus, I'd 
> like to use ZFS as the underlying storage layer and I have an external 
> (iSCSI) storage connected.
>
> Would it be "easily possible" to have some (2?) iSCSI LUN exposed to 
> both servers and then activate the pool on one or the other server?
>
> The idea would be to reactivate the filesystem from server A on server 
> B if the server A fails.
>
> Would it be "easier" to replicate everything and zfs send datas back 
> and forth? Clearly that would mean doubling datas and havin a 
> scheduled replica with a possible delay in data replication, so I'd 
> like to avoid this.
>
You could probably roll your own solution using corosync and pacemaker, 
possibly in addition to using HAST to replicate blocks between your 
LUNs.  i would avoid trying to do ZFS replication in this scenario.


the tl;dr could look like:

- HAST replicates blocks between iSCSI LUNs (assuming your vendor 
doesn't already support this on the target side, many of the enterprise 
vendors should provide this for you IMHO).

- corosync/pacemaker monitor the health of each of your freebsd 
systems.  if the heartbeat between nodes fails, they can trigger a 
failover event automatically.

- the failover event would import the pool from the LUN on the healthy 
box and do other housekeeping (failing over IPs, restarting jails, etc.)
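the steps above could be sketched as a failover hook along these lines. this is only a rough sketch in python, not a tested recipe: the resource name "shared0", the pool "tank", the interface "em0" and the address are all made-up placeholders, and it assumes the pool sits on a HAST resource and that jails are enabled in rc.conf. it only prints what it would do unless you turn dry_run off.

```python
import subprocess
from typing import List


def plan_failover(hast_resource: str, pool: str,
                  service_ip: str, iface: str) -> List[List[str]]:
    """Return, in order, the commands a failover hook might run on the
    surviving node. All arguments are hypothetical placeholders."""
    return [
        # promote this node to primary for the HAST resource
        ["hastctl", "role", "primary", hast_resource],
        # force-import the pool, since it was last in use on the failed node
        ["zpool", "import", "-f", pool],
        # take over the service address as an alias on the local interface
        ["ifconfig", iface, "inet", service_ip,
         "netmask", "255.255.255.255", "alias"],
        # start the jails defined on this host
        ["service", "jail", "start"],
    ]


def run_failover(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run the planned commands, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_failover(plan_failover("shared0", "tank", "192.0.2.10", "em0"))
```

a real hook also has to make sure the failed node is truly down (fencing) before the import, otherwise both boxes can end up writing to the same pool.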

i've actually built a system using corosync to do failover in AWS, and 
one of the nice things about it is that you can run arbitrary scripts 
when a failover event is triggered.  in my use case i was able to 
interact with the AWS EC2 API via some scripts to migrate network 
devices from one instance to another.  it has been pretty reliable, and 
handles some critical infrastructure for us.
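to give a rough idea of what such a script can look like (this is not the actual script, just a sketch that assumes the "network device" is an elastic network interface and that the aws CLI is installed; all the IDs are made up):

```python
import subprocess
from typing import List


def plan_eni_move(eni_id: str, attachment_id: str,
                  target_instance: str,
                  device_index: int = 1) -> List[List[str]]:
    """Build the aws CLI calls that move a network interface to another
    instance. Every ID passed in here is a hypothetical placeholder."""
    return [
        # detach the interface from the failed instance
        ["aws", "ec2", "detach-network-interface",
         "--attachment-id", attachment_id, "--force"],
        # attach it to the healthy instance at the given device index
        ["aws", "ec2", "attach-network-interface",
         "--network-interface-id", eni_id,
         "--instance-id", target_instance,
         "--device-index", str(device_index)],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run the calls in order; a real script would also wait for the
    detach to finish before attaching."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(plan_eni_move("eni-0123456789abcdef0",
                      "eni-attach-0123456789abcdef0",
                      "i-0123456789abcdef0"))
```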

getting all of this right is pretty complicated, but so is distributed 
computing in general, and i'd be suspicious of any vendor who says they 
can make this simple :)

regardless of your approach, you'd need to do a lot of testing and 
monitoring for critical production use.  it all comes down to how much 
time and effort you want to put into this.

-pete


-- 
Pete Wright
pete@nomadlogic.org