From: Pete Wright <pete@nomadlogic.org>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS on a shared iSCSI
Date: Wed, 7 Feb 2024 12:35:23 -0800
Message-ID: <61dbd87f-2be5-4515-8a93-8656b114cd8e@nomadlogic.org>
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

On 2/7/24 02:55, Andrea Brancatelli wrote:
> Hello guys, I'm not 100% this is the correct list to ask this
> question, if not please feel free to point me in the right direction.
>
> I was wondering what could be the best recipe to have an HA cluster
> sharing an external ZFS storage.
>
> Let's say I have two servers running a bunch of Jails and, thus, I'd
> like to use ZFS as the underlying storage layer and I have an external
> (iSCSI) storage connected.
>
> Would it be "easily possible" to have some (2?) iSCSI LUN exposed to
> both servers and then activate the pool on one or the other server?
>
> The idea would be to reactivate the filesystem from server A on server
> B if the server A fails.
>
> Would it be "easier" to replicate everything and zfs send datas back
> and forth?
> Clearly that would mean doubling datas and havin a
> scheduled replica with a possible delay in data replication, so I'd
> like to avoid this.
>

You could probably roll your own solution using corosync and pacemaker,
possibly in addition to using HAST to replicate blocks between your
LUNs. I would avoid trying to do ZFS replication in this scenario.

The tl;dr could look like:

- HAST replicates blocks between iSCSI LUNs (assuming your vendor
  doesn't already support this on the target side; many of the
  enterprise vendors should provide this for you, IMHO).
- corosync/pacemaker are used to detect the health of each of your
  FreeBSD systems. If a heartbeat fails between nodes, it can trigger
  a failover event automatically.
- The failover event would mount the LUN on the healthy box and do
  other housekeeping (failing over IPs, maybe? restarting jails?).

I've actually built a system using corosync to do failover in AWS, and
one of the nice things about it is that when a failover event is
triggered you can run arbitrary scripts. So in my use case I was able to
interact with the AWS EC2 API via some scripts to migrate network
devices from one instance to another. It seems pretty reliable, and
handles some critical infrastructure for us.

Getting this all right is pretty complicated... but so is distributed
computing in general, and I'd be suspicious of any vendor who says they
can make this simple :)

Regardless of your approach, you'd need to do a lot of testing and
monitoring for critical production use. It all comes down to what
amount of resources you want to put into this.

-pete

-- 
Pete Wright
pete@nomadlogic.org
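
For illustration only, here is a rough sketch of what the "failover event" script in the list above might look like if written in Python. None of the names come from the thread: the HAST resource `shared0`, the pool `tank`, the interface `em0`, and the address `192.0.2.10/24` are hypothetical placeholders, and the step order (promote HAST, import the pool, move the IP, start jails) is just one plausible reading of "mount the LUN on the healthy box and do other housekeeping". It also assumes the cluster manager has already made sure the failed node no longer has the pool imported, since a pool must never be imported on both nodes at once.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def failover_commands(hast_resource: str, pool: str,
                      vip_cidr: str, iface: str) -> List[Command]:
    """Build the ordered takeover steps for the surviving node.

    All arguments are site-specific placeholders (hypothetical names).
    """
    return [
        # Promote this node to HAST primary so /dev/hast/<resource> appears.
        ["hastctl", "role", "primary", hast_resource],
        # Force-import the pool, which was last imported on the failed node.
        ["zpool", "import", "-f", pool],
        # Housekeeping: bring the service address up on this node.
        ["ifconfig", iface, "inet", vip_cidr, "alias"],
        # Housekeeping: start the jails that live on the pool.
        ["service", "jail", "start"],
    ]


def run_failover(commands: Sequence[Command],
                 runner: Callable[[Command], object] = subprocess.check_call) -> int:
    """Run each step in order and return how many steps completed.

    The default runner raises CalledProcessError on a non-zero exit
    status, so a failed step stops the sequence before later steps run.
    """
    done = 0
    for cmd in commands:
        runner(cmd)
        done += 1
    return done


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in failover_commands("shared0", "tank", "192.0.2.10/24", "em0"):
        print(" ".join(cmd))
```

Passing the runner in as a parameter keeps the sequence testable on a machine with no HAST or ZFS at all, which fits the point above about needing a lot of testing before trusting any of this in production.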