Re: ZFS on a shared iSCSI
- In reply to: Andrea Brancatelli : "ZFS on a shared iSCSI"
Date: Wed, 07 Feb 2024 20:35:23 UTC
On 2/7/24 02:55, Andrea Brancatelli wrote:
> Hello guys, I'm not 100% sure this is the correct list to ask this
> question; if not, please feel free to point me in the right direction.
>
> I was wondering what could be the best recipe to have an HA cluster
> sharing an external ZFS storage.
>
> Let's say I have two servers running a bunch of Jails and, thus, I'd
> like to use ZFS as the underlying storage layer, and I have an
> external (iSCSI) storage connected.
>
> Would it be "easily possible" to have some (2?) iSCSI LUNs exposed to
> both servers and then activate the pool on one or the other server?
>
> The idea would be to reactivate the filesystem from server A on
> server B if server A fails.
>
> Would it be "easier" to replicate everything and zfs send data back
> and forth? Clearly that would mean doubling data and having a
> scheduled replica with a possible delay in data replication, so I'd
> like to avoid this.

You could probably roll your own solution using corosync and
pacemaker, possibly in addition to using HAST to replicate blocks
between your LUNs. I would avoid trying to do ZFS replication in this
scenario.

The tl;dr could look like:

- HAST replicates blocks between iSCSI LUNs (assuming your vendor
  doesn't already support this on the target side; many of the
  enterprise vendors should provide this for you, IMHO). There's a
  rough hast.conf sketch at the end of this mail.

- corosync/pacemaker are used to detect the health of each of your
  FreeBSD systems. If a heartbeat fails between nodes, it can trigger
  a failover event automatically (also sketched below).

- The failover event would mount the LUN on the healthy box and do
  other housekeeping (failing over IPs maybe? restarting jails?) -
  also sketched below.

I've actually built a system using corosync to do failover in AWS,
and one of the nice things with it is that when a failover event is
triggered you can run arbitrary scripts. So in my use case I was able
to interact with the AWS EC2 API via some scripts to migrate network
devices from one instance to another (sketched below as well). It
seems pretty reliable, and it handles some critical infrastructure
for us.

But getting this all right is pretty complicated... then again, so is
distributed computing in general, and I'd be suspicious of any vendor
who says they can make this simple :)

Regardless of your approach you'd need to do a lot of testing and
monitoring for critical production use. It all comes down to what
amount of resources you want to put into this.

-pete

--
Pete Wright
pete@nomadlogic.org
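
To make the HAST piece concrete, here's a minimal /etc/hast.conf
sketch. The hostnames, addresses and the resource name are made up,
and da1 stands in for whichever device node your iSCSI LUN shows up
as:

    # /etc/hast.conf (same file on both nodes)
    resource jailpool {
            on nodea {
                    local /dev/da1
                    remote 192.168.10.2
            }
            on nodeb {
                    local /dev/da1
                    remote 192.168.10.1
            }
    }

    # on both nodes:
    hastctl create jailpool
    service hastd onestart

    # on the current primary only:
    hastctl role primary jailpool
    zpool create jailpool /dev/hast/jailpool

The nice property is that /dev/hast/jailpool only exists on whichever
node currently holds the primary role, so the pool can't casually end
up imported on both boxes at once.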
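
For the heartbeat side, a minimal two-node corosync.conf sketch
(names and addresses invented again; pacemaker sits on top of this
and owns the actual failover logic):

    # /usr/local/etc/corosync/corosync.conf
    totem {
            version: 2
            cluster_name: jailcluster
            transport: udpu          # unicast UDP between the nodes
    }

    nodelist {
            node {
                    ring0_addr: 192.168.10.1
                    nodeid: 1
            }
            node {
                    ring0_addr: 192.168.10.2
                    nodeid: 2
            }
    }

    quorum {
            provider: corosync_votequorum
            two_node: 1              # special-case quorum for 2 nodes
    }

    logging {
            to_syslog: yes
    }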
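
The failover event itself would boil down to something like this
hypothetical script, run by pacemaker on the surviving node (pool
name, interface and address are placeholders):

    #!/bin/sh
    # import the pool from the shared LUN. -f is needed because the
    # dead node never got a chance to export it cleanly.
    zpool import -f jailpool || exit 1

    # take over the service IP (CARP would be the fancier option)
    ifconfig em0 alias 192.168.10.50/32

    # bring the jails back up on this box
    service jail onestart

One warning on the ZFS side: you must be certain the old primary is
really dead (fenced) before that import runs - importing the same
pool on two hosts at once will corrupt it beyond repair.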
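
And for what it's worth, the AWS network-device migration I mentioned
was basically a script along these lines (the ENI and instance IDs
are placeholders, and real code would poll after the detach since
it's asynchronous):

    #!/bin/sh
    ENI=eni-0123456789abcdef0
    TARGET=i-0123456789abcdef0

    # find the ENI's current attachment and pull it off the dead node
    ATT=$(aws ec2 describe-network-interfaces \
        --network-interface-ids "$ENI" \
        --query 'NetworkInterfaces[0].Attachment.AttachmentId' \
        --output text)
    aws ec2 detach-network-interface --attachment-id "$ATT" --force

    # ...wait here until the ENI reports itself "available"...

    # re-attach it to the healthy instance
    aws ec2 attach-network-interface \
        --network-interface-id "$ENI" \
        --instance-id "$TARGET" \
        --device-index 1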