way for failover zpool (no HAST needed): hastmon
Mikolaj Golub
trociny at freebsd.org
Fri Apr 29 21:58:54 UTC 2011
Oops, just noticed this mail :-) Denny sent me another message privately, and I
hope I answered his questions, but I will answer this message too, in case
someone is interested.
On Thu, 28 Apr 2011 15:22:22 +0200 Denny Schierz wrote:
DS> hi,
DS> ok, here we go: I've installed hastmon on both FreeBSD nodes and one on
DS> Linux Debian as watchdog:
DS> Simple setup:
DS>
DS> # cat /etc.local/hastmon.conf
DS> resource sanip {
DS> exec /usr/local/_rbg/bin/san-ip
DS> friends iscsihead-m iscsihead-s nos
DS> on iscsihead-m {
DS> remote tcp4://iscsihead-s
DS> priority 0
DS> }
DS> on iscsihead-s {
DS> remote tcp4://iscsihead-m
DS> priority 1
DS> }
DS> on linux {
DS> remote tcp4://iscsihead-m tcp4://iscsihead-s
DS> }
DS> }
DS> It only half works.
DS> The simple script adds/removes an alias for em0, and for status it
DS> does a ping -c 1 to the global IP. After telling every host what its role
DS> is, I get "state unknown" on the primary, "state run" on the secondary,
DS> and watchdog for the Linux host.
It is difficult to tell what happened without additional information. It might
be that your '/usr/local/_rbg/bin/san-ip status' was returning an unknown status.
In that case, running this manually might be helpful:
/usr/local/_rbg/bin/san-ip status; echo $?
Logs would help too :-).
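To illustrate what checking the status exit code involves, here is a minimal sketch of a status helper that inspects the local alias instead of pinging. It assumes the exit-code convention 0 = running, 1 = not running, 2 = unknown; please verify that against the hastmon documentation. The address 192.0.2.10 and the function name alias_status are hypothetical, and only em0 comes from the setup described above.

```shell
#!/bin/sh
# Hypothetical values: replace SAN_IP with the real shared address.
SAN_IP="192.0.2.10"
IFACE="em0"

# Reads ifconfig output on stdin and maps it to an exit code.
# ASSUMED convention (check the hastmon docs):
#   0 = resource running, 1 = not running, 2 = state unknown.
alias_status() {
    input=$(cat)
    # No output at all: we cannot tell, so report "unknown".
    if [ -z "$input" ]; then
        return 2
    fi
    # Fixed-string match so the dots in the address are literal.
    if printf '%s\n' "$input" | grep -qF "inet ${SAN_IP} "; then
        return 0
    fi
    return 1
}
```

On a node this could be called as: ifconfig "$IFACE" | alias_status; echo $?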
DS> Then I rebooted the primary; the secondary took over and executed the
DS> script. After the primary was reachable again, it didn't get the
DS> secondary role, but init/unknown.
DS> The same happens in the opposite direction:
DS> from Linux:
DS> hastmonctl status
DS> sanip:
DS> role: watchdog
DS> exec: /usr/local/_rbg/bin/san-ip
DS> remote:
DS> tcp4://iscsihead-m (primary/run)
DS> tcp4://iscsihead-s (init/unknown)
DS> state: run
DS> attempts: 0 from 5
DS> complaints: 0 for last 60 sec (threshold 3)
DS> heartbeat: 10 sec
DS> from iscsihead-s:
DS> hastmonctl status
DS> sanip:
DS> role: init
DS> exec: /usr/local/_rbg/bin/san-ip
DS> remote:
DS> tcp4://iscsihead-m
DS> state: unknown
DS> attempts: 0 from 5
DS> complaints: 0 for last 60 sec (threshold 3)
DS> heartbeat: 10 sec
DS> and last from iscsihead-m
DS> hastmonctl status
DS> sanip:
DS> role: primary
DS> exec: /usr/local/_rbg/bin/san-ip
DS> remote:
DS> tcp4://iscsihead-s (disconnected)
DS> state: run
DS> attempts: 0 from 5
DS> complaints: 0 for last 60 sec (threshold 3)
DS> heartbeat: 10 sec
DS> If I take a look into the logfile from the iscsihead-m:
DS> [sanip] (primary) Remote node acts as init for the resource and not as
DS> secondary.
DS> [sanip] (primary) Handshake header from tcp4://iscsihead-s has no
DS> 'token' field.
DS> Have I missed something?
DS> cu denny
This is expected behavior. After startup, hastmon is in the init role. You need
to set the role you want manually or via a startup script.
This is because you might want different configurations depending on your
requirements:
1) After startup the role is set manually by the administrator (useful e.g. if
you prefer to investigate a crashed host before returning it to the cluster).
2) After startup the node is switched to secondary automatically (by an rc
script). If all cluster nodes are configured to be secondary on startup and all
are started simultaneously, the watchdog will figure out that there is no
primary and will send complaints to all secondary nodes. The nodes will try to
switch to primary simultaneously and the node with the highest priority will win.
3) The node with the highest priority is always set to primary on startup.
All others are set to secondary.
With this configuration, if the primary fails, the secondary switches to
primary; then, when the initial primary comes back, it becomes primary again
automatically.
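The three options above can be sketched as a small helper that a startup script might use. This is only an illustration: the function names, the policy names (manual, secondary, preferred) and the yes/no flag are hypothetical, and the helper prints the hastmonctl command instead of running it. It assumes hastmonctl accepts "role <role> <resource>" for the sanip resource.

```shell
#!/bin/sh
# Hypothetical helper: map a startup policy to the role to request.
#   manual    -> stay in init, the administrator sets the role (option 1)
#   secondary -> every node starts as secondary (option 2)
#   preferred -> highest-priority node starts as primary,
#                all others as secondary (option 3)
startup_role() {
    policy=$1
    is_preferred=$2   # "yes" only on the node with the highest priority
    case "$policy" in
        manual)    echo "init" ;;
        secondary) echo "secondary" ;;
        preferred)
            if [ "$is_preferred" = "yes" ]; then
                echo "primary"
            else
                echo "secondary"
            fi
            ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
}

# Print (rather than run) the command a startup script would issue.
role_command() {
    role=$(startup_role "$1" "$2") || return 1
    # hastmon already starts in init, so there is nothing to do.
    [ "$role" = "init" ] && return 0
    echo "hastmonctl role $role sanip"
}
```

For example, role_command preferred yes would print the command for the preferred node, while role_command manual no prints nothing.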
--
Mikolaj Golub