the switching time hastd from secondary to primary

Sat Jan 16 18:13:12 UTC 2016

On Thu, Jan 14, 2016 at 02:23:46PM +0400, Shahin Hasanov wrote:

> In the /usr/local/sbin/ucarp_up.sh script (an extract is shown below),
> ucarp waits until it becomes primary. It takes about 20 sec, as
> written in
> http://www.freebsd.org/cgi/man.cgi?query=hast.conf&apropos=0&sektion=0&manpath=FreeBSD+10.2-RELEASE&arch=default&format=html
>  .
> for i in `jot 30`; do
>         pgrep -f "hastd: ${resource} \(secondary\)" >/dev/null 2>&1 || break
>         sleep 1
> done
> if pgrep -f "hastd: ${resource} \(secondary\)" >/dev/null 2>&1; then
>         logger -p local0.error -t hast "Secondary process for resource ${resource} is still running after 30 seconds."
>         exit 1
> fi

It would be nice to see the logs. But I guess you are hitting the
timeout in the thread that waits for incoming data from the primary.
This timeout is 2 * HAST_KEEPALIVE, and HAST_KEEPALIVE is hardcoded to
10 sec, which gives the 20 sec you observe.
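
To make the arithmetic explicit, here is a tiny sh sketch. It is only an
illustration: hastd computes this internally in C, and the variable
recv_timeout is a name made up for this example.

```shell
# Illustration only; hastd does this internally, not via a shell variable.
HAST_KEEPALIVE=10                       # seconds, hardcoded in hastd
recv_timeout=$((2 * HAST_KEEPALIVE))    # hypothetical name for the receive timeout
echo "secondary gives up on a silent primary after about ${recv_timeout} sec"
```

The 30-iteration loop in the quoted script is longer than this timeout, so it
leaves roughly 10 sec of margin.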

So right now it can be changed only by recompiling hastd. On the other
hand, hitting this timeout means that the connection was not closed
properly, so it is not what I would expect for a "planned" failover,
where the role is changed using `hastctl role` commands. It looks
rather like a case of disaster recovery after a network partition,
host crash, hang, etc. In my opinion, waiting 20 sec is not bad
compared with the risk of split-brain if the former primary is
still alive.

If you observe the 20 sec timeout when doing a "planned" failover, I
guess there is something wrong with the scripts that do the switching.
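
For comparison, a "planned" switch could look roughly like the sketch below.
This is not taken from your scripts: the function names and the HASTCTL
variable are hypothetical, and only the `hastctl role <role> <resource>`
invocation itself comes from hastctl(8).

```shell
#!/bin/sh
# Sketch of a "planned" failover. demote_resource, promote_resource and
# HASTCTL are hypothetical names; only `hastctl role` is the real command.
HASTCTL=${HASTCTL:-hastctl}

# Run on the current primary first. Demoting it should close the connection
# properly, so the peer does not have to wait for the keepalive timeout.
demote_resource() {
        ${HASTCTL} role secondary "$1"
}

# Run on the node that takes over, after the old primary has been demoted.
promote_resource() {
        ${HASTCTL} role primary "$1"
}
```

Usage would be `demote_resource <resource>` on the old primary, then
`promote_resource <resource>` on the new one. Anything that uses
/dev/hast/<resource> (a mounted file system, for example) has to be stopped
before the demotion; that part is left out here.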

-- 
Mykola Golub