[Bug 217313] net/libzmq4: EHOSTDOWN from getsockopt must not cause assertion abort; causes SaltStack crashes

bugzilla-noreply at freebsd.org
Thu Feb 23 17:23:33 UTC 2017


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217313

            Bug ID: 217313
           Summary: net/libzmq4: EHOSTDOWN from getsockopt must not cause
                    assertion abort; causes SaltStack crashes
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: Individual Port(s)
          Assignee: koobs at FreeBSD.org
          Reporter: Mark.Martinec at ijs.si
             Flags: maintainer-feedback?(koobs at FreeBSD.org)

Created attachment 180246
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=180246&action=edit
Adds a missing "errno == EHOSTDOWN" exemption to assert

Environment: 11.0-RELEASE-p7 amd64, zeromq-4.1.5, py27-salt-2016.11.2.

For some time I have been observing instability of salt-minion
on several of our hosts: after a couple of days the Salt master loses
contact with most FreeBSD hosts, while Linux hosts remain connected.
After such an event one can find a core dump of the salt-minion process
(written in Python) on the affected systems, and salt-minion has to be
restarted manually.

After adding some debugging and capturing stdout and stderr of
a non-daemonized salt-minion process, I found that the last
reported line is:

  Host is down (src/tcp_connecter.cpp:359)

So it seems there was some minor network glitch or outage, and
libzmq aborted the process through an assertion. That assertion is
supposed to exempt all the usual network socket-related problems
and catch only a potential application problem.

The problem is that the exemption list is missing the EHOSTDOWN
error code, which should not be a cause of a process abort.

The essential code snippet is shown here:


/usr/ports/net/libzmq4/work/zeromq-4.1.5/src/tcp_connecter.cpp :

zmq::fd_t zmq::tcp_connecter_t::connect ()
{
    //  Async connect has finished. Check whether an error occurred
    int err = 0;
    socklen_t len = sizeof err;

    const int rc = getsockopt (s, SOL_SOCKET, SO_ERROR, (char*) &err, &len);

    //  Assert if the error was caused by 0MQ bug.
    //  Networking problems are OK. No need to assert.
[...]
    //  Following code should handle both Berkeley-derived socket
    //  implementations and Solaris.
    if (rc == -1)
        err = errno;
    if (err != 0) {
        errno = err;
        errno_assert (
            errno == ECONNREFUSED ||
            errno == ECONNRESET ||
            errno == ETIMEDOUT ||
            errno == EHOSTUNREACH ||
  +         errno == EHOSTDOWN ||
            errno == ENETUNREACH ||
            errno == ENETDOWN ||
            errno == EINVAL);
        return retired_fd;
    }


So, please add the missing "errno == EHOSTDOWN ||" exemption
to the errno_assert list.
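
To illustrate the intent of the exemption list outside of libzmq, here is
a minimal standalone sketch. The helper names is_network_error and
pending_socket_error are hypothetical and do not exist in libzmq; the
sketch only mirrors the logic of the snippet above, with EHOSTDOWN
included, and assumes a POSIX system where EHOSTDOWN is defined
(as on FreeBSD and Linux).

```cpp
#include <cerrno>
#include <sys/socket.h>

//  Hypothetical helper (not part of libzmq): returns true when an error
//  from a finished async connect is an ordinary networking problem
//  rather than a sign of an application bug.
static bool is_network_error (int err)
{
    return err == ECONNREFUSED ||
           err == ECONNRESET ||
           err == ETIMEDOUT ||
           err == EHOSTUNREACH ||
           err == EHOSTDOWN ||      //  the code this report asks to add
           err == ENETUNREACH ||
           err == ENETDOWN ||
           err == EINVAL;
}

//  Hypothetical helper: fetch the pending error of a socket, handling
//  both conventions mentioned in the snippet above. Berkeley-derived
//  stacks report the error through the SO_ERROR value, Solaris reports
//  it through a -1 return value and errno.
static int pending_socket_error (int fd)
{
    int err = 0;
    socklen_t len = sizeof err;
    const int rc = getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &len);
    if (rc == -1)
        err = errno;
    return err;
}
```

With such a check, a caller could treat any error for which
is_network_error returns true as a failed connection attempt to be
retried, and reserve the abort for everything else.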

A patch is included.
