[Bug 217313] net/libzmq4: EHOSTDOWN from getsockopt must not cause assertion abort; causes SaltStack crashes
bugzilla-noreply at freebsd.org
Thu Feb 23 17:23:33 UTC 2017
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217313
Bug ID: 217313
Summary: net/libzmq4: EHOSTDOWN from getsockopt must not cause assertion abort; causes SaltStack crashes
Product: Ports & Packages
Version: Latest
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: Individual Port(s)
Assignee: koobs at FreeBSD.org
Reporter: Mark.Martinec at ijs.si
Flags: maintainer-feedback?(koobs at FreeBSD.org)
Created attachment 180246
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=180246&action=edit
Adds a missing "errno == EHOSTDOWN" exemption to assert
Environment: FreeBSD 11.0-RELEASE-p7 amd64, zeromq-4.1.5, py27-salt-2016.11.2.
For some time I have been observing instability of the running salt-minion
on several of our hosts: after a couple of days, the Salt master loses
contact with most FreeBSD hosts, while Linux hosts remain connected.
After such an event, a core dump of the salt-minion process (written
in Python) can be found on the affected systems, and salt_minion must
be restarted manually.
After adding some debugging and capturing stdout and stderr of a
non-daemonized salt-minion process, I found this to be the last
reported line:
Host is down (src/tcp_connecter.cpp:359)
It seems there was a minor network glitch or outage, and libzmq
aborted the process through an assertion. That assertion is supposed
to exempt all ordinary network-socket errors and catch only genuine
application (0MQ) bugs.
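The last reported line has the shape "<strerror text> (<file>:<line>)". The sketch below shows, under the assumption that libzmq's errno_assert macro (in src/err.hpp) prints strerror (errno) plus the source location to stderr and then aborts, how such a line and the subsequent core dump come about. The names format_assert_message and SKETCH_ERRNO_ASSERT are hypothetical and not part of libzmq; the real macro calls zmq::zmq_abort () rather than std::abort ().

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: builds "<strerror text> (<file>:<line>)",
// e.g. "Host is down (src/tcp_connecter.cpp:359)".
std::string format_assert_message (int err, const char *file, int line)
{
    return std::string (std::strerror (err)) + " (" + file + ":"
           + std::to_string (line) + ")";
}

// Simplified stand-in for libzmq's errno_assert (an assumption, not the
// real definition): if the condition is false, print errno with the
// source location to stderr, flush, and abort the whole process.
#define SKETCH_ERRNO_ASSERT(x)                                      \
    do {                                                            \
        if (!(x)) {                                                 \
            const std::string msg_ =                                \
                format_assert_message (errno, __FILE__, __LINE__);  \
            std::fprintf (stderr, "%s\n", msg_.c_str ());           \
            std::fflush (stderr);                                   \
            std::abort ();                                          \
        }                                                           \
    } while (false)
```

With the zeromq-4.1.5 exemption list, errno == EHOSTDOWN makes the asserted condition false, so a macro of this kind prints the line quoted above and aborts the embedding process (here, the Python salt-minion).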
The problem is that the exemption list lacks the EHOSTDOWN error
code, which signals a network condition and should not abort the process.
The relevant code, with the proposed addition marked "+", is:
/usr/ports/net/libzmq4/work/zeromq-4.1.5/src/tcp_connecter.cpp :
zmq::fd_t zmq::tcp_connecter_t::connect ()
{
    // Async connect has finished. Check whether an error occurred
    int err = 0;
    socklen_t len = sizeof err;
    const int rc = getsockopt (s, SOL_SOCKET, SO_ERROR, (char*) &err, &len);

    // Assert if the error was caused by 0MQ bug.
    // Networking problems are OK. No need to assert.
    [...]
    // Following code should handle both Berkeley-derived socket
    // implementations and Solaris.
    if (rc == -1)
        err = errno;
    if (err != 0) {
        errno = err;
        errno_assert (
            errno == ECONNREFUSED ||
            errno == ECONNRESET ||
            errno == ETIMEDOUT ||
            errno == EHOSTUNREACH ||
+           errno == EHOSTDOWN ||
            errno == ENETUNREACH ||
            errno == ENETDOWN ||
            errno == EINVAL);
        return retired_fd;
    }
So, please add the missing "errno == EHOSTDOWN ||" exemption
to the errno_assert list. A patch is attached (attachment 180246).
More information about the freebsd-ports-bugs
mailing list