Listen queue overflow: N already in queue awaiting acceptance
Gleb Smirnoff
glebius at FreeBSD.org
Thu Jul 11 14:52:31 UTC 2013
On Thu, Jul 11, 2013 at 04:49:25PM +0200, Luigi Rizzo wrote:
L> >> IMO, this should be a single counter accessible via sysctl, with no
L> >> printf(). Those who need details on whether this is a micro-burst or a
L> >> persistent condition can run monitoring software that draws plots.
L> >
L> >
L> > A single counter wouldn't tell you anything because it doesn't show which
L> > socket/accept queue is affected by the overflow. The inpcb pointer
L> > can be cross-referenced with netstat -a.
L> >
L> > Andriy, for example, would never have found out about this problem other
L> > than through vague user complaints about aborted connection attempts.
L> > Maybe after spending many hours searching for the cause he might have
L> > inferred from endless scrolling in Wireshark that something wasn't
L> > right, and blamed syncache first. Only later would it emerge that he's
L> > either receiving too many connections or his application is too slow
L> > at dealing with incoming connections.
L> >
L> > If you can recommend suitable, general, sysadmin-friendly monitoring
L> > software that will point out this problem, I'm all ears.
L>
L> The problem with these unthrottled messages is that they often
L> cause thrashing: you become slightly slow, messages start being
L> generated, and your system becomes a lot slower, making it hard
L> to recover.
L>
L> What I usually do is throttle (in the kernel) and count the number of
L> messages suppressed. Something like this (in a macro):
L>
L> static int ctr, last_tick;	/* suppressed count, tick of last print */
L> if (ticks - last_tick > suppression_delay) {
L> 	printf("got this error ... (%d times)\n", ... , ctr);
L> 	ctr = 0;
L> 	last_tick = ticks;
L> } else {
L> 	ctr++;	/* message suppressed: just count it */
L> }
L>
L> The errors may not be exactly the same, and the counter is race-prone
L> (you can make it atomic if you really feel like it), but the whole point
L> is to get the idea that something is very wrong, not the exact count
L> or pointer.
BTW, there is a ready-made function for that: ppsratecheck(), already used
for suppressing some error messages.
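ppsratecheck() lives in the kernel, so what follows is only a hypothetical userspace analog in C of the idea: admit at most max_pps events per one-second window, with the window state kept in caller-provided storage, and (borrowing Luigi's counter) report how many messages were dropped in between. The names rate_window, rate_check_ms(), log_listen_overflow() and monotonic_ms() are made up for illustration, and the "(N suppressed)" suffix is an assumption; this is a sketch, not FreeBSD's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical ppsratecheck()-style limiter (userspace sketch):
 * at most max_pps events are admitted per one-second window.
 * In this sketch, max_pps == 0 admits nothing; max_pps < 0 admits all.
 */
struct rate_window {
	long long start_ms;	/* start of the current one-second window */
	int count;		/* events in the current window (0 = never used) */
	int suppressed;		/* messages dropped since the last one printed */
};

/* Hypothetical helper: millisecond monotonic clock for real callers. */
long long
monotonic_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Returns true if this event still fits in the per-second budget. */
bool
rate_check_ms(struct rate_window *w, long long now_ms, int max_pps)
{
	assert(w != NULL);
	if (w->count == 0 || now_ms - w->start_ms >= 1000) {
		/* First call, or a full second has passed: open a new window. */
		w->start_ms = now_ms;
		w->count = 1;
		return max_pps != 0;
	}
	w->count++;		/* NB: overflow ignored in this sketch */
	return max_pps < 0 || w->count <= max_pps;
}

/*
 * Luigi's idea on top: count what was suppressed and report it with the
 * next message that gets through. Returns true if a message was printed.
 */
bool
log_listen_overflow(struct rate_window *w, long long now_ms, int qlen,
    int max_pps)
{
	if (!rate_check_ms(w, now_ms, max_pps)) {
		w->suppressed++;
		return false;
	}
	printf("Listen queue overflow: %d already in queue awaiting acceptance"
	    " (%d suppressed)\n", qlen, w->suppressed);
	w->suppressed = 0;
	return true;
}
```

A real caller would keep a zero-initialized static struct rate_window per message site and pass monotonic_ms() as now_ms. Capping messages per second, rather than enforcing a minimum gap between prints as in Luigi's snippet, still bounds console output during an overflow storm while letting the first few messages of a burst through.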
--
Totus tuus, Glebius.
More information about the freebsd-net mailing list