sleep(3) sometimes too sleepy on FreeBSD 8.0?
Kostik Belousov
kostikbel at gmail.com
Tue Feb 23 09:36:23 UTC 2010
On Tue, Feb 23, 2010 at 12:35:22PM +1100, John Marshall wrote:
> Environment: sendmail 8.14.4 on FreeBSD 8.0-RELEASE-p2
>
> Since upgrading a few local servers to FreeBSD 8.0-RELEASE (and
> subsequently 8.0-RELEASE-p2), I have been seeing VERY intermittent
> problems with sendmail persistent queue runners. One or more queue
> runners will fail to wake up (having been told to sleep for either 1 or
> 5 seconds) and mail accumulates in their queue group queues.
>
> I have only seen this about 4 times but at least once on each of the
> three 8.0 servers. I've been seeing something like one occurrence per
> fortnight overall. The first few times I re-started sendmail. On
> Saturday I spent longer looking at it.
>
> - attached to each of the stuck queue runner processes via gdb to
> try to see where they were stuck
> - backtraces from both processes were identical and looked sane
> - attached to a happy queue runner process and got an identical
> backtrace
> - exited gdb and discovered that the stuck queue runners had woken
> up and flushed their queues!
>
> The stuck queue runner processes had been stuck for several hours
> (judging by the timestamps on the queued mail messages) but the gdb
> attach apparently woke them up!
>
> PROCESS STATES BEFORE DEBUG (stuck runners are in 'I' state)
>
>   PID  TT  STAT      TIME COMMAND
> 80298  ??  Ss     0:17.68 sendmail: accepting connections (sendmail)
> 80299  ??  I      0:46.62 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
> 80300  ??  I      0:08.83 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
> 80301  ??  S      0:31.58 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
> 80302  ??  S      0:30.71 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
> 80303  ??  S      0:33.29 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
> 80304  ??  S      0:30.55 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)
>
> BACKTRACE OF STUCK PROCESS 80299
>
> (gdb) bt
> #0 0x28346547 in sigsuspend () from /lib/libc.so.7
> #1 0x28344e98 in sigpause () from /lib/libc.so.7
> #2 0x2833be3e in pause () from /lib/libc.so.7
> #3 0x080cc7c8 in sleep ()
> #4 0x08099c51 in run_work_group ()
> #5 0x08099ebf in runqueue ()
> #6 0x0805538d in main ()
>
> BACKTRACE OF HAPPY PROCESS 80301
>
> (gdb) bt
> #0 0x28346547 in sigsuspend () from /lib/libc.so.7
> #1 0x28344e98 in sigpause () from /lib/libc.so.7
> #2 0x2833be3e in pause () from /lib/libc.so.7
> #3 0x080cc7c8 in sleep ()
> #4 0x08099c51 in run_work_group ()
> #5 0x08099ebf in runqueue ()
> #6 0x0805538d in main ()
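In both backtraces sleep() is resolved inside the sendmail binary (frame #3 has no "from /lib/libc.so.7") and calls pause(), so sendmail appears to use its own sleep() built on a signal plus pause() rather than the libc one. That construction has a well-known hazard, sketched below as a hypothetical illustration rather than sendmail's actual code: if the wakeup signal is delivered between testing a "done" flag and entering pause(), the wakeup is lost and the process waits until some unrelated signal arrives. A runner stuck that way would resume as soon as anything interrupted its wait, which may be relevant to the gdb observation, though that is only a hypothesis. The sketch assumes SIGALRM as the wakeup signal; `alarm_sleep` and `alarm_fired` are hypothetical names. The race-free variant blocks SIGALRM before arming the timer and waits in sigsuspend(), which unblocks it atomically.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set asynchronously by the SIGALRM handler. */
static volatile sig_atomic_t alarm_fired;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/*
 * Racy pattern, shown only for contrast: if SIGALRM arrives after the
 * flag test but before pause() starts, the wakeup is lost and pause()
 * blocks until some other signal is delivered.
 *
 *     alarm_fired = 0;
 *     alarm(secs);
 *     while (!alarm_fired)
 *         pause();
 */

/* Hypothetical race-free sleep built from SIGALRM and sigsuspend(). */
static void alarm_sleep(unsigned int secs)
{
    sigset_t block, oldmask, waitmask;
    struct sigaction sa, oldsa;

    if (secs == 0)
        return; /* alarm(0) would cancel the timer and we would never wake */

    /* Block SIGALRM first so it cannot slip in before we start waiting. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &oldmask);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &oldsa);

    alarm_fired = 0;
    alarm(secs); /* note: replaces any alarm the caller had pending */

    /* Wait with SIGALRM unblocked; sigsuspend() swaps the mask atomically,
     * so a SIGALRM that became pending earlier is delivered here. */
    waitmask = oldmask;
    sigdelset(&waitmask, SIGALRM);
    while (!alarm_fired)
        sigsuspend(&waitmask);

    assert(alarm_fired); /* postcondition: we only leave once the alarm fired */

    /* Restore the caller's handler and signal mask. */
    sigaction(SIGALRM, &oldsa, NULL);
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
}
```

Calling alarm_sleep(1) returns after about one second with the caller's SIGALRM disposition and signal mask restored; other signals that interrupt the wait simply cause another trip round the loop.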
>
> PROCESS STATES AFTER DEBUG
>
>   PID  TT  STAT      TIME COMMAND
> 80298  ??  Ss     0:17.69 sendmail: accepting connections (sendmail)
> 80299  ??  S      0:46.66 sendmail: running queue: /var/spool/mqueue/qd1/df (sendmail)
> 80300  ??  S      0:08.85 sendmail: running queue: /var/spool/mqueue/mby/df (sendmail)
> 80301  ??  S      0:31.60 sendmail: running queue: /var/spool/mqueue/oz/df (sendmail)
> 80302  ??  S      0:30.73 sendmail: running queue: /var/spool/mqueue/rw2/df (sendmail)
> 80303  ??  S      0:33.32 sendmail: running queue: /var/spool/mqueue/hold/df (sendmail)
> 80304  ??  S      0:30.58 sendmail: running queue: /var/spool/mqueue/pgp/df (sendmail)
>
> SENDMAIL DETAILS
>
> Version 8.14.4
> Compiled with: DNSMAP LOG MAP_REGEX MATCHGECOS MILTER MIME7TO8 MIME8TO7
> NAMED_BIND NETINET NETUNIX NEWDB NIS PIPELINING SASLv2 SCANF
> STARTTLS USERDB XDEBUG
>
> /usr/sbin/sendmail:
> libsasl2.so.2 => /usr/local/lib/libsasl2.so.2 (0x28154000)
> libssl.so.7 => /usr/local/lib/libssl.so.7 (0x2816a000)
> libcrypto.so.7 => /usr/local/lib/libcrypto.so.7 (0x281ad000)
> libutil.so.8 => /lib/libutil.so.8 (0x282f2000)
> libc.so.7 => /lib/libc.so.7 (0x28300000)
> libz.so.5 => /lib/libz.so.5 (0x2840c000)
>
> I posted about this in comp.mail.sendmail and was told...
>
> > sleep() should be one of these calls:
> >
> >     if (njobs == 0 && WorkGrp[wgrp].wg_lowqintvl < MIN_SLEEP_TIME)
> >             sleep(MIN_SLEEP_TIME);
> >     else if (WorkGrp[wgrp].wg_lowqintvl <= 0)
> >             sleep(QueueIntvl > 0 ? QueueIntvl : MIN_SLEEP_TIME);
> >     else
> >             sleep(WorkGrp[wgrp].wg_lowqintvl);
> >
> > Unless you have a really large value for one of these, the process
> > should continue after a while.
>
> The above code snippet is from sendmail/queue.c which fixes
> MIN_SLEEP_TIME at 5. QueueIntvl defaults to 1. wg_lowqintvl defaults
> to 0. I have not set any configuration or runtime options to override
> these defaults, so my persistent queue runners should be sleeping for
> either 1s or 5s only (not hours!).
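Under the defaults described above, the quoted branch logic can only produce 1 or 5 seconds. The sketch below is a stand-alone copy of that logic with the sendmail globals turned into parameters; `queue_sleep_secs` and its parameter names are hypothetical stand-ins for njobs, WorkGrp[wgrp].wg_lowqintvl and QueueIntvl, and MIN_SLEEP_TIME is 5 as stated in the report.

```c
#include <assert.h>

/* Value quoted from sendmail/queue.c in the report above. */
#define MIN_SLEEP_TIME 5

/*
 * Hypothetical stand-alone copy of the quoted interval choice:
 *   njobs       ~ njobs
 *   lowqintvl   ~ WorkGrp[wgrp].wg_lowqintvl
 *   queue_intvl ~ QueueIntvl
 */
static unsigned int queue_sleep_secs(int njobs, long lowqintvl, long queue_intvl)
{
    assert(njobs >= 0); /* a job count is never negative */

    if (njobs == 0 && lowqintvl < MIN_SLEEP_TIME)
        return MIN_SLEEP_TIME;                  /* idle: at least 5 s */
    else if (lowqintvl <= 0)
        return (unsigned int)(queue_intvl > 0 ? queue_intvl : MIN_SLEEP_TIME);
    else
        return (unsigned int)lowqintvl;         /* explicit per-group interval */
}
```

With the defaults (lowqintvl 0, QueueIntvl 1) an idle runner gets 5 and a runner with pending jobs gets 1, so a multi-hour stall is not explained by the interval calculation itself, which suggests looking at how the sleep completes.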
I think the best way to collect the data would be to ktrace the queue runners,
preferably starting the ktrace before they get stuck.
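To complement a ktrace, it can also help to timestamp each sleep so that an overshoot is logged when it happens rather than noticed hours later from queued mail. The following is a minimal sketch assuming a POSIX system with clock_gettime() and CLOCK_MONOTONIC; `timed_sleep`, `elapsed_secs` and the `slack` threshold are hypothetical and not part of sendmail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_secs(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical wrapper: call sleep(secs), return how long it really took,
 * and log to stderr when it ran more than `slack` seconds past the request.
 */
static double timed_sleep(unsigned int secs, double slack)
{
    struct timespec start, end;
    double took;

    assert(slack >= 0.0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    sleep(secs); /* may return early if a signal interrupts it */
    clock_gettime(CLOCK_MONOTONIC, &end);

    took = elapsed_secs(&start, &end);
    if (took > secs + slack)
        fprintf(stderr, "sleep(%u) overslept: %.3f s\n", secs, took);
    return took;
}
```

This only measures whichever sleep() the program links against, so it says something about a suspect implementation only when built together with that code.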
More information about the freebsd-stable mailing list