ntp problems stratum 2 to 14?
Dewayne Geraghty
dewaynegeraghty at gmail.com
Thu Mar 5 20:50:06 UTC 2020
Ian, Good points. I did remove the fudge and 127.127.1.1 lines from the
config with the same result as below. Interestingly the clock at 10.0.7.6
isn't really unreliable, as its been my time source since 2005, and serves
clients, so it is pretty ok. Without a local clock, named fails (its linked
with kerberos). Yes I'd never seen my clock server become st 14, which
prompted me to seek help. I haven't repeated that scenario, but I
continuously get "interrupted system call" and I haven't been able to sync
while running debug or ktrace - so I dont have "what good looks like".
Thanks Peter. I've rebuilt net/ntpd in various ways including all
defaults. Rebuilt the kernel (& world), to the latest 12 stable.
Reset almost all sysctl's (I change 91 of them). I keep getting
interrupted system call at 1 sec intervals, which I suspect is a problem
For the reader: a stratum 2 clocks 10.0.7.6 syncs with some atomic clocks
within city; a server 10.0.7.91 running ntpd 4.2.8p14 on FreeBSD 12.1
r358565M irregularly and usually wont sync, and experiences "interrupted
system calls".
The jump to stratum 14 was a surprise, but not repeatable. Sometimes the
ntpd port starts and uses the next hop time server, but usually within 20
minutes returns to LOCL though more often goes straight to LOCL. During
ktrace's I've observed:
66894 ntpd CALL write(0x1,0x80078e000,0x2e)
66894 ntpd GIO fd 1 wrote 46 bytes
"select() returned -1: Interrupted system call
...
"poll_update: at 1 10.0.7.6 poll 4 burst 0 retry 2 head 14 early 2
next 16
"
66894 ntpd RET write 74/0x4a
66894 ntpd CALL select(0x19,0x7fffffffde50,0,0,0)
66894 ntpd RET select -1 errno 4 Interrupted system call
66894 ntpd PSIG SIGALRM caught handler=0x80072f600 mask=0x0
code=SI_KERNEL
66894 ntpd CALL sigprocmask(SIG_SETMASK,0x7fffffffd7a4,0)
66894 ntpd RET sigprocmask 0
66894 ntpd CALL sigreturn(0x7fffffffd3d0)
66894 ntpd RET sigreturn JUSTRETURN
66894 ntpd CALL write(0x1,0x80078e000,0x2e)
66894 ntpd GIO fd 1 wrote 46 bytes
"select() returned -1: Interrupted system call
...
"select() returned -1: Interrupted system call
"
66894 ntpd RET write 46/0x2e
66894 ntpd CALL select(0x19,0x7fffffffde50,0,0,0)
66894 ntpd RET select -1 errno 4 Interrupted system call
66894 ntpd PSIG SIGALRM caught handler=0x80072f600 mask=0x0
code=SI_KERNEL
66894 ntpd CALL sigprocmask(SIG_SETMASK,0x7fffffffd7a4,0)
66894 ntpd RET sigprocmask 0
66894 ntpd CALL sigreturn(0x7fffffffd3d0)
66894 ntpd RET sigreturn JUSTRETURN
66894 ntpd CALL write(0x1,0x80078e000,0x2e)
66894 ntpd GIO fd 1 wrote 46 bytes
"select() returned -1: Interrupted system call
but I have no idea whether or not these interrupted system calls are normal
or not.
and with debug (-D5) this is what I see
5 Mar 18:26:50 ntpd[86274]: select(): nfound=-1, error: Interrupted system
call
alarming: normal
5 Mar 18:26:51 ntpd[86274]: select(): nfound=-1, error: Interrupted system
call
poll_update: at 17 10.0.7.6 poll 4 burst 0 retry 0 head 0 early 2 next 16
sendpkt(21, dst=10.0.7.6, src=10.0.7.91, ttl=0, len=72)
peer_xmit: at 17 10.0.7.91->10.0.7.6 mode 3 keyid 0000232f len 72 index 0
read_network_packet: fd=21 length 72 from 10.0.7.6
fetch_timestamp: system bintime network time stamp: 1583393211.408612992
restrictions: looking up 10.0.7.6
match_restrict4_addr: Checking 127.0.0.1, port 123 ... doesn't match:
ippeerlimit -4
match_restrict4_addr: Checking 127.0.0.1, port 123 ... doesn't match:
ippeerlimit -1
match_restrict4_addr: Checking 10.0.7.91, port 123 ... doesn't match:
ippeerlimit -4
match_restrict4_addr: Checking 10.0.7.6, port 123 ... MATCH: ippeerlimit -1
receive: at 17 10.0.7.91<-10.0.7.6 ippeerlimit -1 mode 4 iflags
up,broadcast,multicast restrict nomodify,nopeer,noquery,notrap org
0xe20b283b.687ad7f6 xmt 0xe20b2838.2001aa5b
MRU: interval 16 headway 8 limit 64
receive: at 17 10.0.7.91<-10.0.7.6 mode 4/server:AM_PROCPKT keyid 0000232f
len 72 auth 1 org 0xe20b283b.687ad7f6 xmt 0xe20b2838.2001aa5b MAC
receive: MATCH_ASSOC dispatch: mode 4/server:AM_PROCPKT
poll_update: at 17 10.0.7.6 poll 4 burst 0 retry 0 head 14 early 2 next 16
clock_filter: n 2 off -3.283397 del 0.000382 dsp 3.937561 jit 0.000551
alarming: normal
5 Mar 18:26:52 ntpd[86274]: select(): nfound=-1, error: Interrupted system
call
alarming: normal
5 Mar 18:26:53 ntpd[86274]: select(): nfound=-1, error: Interrupted system
call
alarming: normal
I have rebuild with ntp-4.2.8p14 and with no additional compile rules and
the port's defaults. I keep getting
6 Mar 07:05:16 ntpd[98682]: select(): nfound=-1, error: Interrupted system
call
alarming: normal
6 Mar 07:05:17 ntpd[98682]: select(): nfound=-1, error: Interrupted system
call
alarming: normal
6 Mar 07:05:18 ntpd[98682]: select(): nfound=-1, error: Interrupted system
call
alarming: normal
6 Mar 07:05:19 ntpd[98682]: select(): nfound=-1, error: Interrupted system
call
alarming: normal
6 Mar 07:05:20 ntpd[98682]: select(): nfound=-1, error: Interrupted system
call
So with compile, minimum settings and all aslr variables set to 0. We get
the above result running as either ntpd or root. I'm starting to wonder
about the network stack?
My build comprises:
# make __MAKE_CONF=/etc/make_test.conf -C /usr/ports/net/ntp -VCFLAGS
-VLDFLAGS
-O2 -pipe -fstack-protector-strong -fno-strict-aliasing
-fstack-protector
# make __MAKE_CONF=/etc/make_test.conf -C /usr/ports/net/ntp showconfig
|grep =on
DEBUG=on: Build with debugging support
DOCS=on: Build and/or install documentation
EXAMPLES=on: Build and/or install examples
IPV6=on: IPv6 protocol support
LOCAL_CLOCK=on: Enable local clock reference
NLS=on: Native Language Support
PERL_UTILS=on: Install ntp-wait and ntptrace utilities
SSL=on: SSL protocol support
THREADS=on: Threading support
Also the base systems ntp 4.2.8p13 also experiences "interrupted system
calls"
80370 ntpd CALL select(0x19,0x7fffffffde40,0,0,0)
80370 ntpd RET select -1 errno 4 Interrupted system call
80370 ntpd PSIG SIGALRM caught handler=0x801473610 mask=0x0
code=SI_KERNEL
80370 ntpd CALL sigprocmask(SIG_SETMASK,0x7fffffffd7a4,0)
80370 ntpd RET sigprocmask 0
80370 ntpd CALL sigreturn(0x7fffffffd3d0)
80370 ntpd RET sigreturn JUSTRETURN
80370 ntpd CALL select(0x19,0x7fffffffde40,0,0,0)
80370 ntpd RET select -1 errno 4 Interrupted system call
; perhaps just having aslr built into the kernel?? Its straw clutching
time...
I appreciate your consideration. All suggestions welcome. :)
PS Ian the main time server 10.0.7.6 sync's off 2x atomic clocks, with 2x
GPS as failover ;) The internal server 10.0.7.91 serves the internal
clients only.
More information about the freebsd-stable
mailing list