NTP Issues Today



On Nov 20, 2012, at 2:28 PM, Jay Ashworth <jra at baylink.com> wrote:

> ----- Original Message -----
>> From: "Leo Bicknell" <bicknell at ufp.org>
> 
>> To protect against two falseticking servers (tick and tock, as we saw on
>> the 19th) you need _FIVE_ servers minimum configured if they are both in
>> the list. More importantly, if you want to protect against a source
>> (GPS, CDMA, IRIG, WWIV, ACTS, etc) false ticking, you need a minimum of
>> _FOUR_ different source technologies in the list as well.
>> 
>> It's not hard, my box that I posted the logs from peers with 18
>> servers using 8 source technologies, all freely available on the Internet...
> 
> I'm curious, Leo, what your internal setup looks like.  Do you have an
> internal pair of masters, all slaved to those externals and one another, 
> with your machines homed to them?  Full mesh?  Or something else?
> 
> In my last big gig, it was recommended to me that I have all the machines 
> which had to speak to my DBMS NTP *to it*, and have only it connect to the
> rest of my NTP infrastructure.  It coming unstuck was of less operational
> impact than *pieces of it* going out of sync with one another...


Here's a sample NTP config from one of my systems.

-- snip --
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org

#
server 0.us.pool.ntp.org iburst maxpoll 9
server 1.us.pool.ntp.org iburst maxpoll 9
server 2.us.pool.ntp.org iburst maxpoll 9
server 129.250.35.250 iburst maxpoll 9
server 129.250.35.251 iburst maxpoll 9

-- snip --
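As a rough sanity check against the "five servers for two falsetickers" figure quoted above, here is a minimal Python sketch that counts the server lines in a config like this one. The 2f + 1 majority rule it uses is an assumption that happens to match the quoted figure, not something taken from ntpd itself, and the function names are made up for illustration. It also counts each pool hostname as one server, which is how a plain "server" line behaves.

```python
def count_servers(conf_text):
    """Count 'server' directives, ignoring comments and blank lines."""
    count = 0
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == "server":
            count += 1
    return count


def max_falsetickers(n_servers):
    """Largest f such that n_servers >= 2f + 1 (assumed majority rule)."""
    return max(0, (n_servers - 1) // 2)


# The server lines from the config above.
SAMPLE_CONF = """\
# Use public servers from the pool.ntp.org project.
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org

#
server 0.us.pool.ntp.org iburst maxpoll 9
server 1.us.pool.ntp.org iburst maxpoll 9
server 2.us.pool.ntp.org iburst maxpoll 9
server 129.250.35.250 iburst maxpoll 9
server 129.250.35.251 iburst maxpoll 9
"""

n = count_servers(SAMPLE_CONF)
print(n, "servers; tolerates up to", max_falsetickers(n), "falsetickers")
```

Note this only counts servers. It says nothing about how many distinct source technologies sit behind them, which is the other half of the quoted advice.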

You can audit its operation like this:

nat:~$ ntpq -p -n -c ass
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-129.250.35.250  164.244.221.197  2 u   68  512  377   19.248   -0.135   3.195
+129.250.35.251  192.5.41.40      2 u  439  512  377   41.817    1.109  15.660
-206.57.44.17    204.123.2.5      2 u  126  512  377   37.133   -6.443   9.631
+4.53.160.75     209.81.9.7       2 u   48  512  377   25.209    1.551   8.804
-64.73.32.135    192.5.41.41      2 u  349  512  377   23.418   -0.703   1.721
*50.116.38.157   64.250.177.145   2 u  380  512  377   43.021    1.267   2.136
+208.87.221.228  10.0.22.49       2 u  517  512  377   92.000    0.974   0.678
-206.212.242.132 128.252.19.1     2 u  323  512  377   21.781   -2.873   1.304
+38.229.71.1     204.123.2.72     2 u  211  512  377   21.977   -0.055   2.274

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 39973  931a   yes   yes  none   outlyer    sys_peer  1
  2 39974  941a   yes   yes  none candidate    sys_peer  1
  3 39975  9324   yes   yes  none   outlyer   reachable  2
  4 39976  942a   yes   yes  none candidate    sys_peer  2
  5 39977  931a   yes   yes  none   outlyer    sys_peer  1
  6 39978  961a   yes   yes  none  sys.peer    sys_peer  1
  7 39979  9414   yes   yes  none candidate   reachable  1
  8 39980  931a   yes   yes  none   outlyer    sys_peer  1
  9 39981  941a   yes   yes  none candidate    sys_peer  1
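
The hex "status" column packs the same information that the conf, reach, condition, last_event and cnt columns spell out. A minimal Python sketch of the decoding follows, assuming the usual ntpq peer status word layout (flag bits and a 3-bit select code in the high byte, then a 4-bit event count and a 4-bit event code). The event table is deliberately partial and covers only the two codes that appear in this listing; the function name is made up for illustration.

```python
# Select (condition) codes, spelled the way ntpq prints them above.
SELECT_NAMES = {
    0: "reject", 1: "falsetick", 2: "excess", 3: "outlyer",
    4: "candidate", 5: "backup", 6: "sys.peer", 7: "pps.peer",
}

# Partial table: only the event codes seen in the listing above.
EVENT_NAMES = {0x4: "reachable", 0xA: "sys_peer"}


def decode_peer_status(word_hex):
    """Split a 16-bit peer status word (hex string) into its fields."""
    word = int(word_hex, 16)
    high = word >> 8
    event = word & 0xF
    return {
        "conf": bool(high & 0x80),       # persistent (configured) association
        "authenb": bool(high & 0x40),    # authentication enabled
        "reach": bool(high & 0x10),      # host reachable
        "condition": SELECT_NAMES[high & 0x07],
        "cnt": (word >> 4) & 0xF,        # event count
        "last_event": EVENT_NAMES.get(event, hex(event)),
    }


for status in ("961a", "931a", "9324"):
    print(status, decode_peer_status(status))
```

Decoding 961a this way gives a configured, reachable association in the sys.peer condition, which agrees with row 6 of the listing.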


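Had this host been polling the impacted clocks, you would have seen them flagged as falsetickers (an 'x' in the first column of the peer list) and left out of the selection.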
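
If you want to watch for that automatically, here is a minimal Python sketch that reads "ntpq -p -n" output and reports the tally code in the first column of each peer row. It assumes the ten-column peer layout shown above; the function names are made up for illustration, and the sample is just the peer listing from this message.

```python
from collections import Counter

# ntpq tally codes (first character of each peer row).
TALLY = {
    " ": "reject", "x": "falseticker", ".": "excess", "-": "outlyer",
    "+": "candidate", "#": "backup", "*": "sys.peer", "o": "pps.peer",
}


def parse_peers(ntpq_output):
    """Return (remote, condition) pairs for each peer row."""
    peers = []
    for line in ntpq_output.splitlines():
        fields = line.split()
        # Peer rows have 10 columns; skip the header, ruler and anything else.
        if len(fields) != 10 or fields[0] == "remote" or line[0] not in TALLY:
            continue
        remote = fields[0] if line[0] == " " else fields[0][1:]
        peers.append((remote, TALLY[line[0]]))
    return peers


def tally_counts(ntpq_output):
    """Count peers per condition."""
    return dict(Counter(cond for _, cond in parse_peers(ntpq_output)))


def falsetickers(ntpq_output):
    """List the remotes that ntpd has marked as falsetickers."""
    return [r for r, cond in parse_peers(ntpq_output) if cond == "falseticker"]


SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-129.250.35.250  164.244.221.197  2 u   68  512  377   19.248   -0.135   3.195
+129.250.35.251  192.5.41.40      2 u  439  512  377   41.817    1.109  15.660
-206.57.44.17    204.123.2.5      2 u  126  512  377   37.133   -6.443   9.631
+4.53.160.75     209.81.9.7       2 u   48  512  377   25.209    1.551   8.804
-64.73.32.135    192.5.41.41      2 u  349  512  377   23.418   -0.703   1.721
*50.116.38.157   64.250.177.145   2 u  380  512  377   43.021    1.267   2.136
+208.87.221.228  10.0.22.49       2 u  517  512  377   92.000    0.974   0.678
-206.212.242.132 128.252.19.1     2 u  323  512  377   21.781   -2.873   1.304
+38.229.71.1     204.123.2.72     2 u  211  512  377   21.977   -0.055   2.274
"""

print(tally_counts(SAMPLE))
print("falsetickers:", falsetickers(SAMPLE))
```

For the listing above this finds four outlyers, four candidates, one sys.peer and no falsetickers.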

This is a fairly reasonable setup.

I've also been looking at an item like this:

http://www.netburnerstore.com/ProductDetails.asp?ProductCode=PK70EX-NTP

which is about $300 + misc parts.

That should be well worth it to avoid the kind of 'major outage' some folks had, where they needed to reboot their servers, etc.

- Jared