
[outages] NTP Issues Today



Logs from a Juniper router in a customer network; we had hundreds of these affected.  They all synchronize to internal hosts (172.20.167.251 and .252), which are configured to get time from NIST and USNO.

CORP-NTP-01#sh ntp as

      address         ref clock     st  when  poll reach  delay  offset    disp
*~192.5.41.41      .IRIG.            1   354   512  377    34.2    0.36     1.4
+~132.163.4.101    .ACTS.            1   336   512  377    35.0   -2.54    18.7
 ~127.127.7.1      127.127.7.1      10    59    64  377     0.0    0.00     0.0
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured

CORP-NTP-02#sh ntp as

      address         ref clock     st  when  poll reach  delay  offset    disp
*~192.5.41.41      .IRIG.            1    65   512  377    36.5    0.91     0.6
+~132.163.4.101    .ACTS.            1    95   512  377    34.3   -1.31    22.8
 ~127.127.7.1      127.127.7.1      10    44    64  377     0.0    0.00     0.0
 * master (synced), # master (unsynced), + selected, - candidate, ~ configured

Here are the logs from one of the Junipers:

Nov 19 14:24:48  XXXX xntpd[912]: kernel time sync enabled 2001
Nov 19 15:50:11  XXXX xntpd[912]: synchronized to 172.20.167.252, stratum=2
Nov 19 16:41:23  XXXX xntpd[912]: no servers reachable
Nov 19 16:44:24  XXXX xntpd[912]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:44:24  XXXX xntpd[912]: time correction of -378691200 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Nov 19 16:44:24  XXXX init: ntp (PID 912) exited with status=255
Nov 19 16:44:24  XXXX init: ntp (PID 70200) started
Nov 19 16:44:24  XXXX xntpd[70200]: ntpd 4.2.0-a Sat Apr 10 00:32:46 UTC 2010 (1)
Nov 19 16:44:24  XXXX xntpd[70200]: mlockall(): Resource temporarily unavailable
Nov 19 16:44:24  XXXX xntpd[70200]: precision = 0.582 usec
Nov 19 16:44:24  XXXX xntpd[70200]: Listening on interface ggsn_vpn, 128.0.0.1#123
Nov 19 16:44:24  XXXX xntpd[70200]: kernel time sync status 2040
Nov 19 16:44:24  XXXX xntpd[70200]: frequency initialized -64.931 PPM from /var/db/ntp.drift
Nov 19 16:44:24  XXXX xntpd[70200]: Configuring iburst flag for server
Nov 19 16:44:24  XXXX xntpd[70200]: Configuring iburst flag for server
Nov 19 16:44:33  XXXX xntpd[70200]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:44:32  XXXX xntpd[70200]: time reset -378691200.411331 s
Nov 19 16:44:32  XXXX xntpd[70200]: kernel time sync disabled 2041
Nov 19 16:45:44  XXXX xntpd[70200]: synchronized to 172.20.167.251, stratum=2
Nov 19 16:45:51  XXXX xntpd[70200]: kernel time sync enabled 2001
Nov 19 16:45:56  XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 16:53:25  XXXX xntpd[70200]: no servers reachable
Nov 19 17:03:09  XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 17:13:00  XXXX xntpd[70200]: NTP Server Unreachable
Nov 19 17:20:27  XXXX xntpd[70200]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:20:27  XXXX xntpd[70200]: time correction of 378691200 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Nov 19 17:20:27  XXXX init: ntp (PID 70200) exited with status=255
Nov 19 17:20:27  XXXX init: ntp (PID 70766) started
Nov 19 17:20:27  XXXX xntpd[70766]: ntpd 4.2.0-a Sat Apr 10 00:32:46 UTC 2010 (1)
Nov 19 17:20:27  XXXX xntpd[70766]: mlockall(): Resource temporarily unavailable
Nov 19 17:20:27  XXXX xntpd[70766]: precision = 0.570 usec
Nov 19 17:20:27  XXXX xntpd[70766]: Listening on interface ggsn_vpn, 128.0.0.1#123
Nov 19 17:20:27  XXXX xntpd[70766]: kernel time sync status 2040
Nov 19 17:20:27  XXXX xntpd[70766]: frequency initialized -64.931 PPM from /var/db/ntp.drift
Nov 19 17:20:27  XXXX xntpd[70766]: Configuring iburst flag for server
Nov 19 17:20:27  XXXX xntpd[70766]: Configuring iburst flag for server
Nov 19 17:20:35  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:20:36  XXXX xntpd[70766]: time reset +378691200.387434 s
Nov 19 17:20:36  XXXX xntpd[70766]: kernel time sync disabled 6041
Nov 19 17:21:48  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 19 17:21:48  XXXX xntpd[70766]: kernel time sync disabled 2041
Nov 19 17:21:52  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 00:02:29  XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
Nov 20 01:44:56  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 02:19:03  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 02:53:12  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 03:44:26  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 05:26:58  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 05:44:02  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 07:43:35  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 08:00:39  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 08:34:48  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 08:51:54  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 10:34:22  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 20 11:25:16  XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
Nov 20 12:33:56  XXXX xntpd[70766]: synchronized to 172.20.167.252, stratum=2
Nov 20 14:16:05  XXXX xntpd[70766]: kernel time sync enabled 6001
Nov 20 14:33:10  XXXX xntpd[70766]: kernel time sync enabled 2001
Nov 20 15:07:19  XXXX xntpd[70766]: synchronized to 172.20.167.251, stratum=2
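
The size of the correction in the log above is worth a quick sanity check. A minimal sketch, assuming the syslog timestamps are UTC and the year is 2012 (neither is stated in the log lines themselves), suggests that 378691200 seconds is a whole number of days and lands on the same date in 2000:

```python
from datetime import datetime, timedelta, timezone

# Magnitude of the step reported by xntpd ("time reset -378691200.411331 s"),
# truncated to whole seconds.
OFFSET_SECONDS = 378_691_200

# Assumption: the log timestamp is UTC and the year is 2012.
before = datetime(2012, 11, 19, 16, 44, 24, tzinfo=timezone.utc)
after = before - timedelta(seconds=OFFSET_SECONDS)

# Split the offset into whole days and leftover seconds.
days, remainder = divmod(OFFSET_SECONDS, 86_400)

print(days, remainder)    # 4383 0
print(after.isoformat())  # 2000-11-19T16:44:24+00:00
```

4383 days is 12 years including three leap days (2004, 2008, 2012), which is consistent with the "year 2000" reports elsewhere in this thread.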




-----Original Message-----
From: outages-bounces at outages.org [mailto:outages-bounces at outages.org] On Behalf Of Jeremy Chadwick
Sent: Tuesday, November 20, 2012 10:38 AM
To: Scott Voll
Cc: Sid Rao; outages; nanog at nanog.org
Subject: Re: [outages] NTP Issues Today

I'm still waiting for someone who was affected by this to provide coherent logs from ntpd showing exactly when the time change happened.
Getting these, at least on an *IX system, is far from difficult folks.

Please don't omit anything from the logs either; for example, if you know *exactly* what NTP servers were in use (not "ones you had configured" but which one was primarily chosen by ntpd ('*' mark) and which were secondary comparisons/fallbacks ('+' mark)), that would also be greatly helpful.  This would be output from "ntpq -c peers" when run on your NTP server *at or around the time* the incident happened and recovered.
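
For anyone collecting this after the fact, a rough sketch of pulling the '*' and '+' peers out of saved output might look like the following. The function name `classify_peers` and the role labels are hypothetical, the regex only handles IPv4 addresses, and the sample lines are taken from the CORP-NTP-01 output at the top of this message (which has an extra '~' after the mark that plain "ntpq -c peers" output would not have):

```python
import re

# One leading mark character, an optional '~' (present in the router-style
# output quoted in this message), then an IPv4 address.
PEER_LINE = re.compile(r"^(?P<mark>.)~?(?P<addr>\d+\.\d+\.\d+\.\d+)\s")

# Only the two marks described above are given a role; the rest are "other".
ROLES = {"*": "primary", "+": "secondary"}


def classify_peers(lines):
    """Map each peer address to 'primary', 'secondary', or 'other'."""
    roles = {}
    for line in lines:
        match = PEER_LINE.match(line)
        if match is None:
            continue  # header, legend, or blank line
        roles[match["addr"]] = ROLES.get(match["mark"], "other")
    return roles


sample = [
    "      address         ref clock     st  when  poll reach  delay  offset    disp",
    "*~192.5.41.41      .IRIG.            1   354   512  377    34.2    0.36     1.4",
    "+~132.163.4.101    .ACTS.            1   336   512  377    35.0   -2.54    18.7",
    " ~127.127.7.1      127.127.7.1      10    59    64  377     0.0    0.00     0.0",
]
peers = classify_peers(sample)
print(peers)
```

Peers listed by hostname or IPv6 address would be skipped by this pattern, so treat it as a starting point only.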

What's been provided so far is that "something happened", with reports of clocks going back to year 2000, and other reports of clocks going back to (presumably) epoch time; those reporting it were using either usno.navy.mil, NIST, or Microsoft NTP servers.  usno.navy.mil uses dedicated IRIG/AFNOR TCR boxes, while NIST uses GPS.  No idea what Microsoft uses.

I asked on a public *IX forum if anyone saw anything NTP-wise that was out of the ordinary and not a single admin saw anything.  I also saw nothing anomalous on either of my FreeBSD machines (9.1-PRERELEASE, running base system ntpd 4.2.4p8), but I sync with very specific stratum 1 and stratum 2 servers across the United States.

As Mark Andrews from the ISC stated below (read slowly/carefully), ntpd will not allow large clock jumps -- the largest it'll allow out of the box is 1000s (and on some systems like Solaris ntpd, 500s) -- unless you're running with the -g flag (and shame on you if you're doing that).
So I'm very surprised by this problem altogether.  I can't deny that it happened, but figuring out *why* is important.
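
The behaviour described here can be sketched as a toy model. This is not ntpd's code; the function name `handle_offset` and the returned labels are hypothetical, and only the 1000 s default and the "one unrestricted correction" effect of -g are taken from the description above and the ntpd documentation:

```python
PANIC_THRESHOLD_S = 1000.0  # default sanity limit cited above


def handle_offset(offset_s, allow_one_big_step=False):
    """Toy model of a time daemon's reaction to a measured offset.

    allow_one_big_step stands in for starting the daemon with -g.
    """
    if abs(offset_s) > PANIC_THRESHOLD_S:
        if allow_one_big_step:
            return "step"   # clock is set to the reported time, once
        return "exit"       # log "exceeds sanity limit" and quit
    return "adjust"         # normal clock discipline


print(handle_offset(-378691200))                           # exit
print(handle_offset(-378691200, allow_one_big_step=True))  # step
print(handle_offset(0.00036))                              # adjust
```

The Juniper log earlier in this message appears consistent with that sequence: xntpd exits with status=255 on the sanity limit, init restarts it, and the new process then logs "time reset". Whether that restart actually uses -g on those routers is an assumption, not something the log states.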

Also, for Mike Lyon -- I looked at NIST's GPS graphs.  Did you notice they have no data for 11/18, 11/19, or 11/20?  I find that unnerving, do you not?

-- 
| Jeremy Chadwick                                   jdc at koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Tue, Nov 20, 2012 at 07:18:45AM -0800, Scott Voll wrote:
> Same thing happened to us yesterday.  ended up having to reboot 
> everything after we got time fixed.  Major outage.
> 
> Scott
> 
> 
> On Mon, Nov 19, 2012 at 7:58 PM, Sid Rao <srao at ctigroup.com> wrote:
> 
> > We had multiple servers synchronized with Windows/MS time change 
> > their clock to the year 2000 today.  It broke many things, including 
> > AD authentication.
> >
> > These servers had been properly synchronized for years.
> >
> > They were synchronized with Microsoft and NIST NTP servers.
> >
> > This may not be isolated.
> >
> > Sid Rao | CTI Group | +1 (317) 262-4677
> >
> > On Nov 19, 2012, at 10:29 PM, "George Herbert" 
> > <george.herbert at gmail.com>
> > wrote:
> >
> > > crossreplying to outages list.
> > >
> > > Is anyone ELSE seeing GPS issues?  This could well have been an 
> > > unrelated issue on that particular PBX.
> > >
> > > If this was real, then the mother of all infrastructure attacks 
> > > might be underway...
> > >
> > > One glitch on tick and tock and one malfunctioning PBX is not 
> > > sufficient evidence of pattern - much less hostile activity - to 
> > > induce panic, but it would perhaps be a wise time to check 
> > > time-related logs?
> > >
> > >
> > > -george
> > >
> > > On Mon, Nov 19, 2012 at 6:08 PM, Wallace Keith 
> > > <kwallace at pcconnection.com> wrote:
> > > >> Just got paged with a pbx alarm that had 1970 as the year. By the
> > > >> time I logged in, it was showing 2012.  Using GPS for time and date.
> > >>
> > >> -----Original Message-----
> > >> From: Mark Andrews [mailto:marka at isc.org]
> > >> Sent: Monday, November 19, 2012 8:42 PM
> > >> To: Van Wolfe
> > >> Cc: nanog at nanog.org
> > >> Subject: Re: NTP Issues Today
> > >>
> > >>
> > > >> In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ at mail.gmail.com>, Van Wolfe writes:
> > >>> Hello,
> > >>>
> > > >>> Did anyone else experience issues with NTP today?  We had our
> > > >>> server times update to the year 2000 at around 3:30 MT, then
> > > >>> revert back to 2012.
> > >>>
> > >>> Thanks,
> > >>> Van
> > >>
> > > >> NTP should be immune from this sort of behaviour unless you did a
> > > >> ntpdate at the wrong moment.  The clocks should have been marked as insane.
> > >>
> > >> Mark
> > >> --
> > >> Mark Andrews, ISC
> > >> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> > >> PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org
> > >>
> > >>
> > >
> > >
> > >
> > > --
> > > -george william herbert
> > > george.herbert at gmail.com
> > >
> > >
> >
> >
> > _______________________________________________
> > Outages mailing list
> > Outages at outages.org
> > https://puck.nether.net/mailman/listinfo/outages
> >
