CenturyLink RCA?
- Subject: CenturyLink RCA?
- From: ximaera at gmail.com (Töma Gavrichenkov)
- Date: Sun, 30 Dec 2018 20:46:05 +0300
- In-reply-to: <CAAeewD_AwvQZsETRxyWfgtbQKi11S=+jEa70Prykub0e3wFkrw@mail.gmail.com>
- References: <CAAeewD_+sd1FnKXb58MkF8yt6XFJwJhR+zfZhQGeCFPGr7XKuw@mail.gmail.com> <543548599.1639.1546184537220.JavaMail.mhammett@ThunderFuck> <[email protected]> <CAAeewD_AwvQZsETRxyWfgtbQKi11S=+jEa70Prykub0e3wFkrw@mail.gmail.com>
There's a Reddit user claiming he works at CL who says the cause was some
faulty Infinera DTN-X instances.
https://www.reddit.com/r/centurylink/comments/aa2qa4/comment/ecovgab
(dunno though why the user posted that to Reddit and not here)
On 30 Dec 2018 at 20:19, Saku Ytti <saku at ytti.fi> wrote:
> Hey John,
>
> Your criticism is warranted, but it would also be addressed by the
> explanation that DCN/OOB was the source of the problem.
>
> At any rate, I am looking forward to stopping the speculation and
> reading a post-mortem written by someone who knows how networks work.
>
> On Sun, 30 Dec 2018 at 18:28, John Von Essen <john at essenz.com> wrote:
> >
> > One thing that is troubling when reading that URL is that it appears
> > several steps of restoration required teams to go onsite for local login,
> > etc. Granted, to troubleshoot hardware you need to be physically present
> > to pop a line card in and out, but CTL/LVL3 should have full out-of-band
> > console and power control to all core devices; we shouldn't be waiting for
> > someone to drive to a location to get console or do power cycling. And I
> > would imagine the first step to a lot of the troubleshooting was power
> > cycling and local console logs.
> >
> >
> > -John
> >
> >
> >
> > On 12/30/18 10:42 AM, Mike Hammett wrote:
> >
> > It's technical enough so that laypeople immediately lose interest, yet
> > completely useless to anyone that works with this stuff.
> >
> >
> >
> > -----
> > Mike Hammett
> > Intelligent Computing Solutions
> > http://www.ics-il.com
> >
> > Midwest-IX
> > http://www.midwest-ix.com
> >
> > ________________________________
> > From: "Saku Ytti" <saku at ytti.fi>
> > To: "nanog list" <nanog at nanog.org>
> > Sent: Sunday, December 30, 2018 7:42:49 AM
> > Subject: CenturyLink RCA?
> >
> > Apologies for the URL, I do not know the official source and I do not
> > share the URL's sentiment.
> > https://fuckingcenturylink.com/
> >
> > Can someone translate this into IP-engineer terms? What actually happened?
> > From my own history, I rarely recognise the problem I fixed from
> > reading the public RCA. I hope CenturyLink will do better.
> >
> > Best guess so far that I've heard is
> >
> > a) CenturyLink runs global L2 DCN/OOB
> > b) there was HW fault which caused L2 loop (perhaps HW dropped BPDU,
> > I've had this failure mode)
> > c) DCN had direct access to control-plane, and L2 congested
> > control-plane resources causing it to deprovision waves
> >
> > Now of course this is entirely speculation, but intended to show what
> > type of explanation is acceptable and can be used to fix things.
> > Hopefully CenturyLink does come out with IP-engineering readable
> > explanation, so that we may use it as leverage to support work in our
> > own domains to remove such risks.
> >
> > a) do not run L2 DCN/OOB
> > b) do not connect MGMT ETH (it is unprotected access to control-plane,
> > it cannot be protected by CoPP/lo0 filter/LPTS etc.)
> > c) do add in your RFP scoring item for proper OOB port (Like Cisco CMP)
> > d) do fail optical network up
> >
> > --
> > ++ytti
> >
>
>
> --
> ++ytti
>