Cloudflare is down
On Mon, Mar 4, 2013 at 10:40 AM, Saku Ytti <saku at ytti.fi> wrote:
> On (2013-03-04 13:23 -0500), Jeff Wheeler wrote:
>> We have lots of stupid people in our industry because so few
>> understand "The Way Things Work."
> We have tendency to view mistakes we do as unavoidable human errors and
> mistakes other people do as avoidable stupidity.
> We should actively plan for mistakes/errors, if you actively plan for no
> 'stupid mistakes', you're gonna have bad time
> From my point of view, outages are caused by:
> 1) operator
> 2) software defect
> 3) hardware defect
> Most people design only against 3), often with design which actually
> increases likelihood of 2) and 1), reducing overall MTBF on design which
> strictly theoretically increases it.
...And a lot of people who know the hierarchy solve 3 and then solve 2
in a way that increases 1 (multiple parallel environments with
different vendors' equipment) only to find that 1 increased, due to
On the other hand, I've seen people who had horrible explosions of 2
or 3 due to ignoring all but 1.
If you ACTUALLY need that many 9s, you need all of: redundancy,
diversity of vendors, and suitably trained, exercised,
process-supported net admins. That costs a few factors of 2 more
than nearly anyone typically wants to pay.
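
To put rough numbers on the three causes above, here is a minimal
sketch. It assumes the causes are independent and that any one of them
takes the service down, so their failure rates simply add. Every MTBF
figure is invented for illustration, and combined_mtbf is a
hypothetical helper, not anyone's real tooling.

```python
# Illustrative only: every figure below is a made-up assumption,
# not measured data from any real network.

def combined_mtbf(*mtbfs_hours):
    """MTBF of a system that goes down when any one independent cause fires.

    Failure rates (1 / MTBF) add, so the combined MTBF is the
    reciprocal of the summed rates.
    """
    return 1.0 / sum(1.0 / m for m in mtbfs_hours)

# Simple design: hypothetical per-cause MTBFs, in hours.
simple = combined_mtbf(
    20_000,  # 1) operator
    40_000,  # 2) software defect
    50_000,  # 3) hardware defect
)

# "Redundant" design: hardware outages become far rarer, but the added
# complexity is assumed to make operator and software outages more frequent.
redundant = combined_mtbf(
    8_000,      # 1) operator
    15_000,     # 2) software defect
    5_000_000,  # 3) hardware defect
)

print(f"simple:    {simple:,.0f} h between outages")
print(f"redundant: {redundant:,.0f} h between outages")
```

With these assumed inputs the redundant design ends up with the lower
overall MTBF, even though its hardware MTBF is 100 times better. Change
the inputs and the answer changes with them.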
-george william herbert
george.herbert at gmail.com