Also, from that Fake Host, we could run TCPDUMP, which captured traffic
flowing across that Ethernet and produced reams of output with a melange
of multi-protocol packet headers.  Again, all of that could make its way
into the database on demand, organized into useful Tables, delayed if
necessary to avoid impacting the network misbehavior we were trying to
debug.  Give a Unix guru awk, sed, cron and friends and amazing things
can happen.
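For flavor, here's a rough sketch of that kind of pipeline in modern
Python, standing in for the awk/sed of the day; the tcpdump line format
it matches and the fields it keeps are illustrative guesses only, not
what actually ran back then:

    #!/usr/bin/env python3
    # Hypothetical sketch: turn tcpdump text output into rows suitable
    # for bulk-loading into a database table.  The regex matches one
    # common shape of "tcpdump -nn" IP lines and is an assumption.
    import csv
    import re
    import sys

    LINE_RE = re.compile(
        r'^(?P<ts>\d\d:\d\d:\d\d\.\d+)\s+IP\s+'
        r'(?P<src>\S+)\s+>\s+(?P<dst>\S+):\s+'
        r'.*\blength\s+(?P<length>\d+)'
    )

    def rows(lines):
        """Yield (timestamp, src, dst, length) for lines we recognize."""
        for line in lines:
            m = LINE_RE.match(line)
            if m:
                yield (m.group('ts'), m.group('src'),
                       m.group('dst'), int(m.group('length')))

    if __name__ == '__main__':
        # e.g.:  tcpdump -nn -l | python3 capture2rows.py >> traffic.csv
        writer = csv.writer(sys.stdout)
        for row in rows(sys.stdin):
            writer.writerow(row)

A cron job could run something like that periodically and hand the
resulting rows to whatever bulk loader the database offered.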
We could even run a Flakeway on that Anchor Host, to simulate network
glitches for experimentation, but I can't recall ever having to do
that.  But perhaps the ops did and I never knew.
Once all that stuff got into the database, it became data.  Not a
problem.  I was a network guy afloat in an ocean of database gurus, and
I was astonished at the way they could manipulate that data and turn it
into Information.
I didn't get involved much in everyday network operations, but when
weird things happened I'd stick my nose in.
Once there was an anomaly in a trans-Pacific path: a flaky circuit
that would go down and up annoyingly often.  The carrier was "working
on it..."
What the ops had noticed was that after such a glitch finished, the
network would settle down as expected.  But sometimes the RTT delay and
bandwidth measurements would settle down to a new stable level,
noticeably different from before the line glitch.  They had even
brought up a rolling real-time graph of the data, kind of like a
hospital heart monitor, that clearly showed the glitch and the change
in behavior.
Using our ad hoc tools, we traced the problem down to a bug in some
vendor's Unix system.  That machine's TCP retransmission timer
algorithm was reacting to the glitch, and adapting as the rerouting
occurred.  But after the glitch, the TCP had settled into a new stable
pattern where the retransmission timer fired just a little too soon,
and every packet was getting sent twice.  The network anomaly would
show up if a line glitch occurred, but only if that Unix user was in
the middle of doing something like a file transfer across the Pacific
at the time.  The Hosts and TCPs were both happy, the Routers were
blissfully ignorant, and half that expensive trans-Pacific circuit was
being wasted carrying duplicate packets.
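In today's terms, the check we ended up doing boils down to something
like the sketch below - the two-column CSV input and the per-connection
grouping are assumptions for illustration; at the time the equivalent
was a query over the tcpdump-derived tables:

    #!/usr/bin/env python3
    # Hypothetical sketch: given captured TCP segments for one
    # connection/direction, estimate how many bytes crossed the wire
    # more than once (i.e., retransmissions).
    import csv
    import sys
    from collections import defaultdict

    def duplicate_bytes(segments):
        """segments: iterable of (seq, length) pairs.
        Returns (total_bytes, bytes_whose_sequence_number_repeated)."""
        seen = defaultdict(int)   # starting sequence number -> count
        total = dup = 0
        for seq, length in segments:
            total += length
            if seen[seq]:
                dup += length     # same starting seq seen before: a
                                  # retransmission, eager timer or not
            seen[seq] += 1
        return total, dup

    if __name__ == '__main__':
        # Input: CSV rows of "seq,length" for one connection.
        segs = ((int(s), int(n)) for s, n in csv.reader(sys.stdin))
        total, dup = duplicate_bytes(segs)
        if total:
            print(f"{dup}/{total} bytes ({100.0 * dup / total:.1f}%) "
                  f"were duplicates")

In the broken case described above, that ratio would sit near 50% for
the affected connection while everything else looked normal.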
With the data all sitting in the database, we had the tools to figure
that out.  We reported the TCP bug to the Unix vendor.  I've always
wondered if it ever got fixed, since most customers would probably
never notice.
Another weird thing was that "my quarterly report won't go" scenario.
That turned out to be a consequence of the popularity of the "Global
LAN" idea in the network industry at the time.  IIRC, someone in some
office in Europe had just finished putting together something like a
library of graphics and photos for brochures et al, and decided to send
it over to the colleagues who were waiting for it.  Everybody was on
the department "LAN", so all you had to do was drag this folder over
there to those guys' icons and it would magically appear on their
desktops.  Of course it didn't matter that those other servers were in
the US, Australia, and Asia - it's a Global LAN, right!
The network groaned, but all the routers and lines stayed up, happily
conveying many packets per second.  For hours.  Unfortunately, too few
of the packets were carrying that email traffic.
We turned off "Global LAN" protocols in the routers ... but of course
today such LAN-type services all run over TCP, so it might not be quite
as easy.
The other important but less urgent Network Management activity
involved things like Capacity Planning.  With the data in the database,
it was pretty easy to get reports or graphs of trends over a
month/quarter, and see the need to order more circuits or equipment.
We could also run various tests, like traffic generators and such, and
gather data when there were no problems in the network.  That collected
data provided a "baseline" of how things looked when everything was
working.  During problem times, it was straightforward to run similar
tests and compare the results with the baselines, highlighting
significant differences to figure out where the source of a problem
might be.  The ability to compare "working" and "broken" data is a
powerful Network Management tool.
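A minimal sketch of that "working" vs. "broken" comparison, with
made-up metric names and an arbitrary 3-sigma threshold purely for
illustration:

    #!/usr/bin/env python3
    # Hypothetical sketch: flag measurements that have drifted well
    # outside their baseline.  Metric names, sample values, and the
    # threshold are invented for illustration.
    from statistics import mean, stdev

    def flag_anomalies(baseline, current, n_sigma=3.0):
        """baseline: {metric: [samples from when the net was healthy]}
        current:  {metric: latest measurement}
        Returns metrics whose current value is more than n_sigma
        baseline standard deviations away from the baseline mean."""
        flagged = {}
        for metric, samples in baseline.items():
            if metric not in current or len(samples) < 2:
                continue
            mu, sigma = mean(samples), stdev(samples)
            if sigma == 0:
                continue
            deviation = abs(current[metric] - mu) / sigma
            if deviation > n_sigma:
                flagged[metric] = (current[metric], mu, deviation)
        return flagged

    if __name__ == '__main__':
        baseline = {"rtt_ms_pacific": [310, 305, 320, 315, 308],
                    "link_util_pct":  [42, 45, 40, 44, 43]}
        current = {"rtt_ms_pacific": 610, "link_util_pct": 44}
        for name, (val, mu, dev) in flag_anomalies(baseline, current).items():
            print(f"{name}: {val} vs baseline mean {mu:.1f} "
                  f"({dev:.1f} sigma off)")

The real reports were, of course, database queries and graphs rather
than a script, but the idea is the same: measure when things work, then
compare.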
So that's what we did.  I'm not sure I'd characterize all that kind of
activity as either Configuration or Monitoring.  I've always thought it
was just Network Management.
There's a lot of History of the Internet protocols, equipment, software,
etc., but I haven't seen much of a historical account of how the various
pieces of the Internet have been operated and managed, and how the tools
and techniques have evolved over time.
If anybody's up for it, it would be interesting to see how other
people did such "Network Management" activities with their own ad hoc
tools as the Internet evolved.
It would also be fascinating to see how useful today's expensive
Network Management Systems would be in the scenarios above - i.e., how
effective today's tools would be if used by network operators to deal
with those example network management problems, along the lines of
RFC 1109's observations about how to evaluate Network Management
technology.
BTW, everything I wrote above occurred in 1990-1991.
/Jack