
Service provider story about tracking down TCP RSTs

On 9/1/18, William Herrin <bill at herrin.us> wrote:
> On Sat, Sep 1, 2018 at 6:11 PM, Lee <ler762 at gmail.com> wrote:
>> On 9/1/18, William Herrin <bill at herrin.us> wrote:
>>> On Sat, Sep 1, 2018 at 4:00 PM, William Herrin <bill at herrin.us> wrote:
>>>> Better yet, do the job right and build an anycast TCP stack as
>>>> described here: https://bill.herrin.us/network/anycasttcp.html
>> An explosion in state management would be the least of my worries :)
>> I got as far as your Third hook: and thought of this
>>   https://www.jwz.org/doc/worse-is-better.html
> Hi Lee,
> On a brief tangent: Geographic routing would drastically simplify the
> Internet core, reducing both cost and complexity. You'd need to carry
> only nearby specific routes and a few broad aggregates for
> destinations far away. It will never be implemented, never, because no
> cross-ocean carriers are willing to have their bandwidth stolen when
> the algorithm decides it likes their path better than a paid one. Even
> though the algorithm gets the packets where they're going, and does so
> simply, it does so in a way that's too often incorrect.
> Then again, I don't really understand the MIT/New Jersey argument in
> Richard's worse-is-better story.

The "New Jersey" description is more of a caricature than a valid description:
  "I have intentionally caricatured the worse-is-better philosophy to
   convince you that it is obviously a bad philosophy and that the
   New Jersey approach is a bad approach."

I mentally did a 's/New Jersey/Microsoft/' and it made a lot more sense.

> The MIT guy says that a routine
> should handle a common non-fatal exception. The Jersey guy says that
> it's ok for the routine to return a try-again error and expect the
> caller to handle it. Since it's trivial to build another layer that
> calls the routine in a loop until it returns success or a fatal error,
> it's more a philosophical argument than a practical one. As long as a
> correct result is consistently achieved in both cases, what's the
> difference?

That it's not always a trivial matter to build another layer.
That your retry layer needs at least a counter or timeout value so it
doesn't retry forever & those values need to be user-configurable, so
the retry layer isn't quite as trivial as it appears at first blush.
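For illustration, here is a minimal Python sketch of the "another layer" under discussion. It is not code from either poster. `TryAgain`, `RetryExhausted`, `call_with_retry` and its `max_attempts`, `timeout_s` and `backoff_s` parameters are hypothetical names. The sketch shows that even the simple loop needs a retryable-vs-fatal distinction plus both an attempt limit and a time limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TryAgain(Exception):
    """Hypothetical non-fatal "try again" error raised by the wrapped routine."""


class RetryExhausted(Exception):
    """Raised when the attempt limit or the time limit runs out."""


def call_with_retry(routine: Callable[[], T], max_attempts: int = 5,
                    timeout_s: float = 10.0, backoff_s: float = 0.1) -> T:
    """Call routine() until it succeeds, fails fatally, or the budget runs out.

    Only TryAgain is retried; any other exception is treated as fatal and
    propagates to the caller unchanged. max_attempts and timeout_s are the
    caller-configurable limits that keep the loop from retrying forever.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = time.monotonic() + timeout_s
    delay = backoff_s
    for attempt in range(1, max_attempts + 1):
        try:
            return routine()
        except TryAgain:
            remaining = deadline - time.monotonic()
            if attempt == max_attempts or remaining <= 0:
                break  # budget exhausted: stop retrying
            time.sleep(min(delay, remaining))  # never sleep past the deadline
            delay *= 2  # exponential backoff between attempts
    raise RetryExhausted(f"gave up after {attempt} attempt(s)")


if __name__ == "__main__":
    calls = 0

    def flaky() -> str:
        # Hypothetical routine: fails twice with a retryable error, then succeeds.
        global calls
        calls += 1
        if calls < 3:
            raise TryAgain()
        return "ok"

    print(call_with_retry(flaky, backoff_s=0.01))  # prints "ok" on the third call
```

The exponential backoff and the rule that every exception other than `TryAgain` counts as fatal are choices made for this sketch. A real caller would also have to decide whether the operation is safe to repeat at all.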

> Richard characterized the Jersey argument as, "It is slightly better
> to be simple than correct." I just don't see that in the Jersey
> argument. Every component must be correct. The system of components as
> a whole must be complete. It's slightly better for a component to be
> simple than complete. That's the argument I read and it makes sense to
> me.

Yes, I did a lot of interpreting also.  Then I hit on
s/New Jersey/Microsoft/ and it made a lot more sense to me.

> Honestly, the idea that software is good enough even with known corner
> cases that do something incorrect... I don't know how that survives in
> a world where security-conscious programming is not optional.

Agreed.  I substituted "soft-fail or fail-closed: user has to retry"
for doing something incorrect.

>> I had it much easier with anycast in an enterprise setting.  With
>> anycast servers in data centers A & B, just make sure no site has an
>> equal cost path to A and B.  Any link/ router/ whatever failure & the
>> user can just re-try.
> You've delicately balanced your network to achieve the principle that
> even when routing around failures the anycast sites are not
> equidistant from any other site. That isn't simplicity. It's
> complexity hidden in the expert selection of magic numbers.

^shrug^ it seemed simple to me.  And it was real easy to explain,
which is why I thought of that "worse is better" paper.  I took the
New Jersey approach & did what was basically a hack. You took the MIT
approach and created a general solution .. which is not so easy to
explain :)
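The enterprise rule above ("no site has an equal cost path to A and B") and the objection that it must still hold while routing around failures can both be checked mechanically. This is a minimal Python sketch under stated assumptions: link costs are symmetric and additive like a link-state IGP metric, both data centers advertise the anycast prefix at the same metric (so a site's cost to the anycast address is just its distance to A or B), and only single-link failures are tested. The topology and all function names are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# node -> {neighbor: link cost}; costs assumed symmetric and additive (IGP-style).
Graph = Dict[str, Dict[str, float]]
INF = float("inf")


def build_graph(links: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an undirected graph from (u, v, cost) link tuples."""
    graph: Graph = {}
    for u, v, cost in links:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def shortest_costs(graph: Graph, src: str) -> Dict[str, float]:
    """Dijkstra: lowest total path cost from src to every reachable node."""
    dist: Dict[str, float] = {src: 0}
    heap: List[Tuple[float, str]] = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, INF):
            continue  # stale entry; a shorter path was already found
        for nbr, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, INF):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def equidistant_sites(graph: Graph, a: str, b: str, sites: List[str]) -> List[str]:
    """Sites that reach both a and b at exactly the same finite cost (a tie)."""
    da, db = shortest_costs(graph, a), shortest_costs(graph, b)
    return [s for s in sites if da.get(s, INF) == db.get(s, INF) < INF]


def ties_under_single_link_failure(graph: Graph, a: str, b: str,
                                   sites: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Remove each link in turn and report any failure that creates a tie."""
    problems: Dict[Tuple[str, str], List[str]] = {}
    links = sorted({tuple(sorted((u, v))) for u in graph for v in graph[u]})
    for u, v in links:
        pruned = {n: {m: c for m, c in nbrs.items() if {n, m} != {u, v}}
                  for n, nbrs in graph.items()}
        tied = equidistant_sites(pruned, a, b, sites)
        if tied:
            problems[(u, v)] = tied
    return problems


if __name__ == "__main__":
    # Hypothetical topology: data centers A and B, user sites S1 and S2.
    g = build_graph([("A", "S1", 10), ("S1", "B", 20), ("A", "S2", 5), ("S2", "S1", 15)])
    print(equidistant_sites(g, "A", "B", ["S1", "S2"]))  # [] : no ties normally
    print(ties_under_single_link_failure(g, "A", "B", ["S1", "S2"]))
    # {('A', 'S1'): ['S1']} : losing A-S1 leaves S1 at cost 20 to both A and B
```

Checking double failures the same way would mean looping over every pair of links, so the work grows quadratically with the link count.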

> Even were that achievable in a network as chaotic as the Internet, is it simpler
> than four trivial tweaks to the TCP stack plus a modestly complex but
> fully automatic user-space program that correctly reroutes the small
> percentage of packets that went astray?

Your four trivial tweaks to the TCP stack are kernel patches - right?
Which seems not at all trivial to me, but if you've got a group of
people that can support & maintain that - good for you!