
Europe-to-US congestion and packet loss on the he.net network, and their NOC@ won't even respond



Constantine,

Please email me off-list if Init7 can be of any help in resolving the case.

--
Fredy Kuenzler
Init7 (Switzerland) Ltd.
St.-Georgen-Strasse 70
CH-8400 Winterthur
Switzerland

http://www.init7.net/


> On 30.11.2013 at 08:30, "Constantine A. Murenin" <mureninc at gmail.com> wrote:
> 
> Dear NANOG@,
> 
> I'm not exactly sure how else to get he.net's attention. For a couple of months now, I've been experiencing congestion between my dedicated server and Indiana, and, as it turns out, it is all due to he.net's poor transit. The issue was complicated by the fact that the routes are asymmetric, which makes it look as if the packet loss is occurring at hops where there is none at all.
> 
> I will just provide the data, and people can draw their own conclusions; any insights are welcome.
> 
> Since late September 2013, all four networks involved have been contacted (hetzner, init7, he.net, and indiana); all except he.net have responded and done troubleshooting.
> 
> When I pressed he.net about the lack of any kind of response, all they did was ask for a customer number, and that was back in September. I have not heard from their NOC@ since; my requests remain unanswered, apart from the "we have received your request" autoreply.
> 
> Interestingly enough, only some of their Europe-to-US routes are blatantly congested and show very obvious packet loss (often making ssh unusable), whereas others appear to be doing just fine (at least, they are not losing packets and not experiencing jitter or increased latency). For example, IPv6 routes don't appear to be affected. IPv4 addresses in North America that are announced directly from AS6939 (e.g. Linode in Fremont) don't appear to be affected, either. But the multi-homed indiana.edu and wiscnet.net are affected, and so is the single-homed ntp1.yycix.ca. Other customers are probably affected as well.
> 
> When will this end?
> 
> Or is ongoing 0.5%+ packet loss, with a 140+ ms average latency on a 114 ms route, plus random spikes and jitter at certain hours of the day (generally around midnight ET), every day for several weeks or even months, an acceptable practice?
> 
> 
> 
> From hetzner.de to indiana.edu, through he.net:
> 
> 
> Cns# date ; mtr --report{,-wide,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ????c????????.indiana.edu ; date
> Fri Nov 29 21:06:17 PST 2013
> HOST: Cns???????                                   Snt   Rcv Loss% Best Gmean   Avg  Wrst StDev
>  1.|-- static.??.???.4.46.clients.your-server.de    600   600  0.0%  0.5   1.0   1.3   4.9   1.1
>  2.|-- hos-tr1.juniper1.rz13.hetzner.de             600   600  0.0%  0.1   0.2   1.9  66.0   7.6
>  3.|-- core21.hetzner.de                            600   600  0.0%  0.2   0.2   0.2   5.8   0.4
>  4.|-- core22.hetzner.de                            600   600  0.0%  0.2   0.2   0.2  19.4   1.2
>  5.|-- core1.hetzner.de                             600   600  0.0%  4.8   4.8   4.8  13.2   0.7
>  6.|-- juniper1.ffm.hetzner.de                      600   600  0.0%  4.8   4.8   4.8  27.4   1.4
>  7.|-- 30gigabitethernet1-3.core1.ams1.he.net       600   600  0.0% 11.2  14.0  14.6  48.7   4.5
>  8.|-- 10gigabitethernet1-4.core1.lon1.he.net       600   600  0.0% 18.2  19.6  19.9  53.9   4.1
>  9.|-- 10gigabitethernet10-4.core1.nyc4.he.net      600   599  0.2% 87.0 116.1 116.7 145.7  12.4
> 10.|-- 100gigabitethernet7-2.core1.chi1.he.net      600   597  0.5% 106.6 135.4 136.1 192.0  13.3
> 11.|-- ???                                          600     0 100.0  0.0   0.0   0.0   0.0   0.0
> 12.|-- et-11-0-0.945.rtr.ictc.indiana.gigapop.net   600   594  1.0% 113.3 139.3 139.7 166.1  11.4
> 13.|-- xe-0-3-0.11.br2.ictc.net.uits.iu.edu         600   596  0.7% 113.2 139.8 140.3 177.3  12.0
> 14.|-- ae-0.0.br2.bldc.net.uits.iu.edu              600   595  0.8% 114.2 140.1 140.6 183.2  11.8
> 15.|-- ae-10.0.cr3.bldc.net.uits.iu.edu             600   597  0.5% 114.3 140.3 140.8 165.0  11.5
> 16.|-- ????c????????.indiana.edu                    600   597  0.5% 114.7 140.7 141.1 161.6  11.4
> Fri Nov 29 21:08:52 PST 2013
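The loss and latency figures quoted earlier come from the final-hop row of a report like this one. A minimal sketch of pulling them out of a row, assuming the column order produced by `--order "SRL BGAWV"` above; `MtrHop` and `parse_mtr_row` are hypothetical names, not part of mtr, and "excess" (average minus best RTT) is used here only as a rough proxy for queueing delay:

```python
from dataclasses import dataclass


@dataclass
class MtrHop:
    """One row of an `mtr --report` table printed with --order "SRL BGAWV"
    (columns: Snt Rcv Loss% Best Gmean Avg Wrst StDev)."""

    index: int
    host: str
    sent: int
    received: int
    best_ms: float
    avg_ms: float

    @property
    def loss_pct(self) -> float:
        # Recompute loss from the raw counters rather than the rounded Loss% column.
        return 100.0 * (self.sent - self.received) / self.sent if self.sent else 0.0

    @property
    def excess_ms(self) -> float:
        # Average RTT above the best (minimum) RTT: a rough proxy for queueing delay.
        return self.avg_ms - self.best_ms


def parse_mtr_row(line: str) -> MtrHop:
    """Parse e.g. ' 16.|-- host   600   597  0.5% 114.7 140.7 141.1 161.6  11.4'."""
    line = line.lstrip("> ").strip()  # tolerate email quoting and leading spaces
    hop_part, rest = line.split("|--", 1)
    fields = rest.split()
    # The last eight fields are the numeric columns; anything before them is the host.
    snt, rcv, _loss, best, _gmean, avg, _wrst, _stdev = fields[-8:]
    return MtrHop(
        index=int(hop_part.rstrip(".")),
        host=" ".join(fields[:-8]),
        sent=int(snt),
        received=int(rcv),
        best_ms=float(best),
        avg_ms=float(avg),
    )
```

For the destination row above (600 sent, 597 received, Best 114.7, Avg 141.1), this gives 0.5% loss and about 26.4 ms of average delay above the best-case RTT.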
> 
> 
> Cns# unbuffer hping --icmp-ts ????c????????.indiana.edu | \
> perl -ne 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
> if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
> if (/tsrtt=(\d+)/) { \
> print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
> 0       143.5   144 = 87 + 57
> 1       125.5   126 = 69 + 57
> 2       143.6   144 = 87 + 57
> 3       157.9   158 = 102 + 56
> 4       122.0   122 = 66 + 56
> 5       141.6   142 = 85 + 57
> 6       132.2   133 = 76 + 57
> 7       146.2   146 = 89 + 57
> 8       145.1   145 = 88 + 57
> 9       119.9   119 = 63 + 56
> 10      132.7   132 = 75 + 57
> 11      140.1   140 = 83 + 57
> 12      151.0   151 = 94 + 57
> 13      152.6   152 = 96 + 56
> 14      129.1   129 = 72 + 57
> 15      128.5   128 = 71 + 57
> ^C
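The perl one-liner above splits hping's `tsrtt` into two one-way legs using the three ICMP timestamps (Originate, Receive, Transmit). A Python rendering of the same arithmetic, as a sketch: the regexes mirror the perl ones, `split_rtt` is a hypothetical name, and it assumes, as the perl does, that `Originate + tsrtt` is the local arrival time on the same milliseconds-since-midnight clock.

```python
import re

# Same three patterns as the perl one-liner; "ate=" matches hping's "Originate=".
SEQ_RE = re.compile(r"icmp_seq=(\d+) rtt=(\d+\.\d)")
TS_RE = re.compile(r"ate=(\d+) Receive=(\d+) Transmit=(\d+)")
TSRTT_RE = re.compile(r"tsrtt=(\d+)")


def split_rtt(lines):
    """Yield (seq, rtt_ms, tsrtt_ms, forward_ms, return_ms) for each reply.

    forward = Receive - Originate           (request leg: our clock -> theirs)
    return  = Originate + tsrtt - Transmit  (reply leg: their clock -> ours)
    The two legs add up to tsrtt whenever Receive == Transmit. Each leg also
    absorbs the clock offset between the hosts, with opposite signs.
    """
    seq = rtt = orig = recv = xmit = None
    for line in lines:
        if m := SEQ_RE.search(line):
            seq, rtt = int(m.group(1)), float(m.group(2))
        if m := TS_RE.search(line):
            orig, recv, xmit = (int(g) for g in m.groups())
        # Only emit once a timestamp line has been seen for this reply.
        if (m := TSRTT_RE.search(line)) and orig is not None:
            tsrtt = int(m.group(1))
            yield seq, rtt, tsrtt, recv - orig, orig + tsrtt - xmit
```

Feeding it `sys.stdin` from the same `hping --icmp-ts` pipeline and printing each tuple as `seq, rtt, tsrtt = forward + return` should reproduce the columns shown above.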
> 
> 
> 
> To ntp1.yycix.ca, which is single-homed at he.net:
> 
> 
> Cns# date ; mtr --report{,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ntp1.yycix.ca ; date
> Fri Nov 29 21:16:14 PST 2013
> HOST: Cns???????                    Snt   Rcv Loss%   Best Gmean   Avg Wrst StDev
>  1.|-- static.??.???.4.46.client   600   600  0.0%    0.5   1.0   1.3  10.2   1.2
>  2.|-- hos-tr4.juniper2.rz13.het   600   600  0.0%    0.1   0.2   2.0 153.9   9.8
>  3.|-- core22.hetzner.de           600   600  0.0%    0.2   0.2   0.2  10.6   0.6
>  4.|-- core1.hetzner.de            600   600  0.0%    4.8   4.8   4.8  16.4   0.9
>  5.|-- juniper1.ffm.hetzner.de     600   600  0.0%    4.8   4.8   4.8  36.4   1.5
>  6.|-- 30gigabitethernet1-3.core   600   600  0.0%   11.2  13.5  14.0  36.6   4.3
>  7.|-- 10gigabitethernet1-4.core   600   600  0.0%   18.0  21.5  21.8  43.1   4.0
>  8.|-- 10gigabitethernet10-4.cor   600   597  0.5%   93.2 128.0 128.3 157.5   8.9
>  9.|-- 10gigabitethernet1-2.core   600   596  0.7%  103.1 139.4 139.6 157.5   8.2
> 10.|-- 10gigabitethernet3-1.core   600   597  0.5%  128.2 164.9 165.1 181.9   8.2
> 11.|-- 10gigabitethernet1-1.core   600   593  1.2%  138.7 175.9 176.1 192.6   7.8
> 12.|-- sebo-systems-inc.gigabite   600   597  0.5%  139.0 176.4 176.5 187.5   6.9
> 13.|-- ???                         600     0 100.0    0.0   0.0   0.0   0.0   0.0
> 14.|-- ntp1.yycix.ca               600   597  0.5%  141.0 176.9 177.0 186.9   6.9
> Fri Nov 29 21:18:32 PST 2013
> Cns# traceroute -A ntp1.yycix.ca
> traceroute to ntp1.yycix.ca (192.75.191.6), 64 hops max, 40 byte packets
> 1  static.??.???.4.46.clients.your-server.de (46.4.???.??) [AS24940] 0.664 ms  0.648 ms  0.453 ms
> 2  hos-tr1.juniper1.rz13.hetzner.de (213.239.224.1) [AS24940]  23.985 ms hos-tr2.juniper1.rz13.hetzner.de (213.239.224.33) [AS24940]  0.234 ms hos-tr3.juniper2.rz13.hetzner.de (213.239.224.65) [AS24940]  0.238 ms
> 3  core22.hetzner.de (213.239.245.121) [AS24940]  0.238 ms core21.hetzner.de (213.239.245.81) [AS24940]  0.234 ms  0.236 ms
> 4  core1.hetzner.de (213.239.245.177) [AS24940]  4.811 ms  4.809 ms core22.hetzner.de (213.239.245.162) [AS24940]  0.248 ms
> 5  core1.hetzner.de (213.239.245.177) [AS24940]  4.831 ms juniper1.ffm.hetzner.de (213.239.245.5) [AS24940]  4.842 ms  4.826 ms
> 6  juniper1.ffm.hetzner.de (213.239.245.5) [AS24940]  4.857 ms  4.864 ms 30gigabitethernet1-3.core1.ams1.he.net (195.69.145.150) [AS1200] 11.233 ms
> 7  10gigabitethernet1-4.core1.lon1.he.net (72.52.92.81) [AS6939, AS6939]  19.869 ms 30gigabitethernet1-3.core1.ams1.he.net (195.69.145.150) [AS1200]  18.420 ms  11.255 ms
> 8  10gigabitethernet10-4.core1.nyc4.he.net (72.52.92.241) [AS6939, AS6939]  115.845 ms  101.875 ms 10gigabitethernet1-4.core1.lon1.he.net (72.52.92.81) [AS6939, AS6939]  17.249 ms
> 9  10gigabitethernet10-4.core1.nyc4.he.net (72.52.92.241) [AS6939, AS6939]  138.302 ms 10gigabitethernet1-2.core1.tor1.he.net (184.105.222.18) [AS6939]  120.449 ms  139.730 ms
> 10  10gigabitethernet1-2.core1.tor1.he.net (184.105.222.18) [AS6939] 134.755 ms  104.661 ms 10gigabitethernet3-1.core1.ywg1.he.net (184.105.223.221) [AS6939]  167.282 ms
> 11  10gigabitethernet1-1.core1.yyc1.he.net (184.105.223.214) [AS6939] 139.310 ms 10gigabitethernet3-1.core1.ywg1.he.net (184.105.223.221) [AS6939]  155.983 ms  155.910 ms
> 12  sebo-systems-inc.gigabitethernet2-23.core1.yyc1.he.net (216.218.214.250) [AS6939]  138.703 ms  178.530 ms 10gigabitethernet1-1.core1.yyc1.he.net (184.105.223.214) [AS6939] 172.423 ms
> 13  sebo-systems-inc.gigabitethernet2-23.core1.yyc1.he.net (216.218.214.250) [AS6939]  158 ms * *
> 14  * * ntp1.yycix.ca (192.75.191.6) [AS53339]  181.433 ms
> Cns#
> Cns# unbuffer hping --icmp-ts ntp1.yycix.ca | perl -ne \
> 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
> if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
> if (/tsrtt=(\d+)/) { \
> print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\n"; }'
> 0       165.0   165 = 95 + 70
> 1       156.2   156 = 86 + 70
> 2       178.9   179 = 109 + 70
> 3       181.0   181 = 111 + 70
> 4       178.3   179 = 108 + 71
> 5       163.8   164 = 94 + 70
> 6       175.7   176 = 106 + 70
> 7       173.9   174 = 104 + 70
> 8       172.6   173 = 103 + 70
> 9       163.5   164 = 94 + 70
> 10      181.8   182 = 112 + 70
> 11      161.9   162 = 92 + 70
> 12      183.1   184 = 113 + 71
> 13      174.5   174 = 104 + 70
> 14      181.8   181 = 111 + 70
> 15      181.7   181 = 111 + 70
> ^C
> Cns#
> 
> 
> 
> 
> From indiana.edu to hetzner.de. Notice that mtr by itself gives the false impression of packet loss at init7, whereas in reality it's the reverse path through he.net that's causing the loss, as hping confirms:
> 
> 
> m: {5134} date ; sudo mtr --report{,-cycles=600} --interval 0.1 --order "SRL BGAWV" -4 ?????? ; date
> Sat Nov 30 00:36:27 EST 2013
> HOST: ????c????????.indiana.edu     Snt   Rcv Loss%   Best Gmean   Avg Wrst StDev
>  1.|-- 129.79.???.?                600   600  0.0%    0.4   0.7   0.9  24.7   1.5
>  2.|-- ae-13.0.br2.bldc.net.uits   600   600  0.0%    0.5   0.7   0.9  22.6   1.8
>  3.|-- ae-0.0.br2.ictc.net.uits.   600   600  0.0%    1.4   1.7   1.8  20.2   1.6
>  4.|-- xe-0-1-0.11.rtr.ictc.indi   600   600  0.0%    1.4   2.1   3.8  66.5   8.1
>  5.|-- 64.57.21.13                 600   600  0.0%    6.0   7.2   8.4  72.9   8.0
>  6.|-- xe-2-2-0.0.ny0.tr-cps.int   600   600  0.0%   32.3  33.9  34.4  81.0   6.9
>  7.|-- paix-nyc.init7.net          600   600  0.0%   32.5  35.3  35.5  44.7   3.8
>  8.|-- r1lon1.core.init7.net       600   599  0.2%  100.1 104.7 104.9 146.5   7.5
>  9.|-- r1nue1.core.init7.net       600   599  0.2%  114.6 115.7 115.7 125.4   2.2
> 10.|-- gw-hetzner.init7.net        600   594  1.0%  112.4 141.3 142.4 241.9  18.2
> 11.|-- core12.hetzner.de           600   468 22.0%  112.2 142.7 144.0 203.4  20.3
> 12.|-- core21.hetzner.de           600   202 66.3%  114.4 143.7 145.0 204.1  20.1
> 13.|-- juniper1.rz13.hetzner.de    600   594  1.0%  114.7 141.4 142.1 212.2  14.3
> 14.|-- hos-tr2.ex3k11.rz13.hetzn   600   599  0.2%  113.8 123.9 125.5 218.2  21.8
> 15.|-- static.88-198-??-??.clien   599   592  1.2%  114.6 137.2 137.9 167.6  13.2
> 0.244u 1.766s 1:05.52 3.0%      0+0k 0+1io 0pf+0w
> Sat Nov 30 00:37:32 EST 2013
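The report above shows why per-hop loss needs care: core12 and core21 show 22% and 66.3% loss while the hops after them show about 1%, so those routers are most likely rate-limiting or deprioritizing the ICMP replies they generate themselves rather than dropping forwarded traffic. A minimal sketch of that common heuristic; `classify_hops` and the 0.5-point tolerance are illustrative assumptions, and mtr does not compute this itself:

```python
def classify_hops(hops, tolerance=0.5):
    """Label each hop of an mtr run.

    hops: list of (host, loss_pct) tuples in hop order; the last entry is the
    destination. Loss at a hop in excess of the destination's loss cannot be
    on the forwarding path, since packets sent past that hop evidently got
    through, so the hop's own ICMP replies are most likely being rate-limited.
    """
    dest_loss = hops[-1][1]
    labelled = []
    for host, loss in hops:
        if loss > dest_loss + tolerance:
            verdict = "likely ICMP rate-limiting"
        elif loss > 0:
            verdict = "round-trip loss, direction unknown"
        else:
            verdict = "clean"
        labelled.append((host, loss, verdict))
    return labelled
```

With the loss column above (e.g. r1lon1 0.2%, gw-hetzner 1.0%, core12 22.0%, core21 66.3%, destination 1.2%), only core12 and core21 are flagged as rate-limiting. The hops it keeps, including the init7 ones, only prove loss somewhere on the round trip; since mtr measures round trips, it cannot tell whether the forward or the reverse path dropped the packets.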
> 
> m: {5137} sudo script -q /dev/null hping3 --icmp-ts 88.198.??.?? | perl -ne 'if (/icmp_seq=(\d+) rtt=(\d+\.\d)/) {($s, $p) = ($1, $2);} \
> if (/ate=(\d+) Receive=(\d+) Transmit=(\d+)/) {($o, $r, $t) = ($1, $2, $3);} \
> if (/tsrtt=(\d+)/) { \
> print $s, "\t", $p, "\t", $1, " = ", $r - $o, " + ", $o + $1 - $t, "\r\n"; }'
> 0       131.3   131 = 57 + 74
> 1       122.4   122 = 56 + 66
> 2       122.6   123 = 56 + 67
> 3       127.6   128 = 57 + 71
> 4       146.5   147 = 57 + 90
> 5       139.8   140 = 56 + 84
> 6       131.0   131 = 57 + 74
> 7       134.6   135 = 57 + 78
> 8       137.7   138 = 57 + 81
> 9       148.1   148 = 57 + 91
> 10      141.2   142 = 57 + 85
> 11      146.4   146 = 56 + 90
> 12      153.6   154 = 57 + 97
> 13      149.4   150 = 57 + 93
> 14      120.2   121 = 57 + 64
> 15      120.6   120 = 56 + 64
> 16      130.7   131 = 57 + 74
> 17      126.4   126 = 56 + 70
> 18      117.9   118 = 57 + 61
> 19      116.9   117 = 57 + 60
> 20      119.8   119 = 56 + 63
> 21      132.0   132 = 56 + 76
> 22      134.2   134 = 56 + 78
> 23      138.8   139 = 57 + 82
> 
> 
> 
> Note the ICMP timestamp data from hping above. It makes it obvious that the congestion occurs in only one direction, the one over he.net, and that init7 is in the clear: in both runs, the leg through init7 stays at 56-57 ms, while the leg through he.net swings between roughly 60 and 100 ms.
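The comparison behind that conclusion can be sketched in a few lines. A constant clock offset between the two hosts shifts one leg up and the other down by the same amount, but it does not change how much each leg varies, so the spread of each leg is what points at the congested direction. `congested_direction`, the 3x ratio, and the 0.5 ms floor are assumptions for illustration:

```python
from statistics import pstdev


def congested_direction(samples, ratio=3.0, floor_ms=0.5):
    """Guess which leg of a path is congested from ICMP timestamp splits.

    samples: (forward_ms, return_ms) pairs, e.g. the two terms printed by the
    perl one-liner. A constant clock offset shifts the legs but not their
    spread, so only the population standard deviation of each leg is used.
    floor_ms keeps a near-constant leg (1 ms timestamp resolution) from
    making the ratio test trivially true.
    """
    fwd_sd = pstdev([f for f, _ in samples])
    ret_sd = pstdev([r for _, r in samples])
    if fwd_sd >= ratio * max(ret_sd, floor_ms):
        return "forward"
    if ret_sd >= ratio * max(fwd_sd, floor_ms):
        return "return"
    return "inconclusive"
```

Given the first ten samples of each hping run above, it returns "forward" for the hetzner.de-to-indiana.edu run and "return" for the indiana.edu-to-hetzner.de run; both answers point at the same Hetzner-to-Indiana leg, the one through he.net.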
> 
> Any further insights are welcome. Discovering ICMP timestamps has so far been the most useful step in troubleshooting this issue; I'm surprised that this method of getting to the bottom of such problems is so little known.
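For reference, the ICMP Timestamp message (RFC 792: type 13 request, type 14 reply) is a 20-byte ICMP message carrying an identifier, a sequence number, and three 32-bit timestamps in milliseconds since midnight UT. The sketch below only builds a request, with the Receive and Transmit fields left at zero for the replying host to fill in; the function names are hypothetical, and actually sending the packet needs a raw socket and root privileges, which hping handles in the runs above.

```python
import struct
from datetime import datetime, timezone


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ms_since_midnight_ut(now=None) -> int:
    """RFC 792 timestamp value: milliseconds since midnight UT."""
    if now is None:
        now = datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return int((now - midnight).total_seconds() * 1000)


def icmp_timestamp_request(ident: int, seq: int, originate_ms: int) -> bytes:
    """Build a 20-byte ICMP Timestamp request (type 13, code 0)."""
    fmt = "!BBHHHIII"  # type, code, checksum, id, seq, originate, receive, transmit
    unsummed = struct.pack(fmt, 13, 0, 0, ident, seq, originate_ms, 0, 0)
    csum = internet_checksum(unsummed)
    return struct.pack(fmt, 13, 0, csum, ident, seq, originate_ms, 0, 0)
```

A correctly checksummed packet sums to zero when the checksum is recomputed over it, which is a quick way to sanity-check the builder.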
> 
> However, even though the cause and the responsible party have been identified, the problem has yet to be resolved. Any help is appreciated.
> 
> Best regards,
> Constantine.
>