
looking for hostname router identifier validation

I would caution against putting much faith in geolocation or site IDs
derived from reverse DNS PTR records. There are a vast number of
unmaintained, stale, or simply wrong PTR records out there. I can name at
least half a dozen ISPs that have absorbed other ASes, some of which had
themselves acquired other ASes earlier in their history, forming a
turducken of obsolete PTR records that includes hostnames under ISP domain
names last in use in 2002.

On Mon, Apr 29, 2019 at 6:15 AM Matthew Luckie <mjl at luckie.org.nz> wrote:

> To support Internet topology analysis efforts, I have been working on
> an algorithm to automatically detect router names inside hostnames
> (PTR records) for router interfaces, and build regular expressions
> (regexes) to extract them.  By "router name" inside the hostname, I
> mean a substring, or set of non-contiguous substrings, that is common
> among interfaces on a router.  For example, suppose we had the
> following three routers in the savvis.net domain suffix, each with two
> interfaces:
>
>   das1-v3005.nj2.savvis.net
>   das1-v3006.nj2.savvis.net
>   das1-v3005.oc2.savvis.net
>   das1-v3007.oc2.savvis.net
>   das2-v3009.nj2.savvis.net
>   das2-v3012.nj2.savvis.net
>
> We might infer the router names are das1|nj2, das1|oc2, and das2|nj2,
> respectively, and captured by the regex:
>
>   ^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$
>
> After much refinement based on smaller sets of ground truth, I'm
> asking for broader feedback from operators.  I've placed a webpage at
> https://www.caida.org/~mjl/rnc/ that shows the inferences my algorithm
> made for 2523 domains.  If you operate one of the domains in that
> list, I would appreciate it if you could comment (private is probably
> better but public is fine with me) on whether the regex my algorithm
> inferred represents your naming intent.  In the first instance, I am
> most interested in feedback for the suffix / date combinations for
> suffixes that are colored green, i.e. appear to be reasonable.
>
> Each suffix / date combination links to a page that contains the
> naming convention and corresponding inferences.  The colored part of
> each hostname is the inferred router name.  The green hostnames appear
> to be correct, at least as far as the algorithm determined.  Some
> suffixes have errors due to either stale hostnames or incorrect
> training data, and those hostnames are colored red or orange.
>
> If anyone is interested in sets of hostnames the algorithm may have
> inferred as 'stale' for their network, because for some operators it
> was an oversight and they were grateful to learn about it, I can
> provide that information.
>
> Thanks,
> Matthew
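
For anyone who wants to sanity-check the quoted example, here is a minimal
sketch in Python that applies the regex from the message to the six
savvis.net hostnames and groups the interfaces by inferred router name. The
regex and hostnames are copied from the quoted text. The function name
group_by_router_name and the "|" join are my own illustrative choices, not
part of Matthew's algorithm, which infers the regex itself rather than just
applying it.

```python
import re
from collections import defaultdict

# Regex copied verbatim from the quoted message (savvis.net example).
ROUTER_NAME_RE = re.compile(r"^([a-z]+\d+)-[^\.]+\.([a-z]+\d+)\.savvis\.net$")

# The six example interface hostnames from the quoted message.
HOSTNAMES = [
    "das1-v3005.nj2.savvis.net",
    "das1-v3006.nj2.savvis.net",
    "das1-v3005.oc2.savvis.net",
    "das1-v3007.oc2.savvis.net",
    "das2-v3009.nj2.savvis.net",
    "das2-v3012.nj2.savvis.net",
]


def group_by_router_name(hostnames):
    """Group hostnames by the captured substrings (hypothetical helper).

    Returns (routers, unmatched): routers maps "name|site" to the list of
    interface hostnames sharing it; unmatched lists hostnames the regex
    did not match at all.
    """
    routers = defaultdict(list)
    unmatched = []
    for hostname in hostnames:
        match = ROUTER_NAME_RE.match(hostname)
        if match is None:
            unmatched.append(hostname)
            continue
        # Join the non-contiguous captures, e.g. ("das1", "nj2") -> "das1|nj2".
        routers["|".join(match.groups())].append(hostname)
    return dict(routers), unmatched


routers, unmatched = group_by_router_name(HOSTNAMES)
for name, interfaces in routers.items():
    print(name, interfaces)
```

Note that a stale PTR record that still follows the naming convention will
match the regex and be grouped as if it were current, so a clean match says
nothing about whether the record is still accurate.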