
@lost-theory
Last active September 10, 2018 11:25
netstat on all machines -> python -> graphviz -> png
$ knife ssh -m "...every host in the network..." "sudo netstat -nutap" -a hostname > meganetstat.txt
$ python
>>> from collections import Counter as C
>>> HS = "...every host in the network...".split()
>>> ip = lambda s: s.split(":")[0]
>>> xs = [[ip(x[0]), ip(x[4]), ip(x[5])] for x in [x.strip().split() for x in open("meganetstat.txt").readlines() if "tcp" in x] if len(x)>=6]
>>> ipmap = [(h, C([x[1] for x in xs if x[0] == h])) for h in HS]
>>> ipmapx = dict([(sorted([(x,y) for (x,y) in ip[1].items() if x.startswith("10.")], key=lambda t: -t[1])[0][0], ip[0]) for ip in ipmap])
>>> sorted(C(map(ipmapx.get, [x[2] for x in xs if x[2].startswith("10.")])).items(), key=lambda t: t[1])
[...a list of hosts ordered by # of incoming edges, load balancers had the most, etc...]
>>> edges = [(x[0], ipmapx.get(x[2])) for x in xs]
>>> open("out.gv", "w").write(("digraph world {\n" + ("\n".join('\t"%s" -> "%s";' % x for x in set(edges) if None not in x and x[0] != x[1])) + "\n}\n"))
out.gv looks like this:
digraph world {
"host3" -> "host4";
"host18" -> "host1";
"host3" -> "host10";
"host5" -> "host7";
...hundreds more edges...
}
Then you use the "dot" command to render the graph to an image:
$ dot -Tpng out.gv > out
$ dot -Tpng -Ktwopi out.gv > out3.png
$ dot -Tpng -Kcirco out.gv > out4.png
$ dot -Tpng -Ksfdp out.gv > out5.png
$ dot -Ksfdp -Gsize=100! -Goverlap=prism -Tpng out.gv > out6.png
I believe that last one gave the best output.
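If you want to try several layouts in one go, the renders above can be scripted. This is a minimal sketch that assumes Graphviz's "dot" binary is on your PATH; the output file names and the helper names (dot_command, render_all) are made up for illustration. It only uses the flags from the commands above plus -o, which writes to a file instead of stdout.

```python
import subprocess

# Layout engines tried in the commands above ("dot" is Graphviz's default).
ENGINES = ["dot", "twopi", "circo", "sfdp"]


def dot_command(gv_path, png_path, engine="dot", extra=()):
    """Build a Graphviz command line: -K selects the layout engine, -T the output format."""
    return ["dot", f"-K{engine}", *extra, "-Tpng", f"-o{png_path}", gv_path]


def render_all(gv_path="out.gv"):
    """Render gv_path once per engine, then once more with the settings liked best."""
    for engine in ENGINES:
        subprocess.run(dot_command(gv_path, f"out-{engine}.png", engine), check=True)
    subprocess.run(
        dot_command(gv_path, "out-sfdp-large.png", "sfdp",
                    ["-Gsize=100!", "-Goverlap=prism"]),
        check=True,
    )
```

Calling render_all() in the directory containing out.gv writes one PNG per engine, plus an sfdp render using the size and overlap settings from the last command.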

Sorry, this code is really brittle, has horrible names, and depends heavily on the exact format of the "meganetstat.txt" file and on some hardcoded assumptions. I'll try to walk through the code a bit in case you want to reuse it.

Here's what the input file should look like:

foo-server     Active Internet connections (servers and established)
foo-server     Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
foo-server     tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1292/sshd
foo-server     tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1497/master
...

This is just netstat -nutap with the name of the server prefixed to every line. I don't use "knife" anymore, so I would probably do something different (e.g. dump each machine's output to a file?), but the idea is the same: you need the full netstat output along with the host it belongs to.
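If you go the one-file-per-machine route instead of knife, a minimal sketch for stitching the dumps back into the prefixed format might look like this. It assumes a hypothetical directory holding one "<hostname>.txt" file per machine with that machine's raw netstat -nutap output; the directory layout and the function name are illustrative, not part of the original gist. Keep the output file outside that directory, or it will be read back in as if it were a host.

```python
from pathlib import Path


def combine_netstat_dumps(dump_dir, out_path):
    """Write every line of every <hostname>.txt in dump_dir to out_path,
    prefixed with the hostname taken from the file name."""
    with open(out_path, "w") as out:
        for path in sorted(Path(dump_dir).glob("*.txt")):
            host = path.stem  # "foo-server.txt" -> "foo-server"
            for line in path.read_text().splitlines():
                out.write(f"{host} {line}\n")
```

A single space is enough as the separator, since the parsing step splits on any run of whitespace.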

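Once you have this file, xs is defined to be a list of [src_hostname, src_ip, dst_ip] triples. They come from the hostname prefix and the Local Address and Foreign Address columns with the port stripped off, so "src" here just means the local end of the connection, not necessarily the side that opened it: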

>>> import pprint
>>> pprint.pprint(xs)
[['foo-server', '0.0.0.0', '0.0.0.0'],
 ['foo-server', '127.0.0.1', '0.0.0.0'],
 ['foo-server', '0.0.0.0', '0.0.0.0'],
 ['foo-server', '0.0.0.0', '0.0.0.0'],
 ['foo-server', '172.11.111.11', '172.11.111.55'],
...
]
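The list comprehension that builds xs is hard to read as a one-liner. Here is a sketch of the same idea as plain functions (parse_triples is a hypothetical name). Unlike the original, it checks that the second column starts with "tcp" instead of searching the whole line, so a hostname containing "tcp" can't let header lines through. tcp6 rows are kept, and their addresses such as ":::22" collapse to '', which is likely where the '' entry in ipmap below comes from.

```python
def ip(addr):
    """Strip the port: '10.0.0.5:22' -> '10.0.0.5', '0.0.0.0:*' -> '0.0.0.0'.

    IPv6-style addresses such as ':::22' come out as '' (same as the original lambda).
    """
    return addr.split(":")[0]


def parse_triples(lines):
    """Return [hostname, local_ip, foreign_ip] for every TCP row of the combined file."""
    triples = []
    for line in lines:
        fields = line.split()
        # Rows look like: host proto recv-q send-q local foreign [state] [pid/prog]
        if len(fields) < 6 or not fields[1].startswith("tcp"):
            continue  # skip headers, udp rows, and short lines
        triples.append([fields[0], ip(fields[4]), ip(fields[5])])
    return triples

# Usage:
# with open("meganetstat.txt") as f:
#     xs = parse_triples(f)
```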

Any host can have multiple IPs (127.0.0.1, 0.0.0.0, multiple NICs, public network, private network, etc.), so how do we associate these IPs with the other machines in the cluster? We first define ipmap as a list of (hostname, Counter(ip -> occurrences)) pairs, counting how often each local IP shows up in that host's netstat output:

>>> ipmap = [(h, C([x[1] for x in xs if x[0] == h])) for h in HS]
>>> ipmap
[('foo-server', Counter({'172.11.111.11': 929, '127.0.0.1': 708, '': 679, '0.0.0.0': 3}))]
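ipmap can also be built in a single pass over the triples instead of rescanning xs once per host. In this sketch (function name hypothetical) the host list comes from the data itself rather than from the hardcoded HS list, which is a deviation from the original:

```python
from collections import Counter, defaultdict


def local_ip_counts(triples):
    """Map hostname -> Counter of the local IPs seen in that host's netstat rows."""
    counts = defaultdict(Counter)
    for host, local_ip, _foreign_ip in triples:
        counts[host][local_ip] += 1
    return dict(counts)
```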

For this experiment I only wanted to map out the connections on the private network. For this I used the dumb heuristic of filtering the IPs with x.startswith('10.') (or 172. or whatever it is on your network). I also decided to only use the "most common" private IP to make the code simpler. ipmapx is defined by mapping the most common private IP for each hostname to that hostname:

>>> ipmapx = dict([(sorted([(x,y) for (x,y) in ip[1].items() if x.startswith("172.")], key=lambda t: -t[1])[0][0], ip[0]) for ip in ipmap])
>>> ipmapx
{'172.11.111.11': 'foo-server'}
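Note that the one-liner raises IndexError for any host that has no IP with the chosen prefix, because it takes [0] of an empty list. A sketch that skips such hosts instead, using Counter.most_common; the function name and the prefix parameter are illustrative:

```python
from collections import Counter


def private_ip_to_host(ip_counts, prefix="10."):
    """Map each host's most common private IP back to that hostname.

    Hosts with no IP starting with `prefix` are skipped instead of raising.
    """
    mapping = {}
    for host, counts in ip_counts.items():
        private = Counter({addr: n for addr, n in counts.items() if addr.startswith(prefix)})
        if not private:
            continue
        best_ip, _count = private.most_common(1)[0]
        mapping[best_ip] = host
    return mapping
```

As in the original, the prefix test is a rough heuristic: "172." also matches public addresses outside 172.16.0.0/12.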

Finally we can use this mapping to walk over the [src_hostname, src_ip, dst_ip] triples and associate the dst_ip with a hostname using ipmapx to get a list of [src_hostname, dst_hostname] pairs, which are the edges of the network graph:

>>> edges = [(x[0], ipmapx.get(x[2])) for x in xs]
>>> pprint.pprint(edges)
[('foo-server', None),
 ('foo-server', None),
 ('foo-server', None),
 ('foo-server', None),
 ('foo-server', 'foo-server'),
 ('foo-server', 'foo-server'),
 ('foo-server', 'foo-server'),
 ('foo-server', 'foo-server'),
 ('foo-server', 'foo-server'),
...
]
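The cleanup that the final one-liner does inline (dedupe with set(), drop None, drop self-loops) can be pulled into its own function, together with the "incoming edges" ranking from the session. Both are sketches with made-up names; unlike the session, incoming_counts ignores foreign IPs that don't map to a known host rather than counting them under None:

```python
from collections import Counter


def build_edges(triples, ip_to_host):
    """Unique (src_host, dst_host) pairs; unknown destination IPs and self-loops are dropped."""
    edges = set()
    for host, _local_ip, foreign_ip in triples:
        dst = ip_to_host.get(foreign_ip)
        if dst is not None and dst != host:
            edges.add((host, dst))
    return edges


def incoming_counts(triples, ip_to_host):
    """Count connections per destination host, like the ranking step in the session."""
    return Counter(ip_to_host[f] for _h, _l, f in triples if f in ip_to_host)
```

incoming_counts(xs, ipmapx).most_common() lists the busiest hosts (the load balancers, in the author's case) first.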

From there, the rest of the code just writes the edges out as text in Graphviz's DOT format (skipping duplicate edges, edges whose destination is None, and self-loops), like so:

digraph world {
    "host3" -> "host4";
    "host18" -> "host1";
    "host3" -> "host10";
    "host5" -> "host7";
    ...hundreds more edges...
}

The "dot" commands near the top then render that file with different layout and size options.
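Writing the DOT text is easier to follow as a small function than as one long expression. This sketch (function name hypothetical) sorts the edges so the output file is stable between runs, and escapes double quotes, the one character DOT requires escaping inside quoted IDs:

```python
def to_dot(edges, name="world"):
    """Render (src, dst) pairs as a Graphviz digraph, one edge per line."""
    def quote(s):
        return '"' + s.replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for src, dst in sorted(edges):
        lines.append(f"\t{quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

To produce out.gv, write the result to a file, e.g. with open("out.gv", "w") as f: f.write(to_dot(edges)).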

@stuart-warren

@schlomo See the docs - not that I've tried this yet, but it looked cool as an idea
http://docs.opscode.com/chef/knife.html#id293

@lost-theory
Author

@schlomo sorry for the 2 year late response 😸, but I updated the gist with a walkthrough of the code and the data that each step should be producing:

https://gist.github.com/lost-theory/6309478#file-netstat-2015-md
