Sorry, this code is really brittle, has horrible names, and depends very specifically on the format of the "meganetstat.txt" file plus some hardcoded assumptions. I'll try to walk through it a bit in case you want to re-use it.
Here's what the input file should look like:
foo-server Active Internet connections (servers and established)
foo-server Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
foo-server tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1292/sshd
foo-server tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 1497/master
...
This is just the output of netstat -nutap with the name of the server prefixed to every line. I don't use "knife" anymore, so I would probably do something different now (e.g. dump each machine's output to a file?), but the idea is the same: you need the full netstat output along with the host it belongs to.
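If you go the "one output per machine" route, gluing the outputs back into the prefixed format could look roughly like this. It is only a sketch: prefix_netstat and build_meganetstat are names I made up for illustration, and it assumes you already have each host's raw netstat text in a dict.

```python
def prefix_netstat(hostname, netstat_output):
    """Prefix every non-empty line of one host's netstat output with its hostname."""
    return [
        f"{hostname} {line}"
        for line in netstat_output.splitlines()
        if line.strip()
    ]


def build_meganetstat(outputs):
    """Combine per-host outputs into one text blob.

    outputs is assumed to be a dict mapping hostname -> raw `netstat -nutap` text.
    """
    lines = []
    for hostname, text in sorted(outputs.items()):
        lines.extend(prefix_netstat(hostname, text))
    return "\n".join(lines) + "\n"
```

Writing the returned string to "meganetstat.txt" gives a file in the shape shown above.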
Once you have this file, xs is defined to be a list of [src_hostname, src_ip, dst_ip] triples:
>>> pprint.pprint(xs)
[['foo-server', '0.0.0.0', '0.0.0.0'],
['foo-server', '127.0.0.1', '0.0.0.0'],
['foo-server', '0.0.0.0', '0.0.0.0'],
['foo-server', '0.0.0.0', '0.0.0.0'],
['foo-server', '172.11.111.11', '172.11.111.55'],
...
]
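For reference, here is a minimal sketch of how such triples could be pulled out of the file. This is not the original parsing code; parse_line and parse_meganetstat are hypothetical names, and it assumes whitespace-separated columns in the order host, proto, recv-q, send-q, local address, foreign address.

```python
def parse_line(line):
    """Return [src_hostname, src_ip, dst_ip] for a tcp/udp line, else None."""
    fields = line.split()
    # Skip header lines and anything too short to hold both addresses.
    if len(fields) < 6 or not fields[1].startswith(("tcp", "udp")):
        return None
    host, local, foreign = fields[0], fields[4], fields[5]
    # Split on the last colon only, so the port is dropped and the address kept.
    src_ip = local.rsplit(":", 1)[0]
    dst_ip = foreign.rsplit(":", 1)[0]
    return [host, src_ip, dst_ip]


def parse_meganetstat(lines):
    """Parse an iterable of lines, keeping only the connection rows."""
    triples = (parse_line(line) for line in lines)
    return [t for t in triples if t is not None]
```

Something like xs = parse_meganetstat(open("meganetstat.txt")) would then give a list in the same shape as above.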
Any host can have multiple IPs (127.0.0.1, 0.0.0.0, multiple NICs, public network, private
network, etc.), so how do we know how to associate these with other machines in the cluster?
We first define ipmap as a list of (hostname, Counter(ip -> occurrences)) pairs:
>>> ipmap = [(h, C([x[1] for x in xs if x[0] == h])) for h in HS]
>>> ipmap
[('foo-server', Counter({'172.11.111.11': 929, '127.0.0.1': 708, '': 679, '0.0.0.0': 3}))]
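HS and C are not defined in the snippet above. A self-contained version might look like the following, assuming C is collections.Counter and HS is the list of distinct hostnames found in xs (both are guesses at what the short names stand for, and the xs here is tiny sample data, not real output).

```python
from collections import Counter

# Tiny sample in the same shape as xs; not real netstat data.
xs = [
    ["foo-server", "0.0.0.0", "0.0.0.0"],
    ["foo-server", "127.0.0.1", "0.0.0.0"],
    ["foo-server", "172.11.111.11", "172.11.111.55"],
    ["foo-server", "172.11.111.11", "172.11.111.55"],
]

# Assumption: C is just a short alias for Counter.
C = Counter

# Assumption: HS is every distinct source hostname, in a stable order.
HS = sorted({x[0] for x in xs})

# For each host, count how often each local (source) IP appears.
ipmap = [(h, C(x[1] for x in xs if x[0] == h)) for h in HS]
```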
For this experiment I only wanted to map out the connections on the private network. For this I used the dumb heuristic of filtering the IPs with x.startswith('10.') (or 172. or whatever it is on your network). I also decided to only use the "most common" private IP to make the code simpler. ipmapx is defined by mapping the most common private IP for each hostname to that hostname:
>>> ipmapx = dict([(sorted([(x,y) for (x,y) in ip[1].items() if x.startswith("172.")], key=lambda t: -t[1])[0][0], ip[0]) for ip in ipmap])
>>> ipmapx
{'172.11.111.11': 'foo-server'}
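The one-liner is hard to read, and the [0][0] raises an IndexError for any host that has no IP matching the prefix. A more spelled-out sketch of the same idea follows; build_ipmapx is a made-up name, and skipping hosts with no matching IP is my choice here, not something the original code does.

```python
from collections import Counter


def build_ipmapx(ipmap, prefix="172."):
    """Map each host's most common IP starting with prefix to its hostname."""
    result = {}
    for hostname, counts in ipmap:
        # Keep only the IPs that look like they are on the private network.
        private = Counter(
            {ip: n for ip, n in counts.items() if ip.startswith(prefix)}
        )
        if not private:
            # Assumption: hosts without a matching IP are simply left out.
            continue
        most_common_ip, _count = private.most_common(1)[0]
        result[most_common_ip] = hostname
    return result
```

Called with the ipmap shown above and prefix="172.", this should give the same single-entry dict.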
Finally we can use this mapping to walk over the [src_hostname, src_ip, dst_ip] triples and associate the dst_ip with a hostname using ipmapx to get a list of [src_hostname, dst_hostname] pairs, which are the edges of the network graph. Any dst_ip that isn't in ipmapx comes out as None:
>>> edges = [(x[0], ipmapx.get(x[2])) for x in xs]
>>> pprint.pprint(edges)
[('foo-server', None),
('foo-server', None),
('foo-server', None),
('foo-server', None),
('foo-server', 'foo-server'),
('foo-server', 'foo-server'),
('foo-server', 'foo-server'),
('foo-server', 'foo-server'),
('foo-server', 'foo-server'),
...
]
From there, the rest of the code just writes the edges out as text in the graphviz format, like so:
digraph world {
"host3" -> "host4";
"host18" -> "host1";
"host3" -> "host10";
"host5" -> "host7";
...hundreds of more edges...
}
After that, it is just trying out different layout / size options.
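Turning the edges list into that text could be sketched like this. edges_to_dot is a hypothetical helper, and dropping the None destinations and duplicate edges is an assumption on my part; the original code may handle those differently.

```python
def edges_to_dot(edges, name="world"):
    """Render (src_hostname, dst_hostname) pairs as a graphviz digraph."""
    # Assumption: unresolved destinations (None) and repeated edges are dropped.
    unique = sorted({(src, dst) for src, dst in edges if dst is not None})
    lines = ["digraph %s {" % name]
    for src, dst in unique:
        lines.append('"%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)
```

The resulting string can be saved to a file and rendered with any of the graphviz layout programs.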
Cool thing! Sadly I get
My input looks like this:
Not knowing knife, I am not sure what the "-a hostname" does in your knife call. I used pdsh...