
@ephemient
Created September 25, 2018 21:52
Sister cities

Data: Sister Cities of the World (en.sistercity.info), analyzed with Graphviz

Output of sccmap -sv:

7158 18672 2463 731 0.7578 198 0.8743

  • 7158 nodes
  • 18672 edges
  • 2463 connected components
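  • 731 non-trivial strongly connected components (sccmap omits single-node components by default, which is why this is smaller than the number of connected components)
  • 0.7578 fraction of nodes in non-trivial strongly connected components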
  • 198 maximum degree
  • 0.8743 fraction of non-tree edges
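Two further figures follow from the statistics line by plain arithmetic: how many nodes sit in non-trivial strongly connected components, and the mean degree. Below is a minimal sketch in the script's own shell-plus-one-liner style. It assumes the fields appear in the order listed above. describe_stats is a hypothetical helper name, and "mean degree" here counts both in-edges and out-edges.

```shell
#!/bin/bash
# Sketch: derive two figures from an sccmap -sv statistics line.
# Field order is assumed to match the list above.
describe_stats() {
  awk '{
    # $1 nodes, $2 edges, $3 connected components, $4 SCCs,
    # $5 fraction of nodes in non-trivial SCCs, $6 max degree,
    # $7 fraction of non-tree edges
    in_scc = $1 * $5
    # Round to the nearest whole node (the fraction is printed to 4 places).
    printf "nodes in non-trivial SCCs: %d\n", in_scc + 0.5
    # Every edge adds one to an out-degree and one to an in-degree.
    printf "mean degree (in + out): %.2f\n", 2 * $2 / $1
  }'
}

echo '7158 18672 2463 731 0.7578 198 0.8743' | describe_stats
# nodes in non-trivial SCCs: 5424
# mean degree (in + out): 5.22
```

Because the fraction is rounded to four decimal places, the product is only known to within about ±0.4 nodes. That still pins the count to 5424.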

There are some minor issues with the data: some sister-city relationships are not hyperlinked, so the script below cannot pick them up, and links to cities with / in their name return 404, so those pages are never downloaded.

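#!/bin/bash
# Mirror the site (-c: resume partial downloads, -r: recurse, -np: never ascend to the parent directory).
wget -c -r -np en.sistercity.info
# Turn every downloaded city page into one adjacency block: "<city> -> {", one linked city per line, "}".
for i in en.sistercity.info/sister-cities/*.html; do
  # href attribute of every <a> that is a direct child of <div class="label">
  xmllint --html --xpath '//div[@class="label"]/a/@href' "$i" 2>/dev/null |
  perl -Mre=/a -nE '
    # Percent-encode every character that is not visible ASCII (spaces, non-ASCII bytes), so each name is one token.
    sub escape { shift =~ s/([[:^graph:]])/sprintf "%%%02X", ord($1)/egr }
    # Decode %XX escapes, so an encoded href and a raw file name normalize to the same token.
    sub unescape { shift =~ s/%([[:xdigit:]]{2})/chr(hex($1))/egr }
    # The page file name is the only argument; shift removes it from @ARGV, so -n then reads stdin.
    BEGIN { say escape(shift), " -> {" }
    END { say "}" }
    say "\t" . escape(unescape($1)) while /href="(.*?)"/g
  ' "${i##*/}"
done |
# Wrap in a digraph, replace each name with a short ID (city1, city2, ...), and print the names as labels.
perl -pE '
  BEGIN { say "digraph {" }
  END { say "$b [label=\"$a\"]" while ($a, $b) = each %id; say "}" }
  s%(\S*\w\S*)%$id{$1} //= "city" . ++$n%eg
' |
# -s: statistics only, no graph output; -v: print the statistics line shown above.
sccmap -sv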