Geographic distribution of texts/collections in Trismegistos. Each circle is positioned at the location of a collection, and its size corresponds to the number of texts recorded as currently held by that collection. You can pan/zoom inside the map for more detail.
Mike Bostock's Let's Make a Map and Let's Make a Bubble Map tutorials were immensely helpful in learning how to create this visualization.
The data comes from a private Trismegistos CSV data dump, from which the collection sizes and addresses were extracted and written to another CSV (`tm-collections-addresses-clean.csv`). The addresses were then geocoded in multiple steps:
- The addresses CSV (`tm-collections-addresses-clean.csv`) was loaded into Google Fusion Tables with the address column set to a "location" type, then a map visualization was created to geocode the column and enable KML download. Unfortunately, the KML from this doesn't have point coordinates associated with it, so it was opened in Google Earth and re-exported to add point coordinates. `kml2csv.xsl` then turns this back into a CSV (`tm-collections-addresses-geocoded.csv`).
- Also unfortunately, the "Fusion Tables KML->Google Earth KML" process doesn't associate points to all the same features that Fusion Tables will geocode and display on a map. `geocode-addresses.rb` was written as a simple script to pipe addresses through the raw Google Geocoding API. Since the API has a limit of 2500 requests per day, I don't want to re-geocode addresses I already have a point for, so I wrote `subtract-csv.rb` to subtract `tm-collections-addresses-geocoded.csv` from `tm-collections-addresses-clean.csv` and make `tm-collections-addresses-nongeocoded.csv`. The geocoded results of this are stored in `tm-collections-addresses-geocoded-remainder.csv`.
- We might have some incorrectly-geocoded addresses, so `tm-collections-manually-geocoded.csv` lets us record these. `merge-geocode-csv.rb` merges `tm-collections-addresses-geocoded.csv`, `tm-collections-addresses-geocoded-remainder.csv`, and `tm-collections-manually-geocoded.csv` to make the final `tm-collections-geocoded.csv`.
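
The subtract and merge steps can be illustrated with a minimal Ruby sketch. This is not the code in `subtract-csv.rb` or `merge-geocode-csv.rb`; it only shows the idea. The column names (`address`, `lat`, `lng`, `texts`), the helper names, the sample rows, and the rule that later files override earlier ones (so that manual corrections win) are all assumptions made for illustration.

```ruby
require 'csv'
require 'set'

# Hypothetical header name; the real CSVs may use a different one.
ADDRESS = 'address'

# Return the rows of `all` whose address has no row in `done`,
# i.e. the addresses that still need to be geocoded.
def subtract_rows(all, done)
  seen = done.map { |row| row[ADDRESS] }.to_set
  all.reject { |row| seen.include?(row[ADDRESS]) }
end

# Merge several geocoded tables keyed by address. Tables passed later
# replace earlier rows for the same address (an assumed precedence rule).
def merge_rows(*tables)
  merged = {}
  tables.each do |table|
    table.each { |row| merged[row[ADDRESS]] = row.to_h }
  end
  merged.values
end

# Small in-memory stand-ins for the CSV files; all values are made up.
clean     = CSV.parse("address,texts\nOxford UK,120\nCairo Egypt,300\nBerlin Germany,80\n", headers: true)
geocoded  = CSV.parse("address,lat,lng\nOxford UK,51.75,-1.26\n", headers: true)
remainder = CSV.parse("address,lat,lng\nCairo Egypt,30.04,31.24\nBerlin Germany,0.0,0.0\n", headers: true)
manual    = CSV.parse("address,lat,lng\nBerlin Germany,52.52,13.40\n", headers: true)

# Addresses still lacking a point after the first geocoding pass.
todo = subtract_rows(clean, geocoded)

# Final table: first pass, then API remainder, then manual corrections.
final = merge_rows(geocoded, remainder, manual)
```

Here `todo` holds the rows that would be sent to the geocoding API, which keeps the request count under the daily limit, and `final` holds one row per address with the manually recorded point taking precedence where one exists.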
All of this is also semi-automated and documented in the Makefile.