Last active
December 24, 2016 17:05
Cleaning up GeoNames for Solr
# Reduce the columns (keep geonameid, name, latitude, longitude)
cut -f1-2,5-6 allCountries.txt > allCountries_red.txt
# Add a header row
# (the header fields must be tab-separated to match the data rows)
{ printf 'id\ttitle_s\tlat\tlng\n'; cat allCountries_red.txt; } > allCountries_head.txt
# Add a WKT column (requires csvpys: https://github.com/cypreess/csvkit/blob/master/docs/scripts/csvpys.rst)
csvpys --tab -s wkt_rpt "'POINT(' + ch['lng'] + ' ' + ch['lat'] + ')'" allCountries_head.txt > allCountries_wkt.txt
# Only keep the columns we need (id, title_s, wkt_rpt)
csvcut -c 1,2,5 allCountries_wkt.txt > allCountries_wkt_cut.txt
# Convert to JSON
csvjson -i 2 allCountries_wkt_cut.txt > allCountries.json
# Index into Solr (replace [corename] with the name of your core)
curl 'http://localhost:8983/solr/[corename]/update?commit=true' --data-binary @allCountries.json -H 'Content-type:application/json'
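
If csvpys is not available, the column reduction, header row, and WKT steps could probably be collapsed into a single awk pass. The following is only a sketch, not part of the original script: the function name geonames_to_wkt is hypothetical, and it assumes the standard GeoNames dump layout (geonameid in column 1, name in column 2, latitude in column 5, longitude in column 6, tab-separated).

```shell
# Hypothetical helper (an assumption, not part of the original script):
# reads GeoNames tab-separated rows on stdin and writes tab-separated
# id, title_s, wkt_rpt rows on stdout, with a header line first.
geonames_to_wkt() {
  awk -F '\t' -v OFS='\t' '
    BEGIN { print "id", "title_s", "wkt_rpt" }
    # $1 = geonameid, $2 = name, $5 = latitude, $6 = longitude
    # WKT points are written as POINT(longitude latitude)
    { print $1, $2, "POINT(" $6 " " $5 ")" }
  '
}
```

It might be used as `geonames_to_wkt < allCountries.txt > allCountries_wkt.tsv`. Because the output is tab-separated rather than comma-separated, a later csvjson call would need the `-t` flag to read it.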