Cleaning up GeoNames for Solr
# Reduce the columns to id, name, latitude, longitude
cut -f1-2,5-6 allCountries.txt > allCountries_red.txt
# Add a tab-separated header row
printf 'id\ttitle_s\tlat\tlng\n' | cat - allCountries_red.txt > allCountries_head.txt
# Add a WKT column; requires csvpys: https://github.com/cypreess/csvkit/blob/master/docs/scripts/csvpys.rst
csvpys --tab -s wkt_rpt "'POINT(' + ch['lng'] + ' ' + ch['lat'] + ')'" allCountries_head.txt > allCountries_wkt.txt
# Only keep the columns we need (id, title_s, wkt_rpt)
csvcut -c 1,2,5 allCountries_wkt.txt > allCountries_wkt_cut.txt
# Convert to JSON
csvjson -i 2 allCountries_wkt_cut.txt > allCountries.json
# Index into Solr (replace [corename] with the name of your core)
curl 'http://localhost:8983/solr/[corename]/update?commit=true' --data-binary @allCountries.json -H 'Content-type:application/json'
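
Before running the full pipeline on allCountries.txt, it may be worth checking the column order on a tiny sample. The sketch below is an assumption-laden smoke test, not part of the original script: the two rows are made up, the `sample*.txt` file names are hypothetical, and `awk` stands in for the csvpys step so that nothing beyond standard tools is needed. Its output stays tab-separated, which may differ from what csvpys emits, so treat it as a check on the `POINT(lng lat)` ordering rather than a drop-in replacement.

```shell
set -eu

# Two made-up rows with the first six GeoNames columns
# (geonameid, name, asciiname, alternatenames, latitude, longitude).
printf '1\tAlpha\tAlpha\t\t10.5\t20.25\n2\tBeta\tBeta\t\t-33.9\t151.2\n' > sample.txt

# Same column reduction as the script: id, name, latitude, longitude.
cut -f1-2,5-6 sample.txt > sample_red.txt

# Prepend a tab-separated header row.
{ printf 'id\ttitle_s\tlat\tlng\n'; cat sample_red.txt; } > sample_head.txt

# Stand-in for the csvpys step: append a wkt_rpt column as POINT(lng lat).
awk 'BEGIN { FS = OFS = "\t" }
     NR == 1 { print $0, "wkt_rpt"; next }
     { print $0, "POINT(" $4 " " $3 ")" }' sample_head.txt > sample_wkt.txt

# Show the result for a visual check.
cat sample_wkt.txt
```

Longitude comes first inside `POINT(...)`, matching the expression passed to csvpys in the script.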