Last active
January 11, 2018 19:20
Convert Geonames RDF dump to n-triples.
'''
Created on 13 Nov 2014

@author: <https://github.com/cmarat>

Convert Geonames RDF dump [1] to n-triples.
Uncompress the dump and pass the file name as a command
line parameter, or pipe it into stdin.

[1] http://download.geonames.org/all-geonames-rdf.zip
'''
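import fileinput

import rdflib

# Every line that starts with '<?xml' is parsed as one complete RDF/XML
# document; all other lines in the dump are skipped.
xml_lines = (l for l in fileinput.input() if l.startswith('<?xml'))

with open('geonames.nt', 'w', encoding='utf-8') as fout:
    for line in xml_lines:
        g = rdflib.Graph()
        g.parse(data=line, format='xml')
        nt = g.serialize(format='nt')
        # serialize() returns bytes in rdflib < 6 and str in rdflib >= 6.
        if isinstance(nt, bytes):
            nt = nt.decode('utf-8')
        fout.write(nt)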
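
The script expects the dump to be uncompressed first. As a possible alternative, the sketch below reads the lines straight from the zip archive using only the standard library. The helper name `iter_xml_lines` is hypothetical, and the code assumes the archive holds a single UTF-8 text file, so it simply takes the first member rather than relying on a particular file name.

```python
import io
import zipfile


def iter_xml_lines(zip_path):
    """Yield the RDF/XML lines from the first member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds one text file; take the first member.
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding='utf-8'):
                # Same filter as the script: keep only lines that start
                # an RDF/XML document.
                if line.startswith('<?xml'):
                    yield line
```

Each yielded line could then be passed to `g.parse(data=line, format='xml')` in place of the lines read through `fileinput`.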