Last active
December 18, 2023 02:39
KML to CSV in Python
"""
A script to take all of the LineString information out of a very large KML file. It formats it into a CSV file so
that you can import the information into the NDB of Google App Engine using the Python standard library. I ran this
script locally to generate the CSV. It processed a ~70 MB KML down to a ~36 MB CSV in about 8 seconds.

The KML had coordinates ordered by
    [Lon, Lat, Alt, ' ', Lon, Lat, Alt, ' ',...]      (' ' is a space)
The script removes the altitude to put the coordinates in a single CSV row ordered by
    [Lat,Lon,Lat,Lon,...]

Dependencies:
 - Beautiful Soup 4
 - lxml

I found a little bit of help online for using BeautifulSoup to process a KML file. I put this online to serve as
another example. Some things I learned:
 - the BeautifulSoup parser *needs* to be 'xml'. I spent too much time debugging why the default one wasn't working, and
   it was because the default is an HTML parser, not an XML parser.

tl;dr
KML --> CSV so that GAE can go CSV --> NDB
"""
from bs4 import BeautifulSoup
import csv


def process_coordinate_string(coordinate_string):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    ret = []
    # split() with no argument splits on any run of whitespace and drops empty strings, so the
    # space between <coordinates> and the first value (-80.123...) needs no special handling
    for split in coordinate_string.split():
        comma_split = split.split(',')
        ret.append(comma_split[1])  # lat
        ret.append(comma_split[0])  # lng
    return ret
def main():
    """
    Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    """
    with open('doc.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
    # On Python 3 the csv module expects a text-mode file opened with newline=''
    with open('out.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for coords in s.find_all('coordinates'):
            writer.writerow(process_coordinate_string(coords.string))


if __name__ == "__main__":
    main()
I am using the above example, but I only get the first and last coordinate in the CSV file. It is as if it is not looping. However, since I am getting the first and last coordinate, I have to assume that it is reading the coordinates list.
`def process_coordinate_string(str):
    # Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    ret = []
    comma_split = str.split(',')
    return [comma_split[1], comma_split[0]]

def main():
    # Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    with open('61956195-6202689-a300234067548720_2022-02-23-16-15-48.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
    with open('trajectory-6195.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for coords in s.find_all('coordinates'):
            writer.writerow(process_coordinate_string(coords.string))

if __name__ == "__main__":
    main()`
I am a relative beginner. Any reason as to why that may be happening?
Thanks
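One possible explanation, sketched under the assumption that the file keeps a whole track in a single <coordinates> element laid out as "lon,lat,alt lon,lat,alt ...": the modified function splits the text on commas once and returns a single pair, so each <coordinates> element yields only one lat/lon however many points it holds. The sample string and both function names below are made up for illustration and do not come from the KML in question.

```python
# Hypothetical sample in the "lon,lat,alt lon,lat,alt ..." layout described in the gist.
sample = " -80.1,25.7,0 -80.2,25.8,0 -80.3,25.9,0"


def first_pair_only(coordinate_string):
    # Mirrors the modified function: one split on commas, no loop over the points.
    comma_split = coordinate_string.split(',')
    return [comma_split[1], comma_split[0]]


def all_pairs(coordinate_string):
    # Split on whitespace first (one chunk per point), then on commas within each chunk.
    row = []
    for triple in coordinate_string.split():
        fields = triple.split(',')
        row.append(fields[1])  # lat
        row.append(fields[0])  # lon
    return row


print(first_pair_only(sample))  # a single lat/lon pair
print(all_pairs(sample))        # one lat/lon pair per point
```

If the real file looks different (for example, one point per <coordinates> element), the cause may lie elsewhere, so it is worth printing `coords.string` for a few elements to check.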
Slight modification of WxBDM's version, since I had some issues with a lack of standardization in the generated KML files. It also reads from a kml folder and writes to a csv folder using the same filename, to allow for mass conversion. The function is now called with the KML filename as an argument.
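The code for this modification is not shown here, so the following is only a rough sketch of the idea, not the commenter's actual script. It assumes folders named `kml` and `csv`, uses the hypothetical names `coordinates_to_row`, `convert_kml`, and `convert_all`, and swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so that it has no third-party dependencies.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def coordinates_to_row(text):
    """Turn 'lon,lat,alt lon,lat,alt ...' into [lat, lon, lat, lon, ...]."""
    row = []
    for triple in (text or "").split():
        fields = triple.split(',')
        if len(fields) < 2:
            continue  # tolerate malformed entries in non-standard files
        row.extend([fields[1], fields[0]])
    return row


def convert_kml(filename, kml_dir="kml", csv_dir="csv"):
    """Read kml_dir/filename and write csv_dir/<same stem>.csv (folder names are assumptions)."""
    kml_path = Path(kml_dir) / filename
    csv_path = Path(csv_dir) / (Path(filename).stem + ".csv")
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    root = ET.parse(kml_path).getroot()
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for element in root.iter():
            # Tags look like "{namespace}coordinates"; compare the local name only.
            if element.tag.rsplit('}', 1)[-1] == "coordinates":
                writer.writerow(coordinates_to_row(element.text))
    return csv_path


def convert_all(kml_dir="kml", csv_dir="csv"):
    """Convert every .kml file in kml_dir, one CSV per input file."""
    return [convert_kml(path.name, kml_dir, csv_dir)
            for path in sorted(Path(kml_dir).glob("*.kml"))]
```

Calling `convert_kml("track.kml")` would convert a single file, and `convert_all()` would handle the whole folder. Matching on the tag's local name keeps the sketch independent of which KML namespace a given file declares.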