@mciantyre
Last active December 18, 2023 02:39
KML to CSV in Python
"""
A script to take all of the LineString information out of a very large KML file. It formats it into a CSV file so
that you can import the information into the NDB of Google App Engine using the Python standard library. I ran this
script locally to generate the CSV. It processed a ~70 MB KML down to a ~36 MB CSV in about 8 seconds.
The KML had coordinates ordered by
[Lon, Lat, Alt, ' ', Lon, Lat, Alt, ' ',...] (' ' is a space)
The script removes the altitude to put the coordinates in a single CSV row ordered by
[Lat,Lon,Lat,Lon,...]
Dependencies:
- Beautiful Soup 4
- lxml
I found a little bit of help online for using BeautifulSoup to process a KML file. I put this online to serve as
another example. Some things I learned:
- the BeautifulSoup parser *needs* to be 'xml'. I spent too much time debugging why the default one wasn't working, and
it was because the default is an HTML parser, not an XML parser.
tl;dr
KML --> CSV so that GAE can go CSV --> NDB
"""
from bs4 import BeautifulSoup
import csv


def process_coordinate_string(coord_string):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    space_splits = coord_string.split(" ")
    ret = []
    # There was a space in between <coordinates>" "-80.123...... hence the [1:]
    for split in space_splits[1:]:
        comma_split = split.split(',')
        ret.append(comma_split[1])  # lat
        ret.append(comma_split[0])  # lng
    return ret


def main():
    """
    Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    """
    with open('doc.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
        # 'wb' suits Python 2's csv module; on Python 3 use open('out.csv', 'w', newline='')
        with open('out.csv', 'wb') as csvfile:
            writer = csv.writer(csvfile)
            for coords in s.find_all('coordinates'):
                writer.writerow(process_coordinate_string(coords.string))


if __name__ == "__main__":
    main()
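The `[1:]` slice in the script above assumes every coordinate string starts with exactly one space. A minimal sketch of a more tolerant splitter follows, assuming the KML stores points as `lon,lat` or `lon,lat,alt` tokens separated by any whitespace. The function keeps the original name for easy substitution; the handling of empty or malformed tokens is an assumption of this sketch, not the author's behavior.

```python
def process_coordinate_string(coord_string):
    """Turn 'lon,lat[,alt] lon,lat[,alt] ...' into [lat, lon, lat, lon, ...]."""
    row = []
    # split() with no argument splits on any run of whitespace (spaces, tabs,
    # newlines) and drops empty pieces, so no [1:] slice is needed.
    for token in (coord_string or "").split():
        parts = token.split(",")
        if len(parts) < 2:
            continue  # assumption: skip malformed tokens rather than guess
        lon, lat = parts[0], parts[1]
        row.extend([lat, lon])
    return row


print(process_coordinate_string(" -80.1,25.7,0 -80.2,25.8,0 "))
```

With this variant, a string holding a single point and no leading space yields one lat/lon pair instead of an empty row.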
@anamariakantar

Nice work guys, thanks a lot for sharing! I needed to take some more columns out of my kml file (name, description, and add some custom columns) along with the coordinates, so I used your code and created a new gist (works in Python 3): https://gist.github.com/anamariakantar/a0c154a3df92a0ee7adc7f7a78061623

@freshlydoug

@mciantyre Thanks for this. I'm using Python 2.7 and have installed BS4 and lxml. I get an out.csv file, but it contains 80 empty lines and no error is reported. Is there something obvious I'm missing? Thanks, Doug

@tblacerda

Thanks guys. Your code helped me today.

@lancelot1969

from bs4 import BeautifulSoup
import csv

path = 'doc.kml'  # assumed input path; the original comment left `path` undefined


def process_coordinate_string(coord_string):
    """
    Take a single 'Lon,Lat[,Alt]' coordinate string from the KML file and return [Lat, Lon] for a CSV row
    """
    comma_split = coord_string.split(',')
    return [comma_split[1].strip(), comma_split[0].strip()]


def main():
    """
    Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    """
    with open(path, 'r') as f:
        s = BeautifulSoup(f, 'xml')
    with open('out.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        for coords in s.find_all('coordinates'):
            writer.writerow(process_coordinate_string(coords.string))


if __name__ == "__main__":
    main()

@akintunero

I got the error below while trying to run the script:

line 25, in <module>
from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

@slonak79

slonak79 commented Oct 19, 2019

@akintunero you need to install the dependency:
pip install bs4

@egregr

egregr commented Dec 17, 2019

NOTE: Gave me an error using Python3
"a bytes-like object is required, not 'str' "

This can be solved by modifying line 48 of the code by changing the mode of opening the file from 'wb' to simply 'w'.

Anyways great code, thanks a lot.
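The mode change described above can be sketched as follows. In Python 3 the `csv` module writes `str`, so the file is opened in text mode, and the `csv` documentation recommends passing `newline=''` so the writer controls line endings itself. The temporary output path and the sample row are assumptions made only to keep the example self-contained.

```python
import csv
import os
import tempfile

# Sample row in the gist's [lat, lon, lat, lon, ...] layout (made-up values)
rows = [["25.7", "-80.1", "25.8", "-80.2"]]

# Hypothetical output location; the gist writes to 'out.csv' in the working directory
out_path = os.path.join(tempfile.mkdtemp(), "out.csv")

# Text mode ('w'), not binary ('wb'); newline='' avoids blank lines on Windows
with open(out_path, "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(rows)

# Read the file back to confirm the round trip
with open(out_path, newline="") as csvfile:
    read_back = list(csv.reader(csvfile))

print(read_back)
```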

@miasolodky

Thanks so much!
I modified it to work with KML files created with google earth:
https://github.com/miasolodky/Google-Earth-KML-to-CSV/blob/master/kmltocsv.ipynb

@akintunero

akintunero commented Jan 29, 2020 via email

@miasolodky

miasolodky commented Jan 30, 2020 via email

@shantanu848

TypeError: a bytes-like object is required, not 'str'
This error occurs in Python 3.8, and I don't know why.
My input list to be written is [19.33482579812116, 77.01685649730679, 19.33477755271189, 77.01738131023461, 19.33423333079384, 77.0173191392798, 19.33418091607818, 77.01764031166668, 19.33537616602636, 77.01780075660297, 19.33543133527809, 77.01679442455995, 19.33482579812116, 77.01685649730679] still, it is reading it as str.

@egregr

egregr commented Mar 6, 2020

Refer to my previous comment: when opening the file, change 'wb' to 'w'. Using 'wb' tells Python to write in binary mode.

@mohitsingh2806

mohitsingh2806 commented Mar 15, 2020

Hi,
Your code helped me a lot, as I had never worked with bs4! Thanks!
I needed to tabulate the placemarks I had made in Google Earth (around 500).
I have added some code to save the lat/long data produced by your script, along with the names and descriptions of the placemarks, which I needed for my code. This works in Python 3. I hope it helps somebody.

https://gist.github.com/mohitsingh2806/deee300a2f5bdd2768967116bd209019

EDIT: Looking at my code again, I realised that after a lot of small incremental changes it is now almost completely different from the original code you shared. Nonetheless, your code helped me a lot, and thank you again for it.

@shantanu848

> Refer to my previous comment: when opening the file, change 'wb' to 'w'. Using 'wb' tells Python to write in binary mode.

Thank you, helped me complete that task.

@WxBDM

WxBDM commented Jul 6, 2021

This helped me, thanks! I needed something more pandas-friendly, so I edited it slightly. I'm sharing it in this thread in case someone else needs it (Python 3.x):

import pandas as pd
from bs4 import BeautifulSoup


def process_coordinate_string(coord_string):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    space_splits = coord_string.split(" ")
    ret = []
    # There was a space in between <coordinates>" "-80.123...... hence the [1:]
    for split in space_splits[1:]:
        comma_split = split.split(',')
        ret.append(comma_split[1])    # lat
        ret.append(comma_split[0])    # lng
    return ret


def main():
    """
    Open the KML. Read the KML. Process each coordinate string. Write the points to a CSV file via pandas.
    """
    with open('input.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')

    # Accumulate points from every <coordinates> element; plain assignment
    # inside the loop would keep only the last element's points
    data = []
    for coords in s.find_all('coordinates'):
        data.extend(process_coordinate_string(coords.string))

    # data alternates lat, lon, lat, lon, ...
    lats = [float(x) for index, x in enumerate(data) if index % 2 == 0]
    lons = [float(x) for index, x in enumerate(data) if index % 2 == 1]

    df = pd.DataFrame({'Lat': lats, 'Lon': lons})
    df.to_csv("kml_to_df.csv", index=False)

@josmarcristello

josmarcristello commented Nov 12, 2021

A slight modification of WxBDM's version, as I had some issues with inconsistent formatting in the generated KML files. It also reads from a 'kml' folder and writes to a 'csv' folder under the same base filename, to allow for mass conversion. The function is now called with the KML filename as an argument.

kml2csv('test.kml')
from bs4 import BeautifulSoup
import pandas as pd


def process_coordinate_string(coord_string):
    """
    Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    """
    space_splits = coord_string.split(" ")
    ret = []
    # There was a space in between <coordinates>" "-80.123...... hence the [1:]
    for split in space_splits[1:]:
        comma_split = split.split(',')
        # Checks for len on the split, because depending on kml file generator you might get an empty
        # string (which would be misinterpreted as a coordinate)
        if len(comma_split) == 3:
            ret.append(comma_split[1])  # lat
            ret.append(comma_split[0])  # lng
    return ret


def kml2csv(fname):
    """
    Open the KML. Read the KML. Process each coordinate string. Write the points to a CSV file via pandas.
    Input: Filename with extension ('example.kml'), located in 'kml' folder.
    Output: File with the same name as input, but in .csv format, located in 'csv' folder.
    """
    out_fname = fname.split('.kml')[0] + '.csv'
    with open('kml/' + fname, 'r') as f:
        s = BeautifulSoup(f, 'xml')

    # Accumulate points from every <coordinates> element; plain assignment
    # inside the loop would keep only the last element's points
    data = []
    for coords in s.find_all('coordinates'):
        data.extend(process_coordinate_string(coords.string))

    # data alternates lat, lon, lat, lon, ...
    lats = [float(x) for index, x in enumerate(data) if index % 2 == 0]
    lons = [float(x) for index, x in enumerate(data) if index % 2 == 1]
    df = pd.DataFrame({'Lat': lats, 'Lon': lons})

    df.to_csv("csv/" + out_fname, index=False)

@ivanskigib

I am using the above examples but I only get the first and last coordinate in a csv file. It is as if it is not looping, however since I am getting the first and last coordinate I have to assume that it is reading the coordinates list.

def process_coordinate_string(str):
    # Take the coordinate string from the KML file, and break it up into [Lat,Lon,Lat,Lon...] for a CSV row
    ret = []
    comma_split = str.split(',')
    return [comma_split[1], comma_split[0]]

def main():
    # Open the KML. Read the KML. Open a CSV file. Process a coordinate string to be a CSV row.
    with open('61956195-6202689-a300234067548720_2022-02-23-16-15-48.kml', 'r') as f:
        s = BeautifulSoup(f, 'xml')
        with open('trajectory-6195.csv', 'w', newline='') as csvfile:
            writer = csv.writer(csvfile)
            for coords in s.find_all('coordinates'):
                writer.writerow(process_coordinate_string(coords.string))

if __name__ == "__main__":
    main()

I am a relative beginner. Any reason as to why that may be happening?
Thanks
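
One possible explanation, not confirmed against the actual file: the posted `process_coordinate_string` splits the whole string on commas and returns only items 0 and 1, so each `<coordinates>` element contributes just its first point. If the file holds separate start and end placemarks plus one long track element, that could look like "first and last only". The sketch below uses made-up sample values and a hypothetical helper, `split_points`, to show the effect and a whitespace-first split that yields one (lat, lon) pair per point.

```python
def split_points(coord_string):
    """Return one (lat, lon) pair per point in a KML coordinates string."""
    points = []
    # Separate the points first (whitespace), then the fields (commas)
    for token in coord_string.split():
        parts = token.split(",")
        if len(parts) >= 2:
            points.append((parts[1], parts[0]))  # KML order is lon,lat[,alt]
    return points


# Made-up sample: three points, each written as lon,lat,alt
sample = "-80.1,25.7,0 -80.2,25.8,0 -80.3,25.9,0"

# Splitting only on commas fuses each altitude with the next longitude
print(sample.split(","))

# Splitting on whitespace first recovers every point
print(split_points(sample))
```

Inside the loop over `s.find_all('coordinates')`, `writer.writerows(split_points(coords.string))` would then write one CSV row per point.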
