Last active
October 3, 2017 13:59
Mundraub.org shows plants on common land that can be freely harvested. This small script harvests the data from Mundraub.org in a (hopefully) friendly way. It outputs GeoJSON, which can easily be converted to GPX, SHP or whatnot. How to get the initial JSON is left as a small riddle for the user.
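The script reads its input from plant.json. Judging only from the keys the script accesses (`features`, `pos`, `properties`, `nid`), the input has roughly the shape sketched below. The sample values are made-up placeholders, not real Mundraub.org data, and `iter_positions` is a hypothetical helper used purely for illustration.

```python
import json

# Made-up sample in the shape the script reads; the values are placeholders,
# not real Mundraub.org data.
SAMPLE = '''
{"features": [
  {"pos": ["52.5", "13.4"], "properties": {"nid": "1"}},
  {"pos": ["48.1", "11.6"], "properties": {"nid": "2"}}
]}
'''


def iter_positions(data):
    """Yield (x, y, nid) tuples from the parsed input JSON (hypothetical helper)."""
    for entry in data['features']:
        # The script treats pos[0] as y and pos[1] as x.
        y, x = entry['pos'][0], entry['pos'][1]
        yield float(x), float(y), entry['properties']['nid']


rows = list(iter_positions(json.loads(SAMPLE)))
print(rows)
```

The swap matters because a GeoJSON Point stores its coordinates as (x, y), that is, longitude first and latitude second.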
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json

from geojson import Feature, Point, FeatureCollection
from lxml import html
from tqdm import tqdm

try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# open file
with open('plant.json') as data_file:
    data = json.load(data_file)

# collected GeoJSON features
geo_json_list = []

# iterate over all elements of the JSON, with a tqdm progress bar
for entry in tqdm(data['features'], desc='Getting data '):
    # pos[0] is read as y and pos[1] as x; a GeoJSON Point takes (x, y)
    y = entry['pos'][0]
    x = entry['pos'][1]
    my_point = Point((float(x), float(y)))
    nid = entry['properties']['nid']
    # get the description from the website
    url = "https://www.mundraub.org/node/" + str(nid)
    soup = html.fromstring(urlopen(url).read().decode('utf-8'))
    croutons = soup.find_class("processed_text")
    # fall back to an empty description if the page has no matching element
    description = ''
    for item in croutons:
        description = html.tostring(item, encoding='unicode', method='text')
    my_feature = Feature(geometry=my_point, properties={'nid': nid, 'description': description})
    geo_json_list.append(my_feature)

# write file ('w' creates plant.geojson if it does not exist yet)
with open('plant.geojson', 'w') as f:
    json.dump(FeatureCollection(geo_json_list), f)
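As a rough illustration of the GPX conversion mentioned in the description, here is a minimal sketch that turns a GeoJSON FeatureCollection into GPX 1.1 waypoints using only the standard library (Python 3). The function name `geojson_to_gpx`, the `creator` string, and the sample feature are assumptions made for this example and are not part of the original script; mapping `nid` to `<name>` and `description` to `<desc>` is likewise just one possible choice.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"


def geojson_to_gpx(collection, creator="mundraub-sketch"):
    """Turn a GeoJSON FeatureCollection (as a dict) into a GPX 1.1 string.

    Only Point features are converted; everything else is skipped.
    """
    gpx = ET.Element("gpx", {"version": "1.1", "creator": creator, "xmlns": GPX_NS})
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        # GeoJSON stores (longitude, latitude); GPX wants lat/lon attributes.
        lon, lat = geometry["coordinates"][:2]
        wpt = ET.SubElement(gpx, "wpt", {"lat": str(lat), "lon": str(lon)})
        props = feature.get("properties") or {}
        ET.SubElement(wpt, "name").text = str(props.get("nid", ""))
        ET.SubElement(wpt, "desc").text = props.get("description", "")
    return ET.tostring(gpx, encoding="unicode")


# Made-up sample data, not real Mundraub.org content.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [13.4, 52.5]},
         "properties": {"nid": "1", "description": "placeholder text"}},
        {"type": "Feature",
         "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]},
         "properties": {}},
    ],
}

gpx_text = geojson_to_gpx(sample)
print(gpx_text)
```

To convert the script's real output, load plant.geojson with `json.load` and pass the resulting dict to the function instead of `sample`.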
Great, just what I was looking for. I had to change the filename in line 12 to match the one in line 43 (plant.geojson) in order not to get a file-not-found error.