Last active August 29, 2015 14:13
A very simple script to parse ITRDB metadata DIF XML files and write select metadata to a tab-delimited file.
#! /usr/bin/env python3
# 2015-01-22
# Copyright 2015 S. Brewster Malevich <[email protected]>

# Parse a directory's ITRDB .xml files. The target directory should be the first
# argument when calling this script. The default is to use the current working
# directory. This script extracts select data from each XML file and writes the
# metadata to a tab-delimited file, `OUTFILE_NAME`, in the target directory.
#
# When the XML files are in the current working directory with this script, run:
#     python3 itrdb_javelina.py
# To specify a directory with XML files:
#     python3 itrdb_javelina.py /path/to/xml/directory/
# If your path has a space, you need to escape it with `\`. For example:
#     python3 itrdb_javelina.py /path/to\ xml\ directory/
# uses the directory "/path/to xml directory/".

import sys
import os
import glob
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree prepends to every DIF element name.
FOO = "{http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/}"
OUTFILE_NAME = "xmlmeta.tsv"


def main():
    # Use the first command-line argument as the target directory, if given.
    target_dir = os.getcwd()
    try:
        target_dir = sys.argv[1]
    except IndexError:
        pass
    out = []
    header = "uid\tcreator\tspp\tstartdate\tstopdate\tslat\tnlat\twlon\telon\tminalt\tmaxalt\turl\n"
    out.append(header)
    for fl in glob.iglob(os.path.join(target_dir, "*.xml")):
        tree = ET.parse(fl)
        root = tree.getroot()
        # Reset per-file values so nothing carries over from the previous file.
        spp = startdate = stopdate = url = None
        uid = root.findtext(FOO + "Entry_ID")
        creator = root.findtext(FOO + "Data_Set_Citation/" + FOO + "Dataset_Creator")
        for child in root.findall(FOO + "Parameters"):
            if child.findtext(FOO + "Variable_Level_1") == "tree species":
                spp = child.findtext(FOO + "Detailed_Variable")
        # Keep only two-token dates such as "1304 AD"; skip "646 cal yr BP".
        for child in root.findall(FOO + "Paleo_Temporal_Coverage/" + FOO + "Paleo_Start_Date"):
            if child.text and len(child.text.split(" ")) == 2:
                startdate = child.text
        for child in root.findall(FOO + "Paleo_Temporal_Coverage/" + FOO + "Paleo_Stop_Date"):
            if child.text and len(child.text.split(" ")) == 2:
                stopdate = child.text
        slat = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Southernmost_Latitude")
        nlat = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Northernmost_Latitude")
        wlon = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Westernmost_Longitude")
        elon = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Easternmost_Longitude")
        minalt = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Minimum_Altitude")
        maxalt = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Maximum_Altitude")
        # If several URLs are listed, the last one is kept.
        for child in root.findall(FOO + "Related_URL/" + FOO + "URL"):
            url = (child.text or "").strip()
        ln = [uid, creator, spp, startdate, stopdate, slat, nlat, wlon, elon, minalt, maxalt, url]
        # Missing elements become empty fields instead of raising a TypeError.
        out.append("\t".join(v or "" for v in ln) + "\n")
    with open(os.path.join(target_dir, OUTFILE_NAME), "w") as fl:
        fl.writelines(out)


if __name__ == '__main__':
    main()
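The script joins fields by hand, so a field that is missing from an XML file needs special care before it reaches the output. As a hedged alternative sketch (not part of the original script), the standard `csv` module can write the same tab-delimited layout. The `FIELDS` list and the `write_rows` helper are hypothetical names introduced here for illustration, and the sample record uses values from the example XML file.

```python
import csv
import io

# Column order taken from the script's header line.
FIELDS = ["uid", "creator", "spp", "startdate", "stopdate", "slat",
          "nlat", "wlon", "elon", "minalt", "maxalt", "url"]


def write_rows(handle, records):
    """Write a header and one tab-delimited row per record (hypothetical helper).

    Each record is a dict; keys that are absent or None become empty fields.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for rec in records:
        writer.writerow([rec.get(name) or "" for name in FIELDS])


# Usage: write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
write_rows(buf, [{"uid": "noaa-tree-3089", "creator": "Dean, J.S.", "slat": "36.67"}])
lines = buf.getvalue().splitlines()
print(lines[1].split("\t"))
```

Passing an open file handle instead of the `io.StringIO` buffer would write the same rows to disk.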
Example input: an ITRDB DIF XML metadata file.
<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/ http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/dif_v9.7.1.xsd">
  <Entry_ID>noaa-tree-3089</Entry_ID>
  <Entry_Title>
    Dean - Navajo National Monument - PSME - ITRDB AZ023
  </Entry_Title>
  <Data_Set_Citation>
    <Dataset_Creator>Dean, J.S.</Dataset_Creator>
    <Dataset_Title>
      Dean - Navajo National Monument - PSME - ITRDB AZ023
    </Dataset_Title>
    <Dataset_Publisher>NCDC-Paleoclimatology</Dataset_Publisher>
    <Data_Presentation_Form>Online Files</Data_Presentation_Form>
    <Online_Resource>
      http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_STUDY_ID:3089
    </Online_Resource>
  </Data_Set_Citation>
  <Personnel>
    <Role>Investigator</Role>
    <First_Name>J.S.</First_Name>
    <Last_Name>Dean</Last_Name>
  </Personnel>
  <Parameters>
    <Category>earth science</Category>
    <Topic>paleoclimate</Topic>
    <Term>tree-ring</Term>
    <Variable_Level_1>tree species</Variable_Level_1>
    <Detailed_Variable>Pseudotsuga menziesii (Mirb.) Franco</Detailed_Variable>
  </Parameters>
  <Parameters>
    <Category>earth science</Category>
    <Topic>paleoclimate</Topic>
    <Term>tree-ring</Term>
    <Variable_Level_1>width</Variable_Level_1>
    <Detailed_Variable>ring width</Detailed_Variable>
  </Parameters>
  <Parameters>
    <Category>earth science</Category>
    <Topic>paleoclimate</Topic>
    <Term>tree-ring</Term>
    <Variable_Level_1>width</Variable_Level_1>
    <Detailed_Variable>more info</Detailed_Variable>
  </Parameters>
  <ISO_Topic_Category>Geoscientific Information</ISO_Topic_Category>
  <Keyword>
    earth science>paleoclimate>tree-ring>tree species>Pseudotsuga menziesii (Mirb.) Franco
  </Keyword>
  <Keyword>
    earth science>paleoclimate>tree-ring>width>ring width
  </Keyword>
  <Keyword>
    earth science>paleoclimate>tree-ring>width>more info
  </Keyword>
  <Keyword>
    SPECIES>PSME>Pseudotsuga menziesii (Mirb.) Franco>Douglas-fir
  </Keyword>
  <Paleo_Temporal_Coverage>
    <Paleo_Start_Date>1304 AD</Paleo_Start_Date>
    <Paleo_Stop_Date>1962 AD</Paleo_Stop_Date>
  </Paleo_Temporal_Coverage>
  <Paleo_Temporal_Coverage>
    <Paleo_Start_Date>646 cal yr BP</Paleo_Start_Date>
    <Paleo_Stop_Date>-12 cal yr BP</Paleo_Stop_Date>
  </Paleo_Temporal_Coverage>
  <Data_Set_Progress>Complete</Data_Set_Progress>
  <Spatial_Coverage>
    <Southernmost_Latitude>36.67</Southernmost_Latitude>
    <Northernmost_Latitude>36.67</Northernmost_Latitude>
    <Westernmost_Longitude>-110.5</Westernmost_Longitude>
    <Easternmost_Longitude>-110.5</Easternmost_Longitude>
    <Minimum_Altitude>2012</Minimum_Altitude>
    <Maximum_Altitude>2012</Maximum_Altitude>
  </Spatial_Coverage>
  <Location>
    <Location_Category>Continent</Location_Category>
    <Location_Type>North America</Location_Type>
    <Location_Subregion1>United States Of America</Location_Subregion1>
    <Location_Subregion2>Arizona</Location_Subregion2>
    <Detailed_Location>
      Navajo National Monument>LATITUDE 36.67>LONGITUDE -110.5
    </Detailed_Location>
  </Location>
  <Access_Constraints>None</Access_Constraints>
  <Use_Constraints>
    Please cite original references when using this data.
  </Use_Constraints>
  <Data_Set_Language>English</Data_Set_Language>
  <Data_Center>
    <Data_Center_Name>
      <Short_Name>DOC/NOAA/NESDIS/NCDC</Short_Name>
      <Long_Name>
        National Climatic Data Center, NESDIS, NOAA, U.S. Department of Commerce
      </Long_Name>
    </Data_Center_Name>
    <Data_Center_URL>http://www.ncdc.noaa.gov/paleo/</Data_Center_URL>
    <Personnel>
      <Role>Data Center Contact</Role>
      <First_Name>Bruce</First_Name>
      <Last_Name>Bauer</Last_Name>
      <Email>[email protected]</Email>
      <Email>[email protected]</Email>
      <Phone>303-497-6280</Phone>
      <Fax>303-497-6513</Fax>
      <Contact_Address>
        <Address>325 Broadway, E/CC23</Address>
        <City>Boulder</City>
        <Province_or_State>CO</Province_or_State>
        <Postal_Code>80305</Postal_Code>
        <Country>USA</Country>
      </Contact_Address>
    </Personnel>
  </Data_Center>
  <Distribution>
    <Distribution_Media>online</Distribution_Media>
    <Distribution_Format>ASCII</Distribution_Format>
  </Distribution>
  <Reference/>
  <Summary>
    Records of past temperature, precipitation, and climate and environmental change derived from tree ring measurements. Parameter keywords describe what was measured in this data set. Additional summary information can be found in the abstracts of papers listed in the data set citations, however many of the data sets arise from unpublished research contributed to the International Tree Ring Data Bank. Additional information on data processing and analysis for International Tree Ring Data Bank (ITRDB) data sets can be found on the Tree Ring Page (http://www.ncdc.noaa.gov/paleo/treering.html).
  </Summary>
  <Related_URL>
    <URL_Content_Type>
      <Type>GET DATA</Type>
    </URL_Content_Type>
    <URL>
      ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/measurements/northamerica/usa/az023.rwl
    </URL>
    <URL>
      ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/chronologies/northamerica/usa/az023.crn
    </URL>
    <URL>
      ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/measurements/correlation-stats/az023.txt
    </URL>
  </Related_URL>
  <IDN_Node>
    <Short_Name>USA/NOAA</Short_Name>
  </IDN_Node>
  <Metadata_Name>DIF</Metadata_Name>
  <Metadata_Version>Version 9.7.1</Metadata_Version>
  <DIF_Creation_Date>2009-02-19</DIF_Creation_Date>
  <Last_DIF_Revision_Date>2009-02-19</Last_DIF_Revision_Date>
</DIF>
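The sample above lists two `Paleo_Temporal_Coverage` blocks, one in calendar years ("1304 AD") and one in "cal yr BP". A minimal sketch of how the two-token form can be picked out follows, assuming a trimmed copy of the sample embedded as a string. The `NS` prefix map and the `two_token_date` helper are illustrative names, not part of the original script, and the helper returns the first match, whereas the script's loops keep the last one.

```python
import xml.etree.ElementTree as ET

# Prefix map for ElementTree's `namespaces` argument; "dif" is an arbitrary label.
NS = {"dif": "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"}

# Trimmed copy of the sample DIF file, kept inline so the sketch is self-contained.
SAMPLE = """\
<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">
  <Entry_ID>noaa-tree-3089</Entry_ID>
  <Paleo_Temporal_Coverage>
    <Paleo_Start_Date>1304 AD</Paleo_Start_Date>
    <Paleo_Stop_Date>1962 AD</Paleo_Stop_Date>
  </Paleo_Temporal_Coverage>
  <Paleo_Temporal_Coverage>
    <Paleo_Start_Date>646 cal yr BP</Paleo_Start_Date>
    <Paleo_Stop_Date>-12 cal yr BP</Paleo_Stop_Date>
  </Paleo_Temporal_Coverage>
</DIF>
"""


def two_token_date(root, tag):
    """Return the first date under `tag` with exactly two tokens, else ""."""
    for node in root.findall("dif:Paleo_Temporal_Coverage/dif:" + tag, NS):
        text = (node.text or "").strip()
        if len(text.split()) == 2:
            return text
    return ""


root = ET.fromstring(SAMPLE)
entry_id = root.findtext("dif:Entry_ID", default="", namespaces=NS)
start = two_token_date(root, "Paleo_Start_Date")
stop = two_token_date(root, "Paleo_Stop_Date")
print(entry_id, start, stop)
```

Using a prefix map keeps the path strings short compared with concatenating the full `{namespace}` prefix onto every element name, at the cost of passing `NS` to each lookup.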