@brews
Last active August 29, 2015 14:13
A very simple script to parse ITRDB metadata DIF XML files and write selected metadata to a tab-delimited file.
#! /usr/bin/env python3
# 2015-01-22
# Copyright 2015 S. Brewster Malevich <[email protected]>
# Parse a directory's ITRDB .xml files. The target directory should be the first
# argument when calling this script. The default is to use the current working
# directory. This script extracts select data from each XML file and writes the
# metadata to a tab-delimited file, `OUTFILE_NAME`, in the target directory.
#
# When the XML files are in the current working directory with this script, run:
#     python3 itrdb_javelina.py
# To specify a directory with XML files:
#     python3 itrdb_javelina.py /path/to/xml/directory/
# If your path has a space, you need to escape it with `\`. For example:
#     python3 itrdb_javelina.py /path/to\ xml\ directory/
# uses the directory "/path/to xml directory/".
import sys
import os
import glob
import xml.etree.ElementTree as ET

# XML namespace of the DIF schema, in ElementTree's "{uri}tag" notation.
FOO = "{http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/}"
OUTFILE_NAME = "xmlmeta.tsv"


def main():
    # Use the first command-line argument as the target directory, falling
    # back to the current working directory.
    target_dir = os.getcwd()
    try:
        target_dir = sys.argv[1]
    except IndexError:
        pass
    out = []
    header = "uid\tcreator\tspp\tstartdate\tstopdate\tslat\tnlat\twlon\telon\tminalt\tmaxalt\turl\n"
    out.append(header)
    for fl in glob.iglob(os.path.join(target_dir, "*.xml")):
        tree = ET.parse(fl)
        root = tree.getroot()
        # Reset per-file values so a missing element is not silently filled
        # with the value from the previous file.
        spp = startdate = stopdate = url = None
        uid = root.findtext(FOO + "Entry_ID")
        creator = root.findtext(FOO + "Data_Set_Citation/" + FOO + "Dataset_Creator")
        # The species is the "Parameters" entry tagged as "tree species".
        for child in root.findall(FOO + "Parameters"):
            if child.findtext(FOO + "Variable_Level_1") == "tree species":
                spp = child.findtext(FOO + "Detailed_Variable")
        # Keep only two-word dates such as "1304 AD"; this skips the
        # "646 cal yr BP" style.
        for child in root.findall(FOO + "Paleo_Temporal_Coverage/" + FOO + "Paleo_Start_Date"):
            if child.text and len(child.text.split(" ")) == 2:
                startdate = child.text
        for child in root.findall(FOO + "Paleo_Temporal_Coverage/" + FOO + "Paleo_Stop_Date"):
            if child.text and len(child.text.split(" ")) == 2:
                stopdate = child.text
        slat = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Southernmost_Latitude")
        nlat = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Northernmost_Latitude")
        wlon = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Westernmost_Longitude")
        elon = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Easternmost_Longitude")
        minalt = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Minimum_Altitude")
        maxalt = root.findtext(FOO + "Spatial_Coverage/" + FOO + "Maximum_Altitude")
        # If a file lists several URLs, only the last one is kept.
        for child in root.findall(FOO + "Related_URL/" + FOO + "URL"):
            if child.text:
                url = child.text.strip()
        ln = [uid, creator, spp, startdate, stopdate, slat, nlat, wlon, elon, minalt, maxalt, url]
        # Missing elements come back as None; write them as empty fields.
        out.append("\t".join("" if x is None else x for x in ln) + "\n")
    with open(os.path.join(target_dir, OUTFILE_NAME), "w") as fl:
        fl.writelines(out)


if __name__ == '__main__':
    main()
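The script builds every lookup path by concatenating the namespace string `FOO` onto each tag. As a sketch of an alternative, `ElementTree` also accepts a prefix-to-URI mapping through the `namespaces` argument of `find`, `findall`, and `findtext`. The prefix `dif`, the `NS` dictionary, and the trimmed-down inline record below are assumptions made for illustration; they are not part of the original script.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix mapping; "dif" is an arbitrary label chosen here.
NS = {"dif": "http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/"}

# Trimmed-down record based on the sample DIF file.
SAMPLE = """<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">
<Entry_ID>noaa-tree-3089</Entry_ID>
<Data_Set_Citation>
<Dataset_Creator>Dean, J.S.</Dataset_Creator>
</Data_Set_Citation>
<Spatial_Coverage>
<Southernmost_Latitude>36.67</Southernmost_Latitude>
</Spatial_Coverage>
</DIF>"""

root = ET.fromstring(SAMPLE)

# Each "dif:" prefix is expanded to the namespace URI given in NS.
uid = root.findtext("dif:Entry_ID", namespaces=NS)
creator = root.findtext("dif:Data_Set_Citation/dif:Dataset_Creator", namespaces=NS)
slat = root.findtext("dif:Spatial_Coverage/dif:Southernmost_Latitude", namespaces=NS)

print(uid, creator, slat, sep="\t")
```

Both spellings resolve to the same elements; the mapping form mainly keeps long paths readable.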
<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/ http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/dif_v9.7.1.xsd">
<Entry_ID>noaa-tree-3089</Entry_ID>
<Entry_Title>
Dean - Navajo National Monument - PSME - ITRDB AZ023
</Entry_Title>
<Data_Set_Citation>
<Dataset_Creator>Dean, J.S.</Dataset_Creator>
<Dataset_Title>
Dean - Navajo National Monument - PSME - ITRDB AZ023
</Dataset_Title>
<Dataset_Publisher>NCDC-Paleoclimatology</Dataset_Publisher>
<Data_Presentation_Form>Online Files</Data_Presentation_Form>
<Online_Resource>
http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=519:1:::::P1_STUDY_ID:3089
</Online_Resource>
</Data_Set_Citation>
<Personnel>
<Role>Investigator</Role>
<First_Name>J.S.</First_Name>
<Last_Name>Dean</Last_Name>
</Personnel>
<Parameters>
<Category>earth science</Category>
<Topic>paleoclimate</Topic>
<Term>tree-ring</Term>
<Variable_Level_1>tree species</Variable_Level_1>
<Detailed_Variable>Pseudotsuga menziesii (Mirb.) Franco</Detailed_Variable>
</Parameters>
<Parameters>
<Category>earth science</Category>
<Topic>paleoclimate</Topic>
<Term>tree-ring</Term>
<Variable_Level_1>width</Variable_Level_1>
<Detailed_Variable>ring width</Detailed_Variable>
</Parameters>
<Parameters>
<Category>earth science</Category>
<Topic>paleoclimate</Topic>
<Term>tree-ring</Term>
<Variable_Level_1>width</Variable_Level_1>
<Detailed_Variable>more info</Detailed_Variable>
</Parameters>
<ISO_Topic_Category>Geoscientific Information</ISO_Topic_Category>
<Keyword>
earth science>paleoclimate>tree-ring>tree species>Pseudotsuga menziesii (Mirb.) Franco
</Keyword>
<Keyword>
earth science>paleoclimate>tree-ring>width>ring width
</Keyword>
<Keyword>
earth science>paleoclimate>tree-ring>width>more info
</Keyword>
<Keyword>
SPECIES>PSME>Pseudotsuga menziesii (Mirb.) Franco>Douglas-fir
</Keyword>
<Paleo_Temporal_Coverage>
<Paleo_Start_Date>1304 AD</Paleo_Start_Date>
<Paleo_Stop_Date>1962 AD</Paleo_Stop_Date>
</Paleo_Temporal_Coverage>
<Paleo_Temporal_Coverage>
<Paleo_Start_Date>646 cal yr BP</Paleo_Start_Date>
<Paleo_Stop_Date>-12 cal yr BP</Paleo_Stop_Date>
</Paleo_Temporal_Coverage>
<Data_Set_Progress>Complete</Data_Set_Progress>
<Spatial_Coverage>
<Southernmost_Latitude>36.67</Southernmost_Latitude>
<Northernmost_Latitude>36.67</Northernmost_Latitude>
<Westernmost_Longitude>-110.5</Westernmost_Longitude>
<Easternmost_Longitude>-110.5</Easternmost_Longitude>
<Minimum_Altitude>2012</Minimum_Altitude>
<Maximum_Altitude>2012</Maximum_Altitude>
</Spatial_Coverage>
<Location>
<Location_Category>Continent</Location_Category>
<Location_Type>North America</Location_Type>
<Location_Subregion1>United States Of America</Location_Subregion1>
<Location_Subregion2>Arizona</Location_Subregion2>
<Detailed_Location>
Navajo National Monument>LATITUDE 36.67>LONGITUDE -110.5
</Detailed_Location>
</Location>
<Access_Constraints>None</Access_Constraints>
<Use_Constraints>
Please cite original references when using this data.
</Use_Constraints>
<Data_Set_Language>English</Data_Set_Language>
<Data_Center>
<Data_Center_Name>
<Short_Name>DOC/NOAA/NESDIS/NCDC</Short_Name>
<Long_Name>
National Climatic Data Center, NESDIS, NOAA, U.S. Department of Commerce
</Long_Name>
</Data_Center_Name>
<Data_Center_URL>http://www.ncdc.noaa.gov/paleo/</Data_Center_URL>
<Personnel>
<Role>Data Center Contact</Role>
<First_Name>Bruce</First_Name>
<Last_Name>Bauer</Last_Name>
<Email>[email protected]</Email>
<Email>[email protected]</Email>
<Phone>303-497-6280</Phone>
<Fax>303-497-6513</Fax>
<Contact_Address>
<Address>325 Broadway, E/CC23</Address>
<City>Boulder</City>
<Province_or_State>CO</Province_or_State>
<Postal_Code>80305</Postal_Code>
<Country>USA</Country>
</Contact_Address>
</Personnel>
</Data_Center>
<Distribution>
<Distribution_Media>online</Distribution_Media>
<Distribution_Format>ASCII</Distribution_Format>
</Distribution>
<Reference/>
<Summary>
Records of past temperature, precipitation, and climate and environmental change derived from tree ring measurements. Parameter keywords describe what was measured in this data set. Additional summary information can be found in the abstracts of papers listed in the data set citations, however many of the data sets arise from unpublished research contributed to the International Tree Ring Data Bank. Additional information on data processing and analysis for International Tree Ring Data Bank (ITRDB) data sets can be found on the Tree Ring Page (http://www.ncdc.noaa.gov/paleo/treering.html).
</Summary>
<Related_URL>
<URL_Content_Type>
<Type>GET DATA</Type>
</URL_Content_Type>
<URL>
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/measurements/northamerica/usa/az023.rwl
</URL>
<URL>
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/chronologies/northamerica/usa/az023.crn
</URL>
<URL>
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/measurements/correlation-stats/az023.txt
</URL>
</Related_URL>
<IDN_Node>
<Short_Name>USA/NOAA</Short_Name>
</IDN_Node>
<Metadata_Name>DIF</Metadata_Name>
<Metadata_Version>Version 9.7.1</Metadata_Version>
<DIF_Creation_Date>2009-02-19</DIF_Creation_Date>
<Last_DIF_Revision_Date>2009-02-19</Last_DIF_Revision_Date>
</DIF>
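The sample record lists each coverage date twice, once as a calendar year (`1304 AD`) and once in `cal yr BP`, and it carries several `URL` elements under `Related_URL`. The sketch below is not part of the original script. It shows how a two-word test selects the AD-style dates, and how all URLs could be collected instead of keeping only the last one. The helper names `pick_two_word` and `all_urls` and the abbreviated inline record are hypothetical.

```python
import xml.etree.ElementTree as ET

FOO = "{http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/}"

# Abbreviated record based on the sample DIF file.
SAMPLE = """<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">
<Paleo_Temporal_Coverage>
<Paleo_Start_Date>1304 AD</Paleo_Start_Date>
<Paleo_Stop_Date>1962 AD</Paleo_Stop_Date>
</Paleo_Temporal_Coverage>
<Paleo_Temporal_Coverage>
<Paleo_Start_Date>646 cal yr BP</Paleo_Start_Date>
<Paleo_Stop_Date>-12 cal yr BP</Paleo_Stop_Date>
</Paleo_Temporal_Coverage>
<Related_URL>
<URL>
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/measurements/northamerica/usa/az023.rwl
</URL>
<URL>
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/chronologies/northamerica/usa/az023.crn
</URL>
</Related_URL>
</DIF>"""


def pick_two_word(root, tag):
    """Return the last date under `tag` that has exactly two words, or None."""
    found = None
    for child in root.findall(FOO + "Paleo_Temporal_Coverage/" + FOO + tag):
        text = (child.text or "").strip()
        if len(text.split()) == 2:
            found = text
    return found


def all_urls(root):
    """Return every non-empty Related_URL/URL value with whitespace stripped."""
    urls = []
    for child in root.findall(FOO + "Related_URL/" + FOO + "URL"):
        text = (child.text or "").strip()
        if text:
            urls.append(text)
    return urls


root = ET.fromstring(SAMPLE)
startdate = pick_two_word(root, "Paleo_Start_Date")
stopdate = pick_two_word(root, "Paleo_Stop_Date")
urls = all_urls(root)

print(startdate, stopdate, ";".join(urls), sep="\t")
```

Joining with `;` is only one possible convention for fitting several URLs into a single tab-delimited field.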