@j08lue
Created June 23, 2020 07:26
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DMI API Tutorial\n",
"\n",
"This tutorial gives an introduction to using the Danish Meteorological Institute's (DMI) API to download historical weather data. The API documentation can be found [here](https://confluence.govcloud.dk/display/FDAPI).\n",
"\n",
"The tutorial uses Python and is written as a Jupyter Notebook, which can be downloaded and run locally so that new users can start downloading data immediately.\n",
"\n",
"## Part 1: Retrieving data\n",
"Part 1 of this tutorial will show how to request data and convert it to a table format. Part 2 will deal with how to request specific data and more advanced data handling.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"import requests # library for making HTTP requests\n",
"import pandas as pd # library for data analysis\n",
"# following command allows figures to be shown inline\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to access the API it is necessary to create a user and obtain an api-key. This key grants permission to retrieve data and allows DMI to generate usage statistics.\n",
"\n",
"A guide to creating a user profile and getting an api-key can be found [here](https://confluence.govcloud.dk/pages/viewpage.action?pageId=26476690).\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"api_key = '' # insert your own key between the '' signs"
]
},
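{
"cell_type": "markdown",
"metadata": {},
"source": [
"To avoid saving the key inside the notebook, it can instead be kept in an environment variable. The sketch below assumes a variable named `DMI_API_KEY`. That name is a hypothetical choice for this tutorial and is not required by the API. The key typed in the cell above is only replaced when the variable is set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"def get_api_key(env_var='DMI_API_KEY'):\n",
"    '''Return the API key stored in an environment variable, or an empty string (DMI_API_KEY is a hypothetical name).'''\n",
"    return os.environ.get(env_var, '')\n",
"\n",
"env_key = get_api_key()\n",
"if env_key:\n",
"    api_key = env_key # only overrides the key above when the environment variable is set"
]
},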
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to test whether your api-key works is to paste the API URL followed by `?api-key=` and your key into your browser, e.g.: https://dmigw.govcloud.dk/metObs/v1/observation?api-key=111111xx-1x11-11xx-1111-1x1111111xx1 (the key in this example is a placeholder and will not work for you).\n",
"\n",
"This should return a page of data in the browser window.\n",
"<br><br>\n",
"\n",
"In the following code block, data is retrieved using the *requests.get* function. Further information on REST APIs and HTTP request methods can be found [here](https://restfulapi.net/http-methods/).\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Response [401]> https://dmigw.govcloud.dk/metObs/v1/observation?api-key=\n"
]
}
],
"source": [
"url = 'https://dmigw.govcloud.dk/metObs/v1/observation' # url for the current api version\n",
"r = requests.get(url, params={'api-key': api_key}) # Issues an HTTP GET request\n",
"print(r, r.url)"
]
},
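{
"cell_type": "markdown",
"metadata": {},
"source": [
"You could also paste the URL printed above into a browser for the quick test described earlier. The following sketch shows what *requests* does with the `params` dictionary. For simple values, the standard-library function `urllib.parse.urlencode` produces the same query string. The helper `build_url` is hypothetical and is only meant for inspecting URLs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.parse import urlencode\n",
"\n",
"def build_url(base_url, params):\n",
"    '''Append params to base_url as a URL-encoded query string (hypothetical helper).'''\n",
"    return base_url + '?' + urlencode(params)\n",
"\n",
"print(build_url('https://dmigw.govcloud.dk/metObs/v1/observation', {'api-key': 'your-key'})) # placeholder key"
]
},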
{
"cell_type": "markdown",
"metadata": {},
"source": [
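"The [response status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) indicates whether the request was successful. A 200 code means that the retrieval was successful. A 401 code, as in the output above, means that the request was not authorized, typically because the api-key is missing or invalid.\n",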
"<br/><br/>\n",
"\n",
"\n",
"\n",
"\n",
"Next, we extract the JSON data from the returned response object. [JSON](https://restfulapi.net/introduction-to-json/) is a human-readable format for data exchange.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"json = r.json() # Extract JSON data\n",
"# A successful request returns a list of observations; a failed one (e.g. status 401) returns a dictionary with an error\n",
"print(json[:2] if isinstance(json, list) else json) # Print the first two data entries, or the error response"
]
},
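{
"cell_type": "markdown",
"metadata": {},
"source": [
"Slicing with `json[:2]` only works when the body is a list of observations, which is what a successful request returns. After a failed request (for example status 401 without a valid api-key), the body returned is a dictionary, and slicing it raises `TypeError: unhashable type: 'slice'`. The following minimal sketch fails early with a readable message instead. `check_response` is a hypothetical helper and not part of the DMI API. It uses the standard-library `http.HTTPStatus` to turn the code into a phrase."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from http import HTTPStatus\n",
"\n",
"def check_response(status_code, body):\n",
"    '''Return body if the request succeeded with a list of observations, otherwise raise an error.'''\n",
"    if status_code != HTTPStatus.OK:\n",
"        try:\n",
"            phrase = HTTPStatus(status_code).phrase # e.g. 401 -> 'Unauthorized'\n",
"        except ValueError:\n",
"            phrase = 'Unknown status'\n",
"        raise RuntimeError(f'Request failed with {status_code} {phrase}: {body}')\n",
"    if not isinstance(body, list):\n",
"        raise TypeError(f'Expected a list of observations, got {type(body).__name__}')\n",
"    return body\n",
"\n",
"# Usage with the response from above:\n",
"# observations = check_response(r.status_code, r.json())"
]
},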
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Furthermore, the JSON object can be converted to a convenient table (DataFrame) using the Pandas library."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" _id parameterId stationId \\\n",
"0 120a2e3e-41b3-11ea-969d-c2bb795cdc89 temp_soil 06019 \n",
"1 e972c3fa-41b2-11ea-893e-ae5be27c74ec precip_past1min 06019 \n",
"2 e962392c-41b2-11ea-b423-061e38c0dcb3 wind_speed 06019 \n",
"3 e96237b0-41b2-11ea-b423-061e38c0dcb3 wind_max 06019 \n",
"4 e9623620-41b2-11ea-b423-061e38c0dcb3 wind_dir 06019 \n",
"\n",
" timeCreated timeObserved value \n",
"0 1.580205e+15 1.580205e+15 5.8 \n",
"1 1.580205e+15 1.580205e+15 0.0 \n",
"2 1.580205e+15 1.580205e+15 7.1 \n",
"3 1.580205e+15 1.580205e+15 9.7 \n",
"4 1.580205e+15 1.580205e+15 95.0 \n"
]
}
],
"source": [
"df = pd.DataFrame(json) # Convert JSON object to a Pandas DataFrame\n",
"print(df.head()) # Print the first five rows of the DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the snippet above, it can be deduced that the timestamps are given in microseconds since January 1st 1970 (Unix time). Try copying one timestamp and converting it to a readable date using [this tool](https://www.epochconverter.com/).\n",
"\n",
"<br/>\n",
"\n",
"\n",
"The timestamps can be converted to datetime objects using the Pandas *to_datetime* function."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 2020-01-28 09:50:00\n",
"1 2020-01-28 09:50:00\n",
"2 2020-01-28 09:50:00\n",
"3 2020-01-28 09:50:00\n",
"4 2020-01-28 09:50:00\n",
"Name: time, dtype: datetime64[ns]\n"
]
}
],
"source": [
"df['time'] = pd.to_datetime(df['timeObserved'], unit='us') # The unit 'us' corresponds to microseconds\n",
"print(df['time'].head()) # Print the first five timestamps"
]
},
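{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked check of the unit, you can do the same conversion with the standard library by adding the number of microseconds to the Unix epoch (1970-01-01 00:00 UTC). The input 1580205000000000 is an illustrative value consistent with the rounded `1.580205e+15` printed above. It was not copied from the API. Unix time is defined relative to UTC, so the *time* column created with `pd.to_datetime(..., unit='us')` is also in UTC, although no time zone is attached to it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime, timedelta, timezone\n",
"\n",
"EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc) # Unix time zero\n",
"\n",
"def unix_us_to_datetime(us):\n",
"    '''Convert microseconds since the Unix epoch to a timezone-aware UTC datetime.'''\n",
"    return EPOCH + timedelta(microseconds=us)\n",
"\n",
"print(unix_us_to_datetime(1580205000000000)) # 2020-01-28 09:50:00+00:00"
]
},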
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br/>\n",
"Lastly, we generate a list of all the parameters present in the downloaded data."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['temp_soil' 'precip_past1min' 'wind_speed' 'wind_max' 'wind_dir'\n",
" 'temp_grass' 'temp_dry' 'temp_dew' 'sun_last10min_glob' 'radia_glob'\n",
" 'precip_past10min' 'humidity' 'precip_dur_past10min'\n",
" 'leav_hum_dur_past10min' 'pressure_at_sea' 'pressure' 'visibility'\n",
" 'visib_mean_last10min' 'cloud_height' 'cloud_cover' 'weather']\n"
]
}
],
"source": [
"parameter_ids = df['parameterId'].unique() # Generate a list of unique parameter ids\n",
"print(parameter_ids) # Print all unique parameter ids"
]
},
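{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use a parameter id from this list to select a single variable from the long table with a boolean mask. `value_counts` then shows how many observations each parameter has. The small DataFrame below contains made-up illustrative data with the same columns as `df`, so that the cell runs on its own. With real data, replace `example` with `df`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up example rows in the same long format as df (one row per observation)\n",
"example = pd.DataFrame({\n",
"    'parameterId': ['temp_dry', 'humidity', 'temp_dry'],\n",
"    'stationId': ['06019', '06019', '06019'],\n",
"    'value': [5.8, 91.0, 6.1],\n",
"})\n",
"\n",
"temp_dry = example[example['parameterId'] == 'temp_dry'] # keep only air-temperature rows\n",
"print(temp_dry['value'].tolist()) # [5.8, 6.1]\n",
"print(example['parameterId'].value_counts()) # number of observations per parameter"
]
},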
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br/><br/>\n",
"\n",
"## Part 2: Requesting specific data\n",
"\n",
"The above example was a heavily simplified illustration of how the API can be accessed. For most applications, the user will want to specify query criteria, such as:\n",
"1. Meteorological stations (e.g. 04320, 06074, etc.)\n",
"2. Parameters (e.g. wind_speed, humidity, etc.)\n",
"3. Time frame (to and from time)\n",
"4. Limit (maximum number of observations)\n"
]
},
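{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each of these criteria corresponds to a query parameter of the request. As a sketch, a small helper can assemble them. The keys `from`, `to`, `stationId`, `parameterId` and `limit` are the ones used in this tutorial. The helper functions `to_unix_us` and `build_params` are hypothetical. Times are converted to microseconds with integer arithmetic, and naive datetimes are treated as UTC. For 10 January 2020 this gives 1578614400000000, the same `from` value that appears in the URL printed in the next code cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime, timezone\n",
"\n",
"def to_unix_us(dt):\n",
"    '''Microseconds since 1970-01-01 UTC as an integer; naive datetimes are assumed to be in UTC.'''\n",
"    if dt.tzinfo is None:\n",
"        dt = dt.replace(tzinfo=timezone.utc)\n",
"    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)\n",
"    return (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds\n",
"\n",
"def build_params(api_key, station_id, start, end, parameter_id=None, limit=100000):\n",
"    '''Assemble query parameters for one station and time frame (hypothetical helper).'''\n",
"    params = {'api-key': api_key,\n",
"              'from': str(to_unix_us(start)),\n",
"              'to': str(to_unix_us(end)),\n",
"              'stationId': station_id,\n",
"              'limit': str(limit)}\n",
"    if parameter_id is not None: # omitting parameterId requests all parameters\n",
"        params['parameterId'] = parameter_id\n",
"    return params\n",
"\n",
"print(build_params('your-key', '06188', datetime(2020, 1, 10), datetime(2020, 1, 11)))"
]
},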
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<Response [401]> https://dmigw.govcloud.dk/metObs/v1/observation?api-key=&from=1578614400000000&to=1581000566000044&stationId=06188&limit=100000\n"
]
}
],
"source": [
"# Start and end time should be specified in microseconds since January 1st 1970 (Unix time)\n",
"end_time = pd.datetime.today() # End time is defined as the current time\n",
"start_time = pd.datetime(2020,1,10) # Start time is defined as specific date\n",
"\n",
"def datetime_to_unixtime(dt):\n",
" '''Function converting a datetime objects to a Unix microsecond string'''\n",
" return str(int(pd.to_datetime(dt).value*10**-3))\n",
"\n",
"\n",
"# Specify query parameters\n",
"params = {'api-key' : api_key,\n",
" 'from' : datetime_to_unixtime(start_time),\n",
" 'to' : datetime_to_unixtime(end_time),\n",
" 'stationId' : '06188',\n",
" #'parameterId' : 'temp_mean_past1h',\n",
" 'limit' : '100000',\n",
" }\n",
"\n",
"\n",
"r = requests.get(url, params=params) # submit GET request based on url and headers\n",
"print(r, r.url) # Print request status and url"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"N.B.: the parameterId was commented out above, so all parameters are included. At the time of writing, it was only possible to request either one parameter or all of them, and the same was true for stations. The *limit* parameter is the maximum number of observations to download. It should generally be set to a large value so that it does not cut off the result.\n",
"\n",
"\n",
"If the request was successful, the response *r* now contains the requested data in JSON format. Next, the JSON object is extracted and converted to a Pandas DataFrame as previously shown.\n",
"\n",
"A new column named *time* is created, containing the observation times as datetime values. The unused columns are then deleted, and the time is set as the index."
]
},
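{
"cell_type": "markdown",
"metadata": {},
"source": [
"*limit* caps the number of returned observations, so a simple sanity check is to compare the number of observations received with the limit. If they are equal, the result may have been cut off. In that case, you can split the time frame into smaller windows and request them one at a time. Below is a minimal sketch; both helpers are hypothetical. This tutorial does not state whether `from` and `to` are inclusive, so an observation exactly on a window boundary might be returned twice and should be deduplicated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"\n",
"def may_be_truncated(observations, limit):\n",
"    '''True if the number of returned observations reached the requested limit.'''\n",
"    return len(observations) >= int(limit)\n",
"\n",
"def split_time_frame(start, end, step=timedelta(days=7)):\n",
"    '''Split the interval from start to end into consecutive windows no longer than step.'''\n",
"    windows = []\n",
"    while start < end:\n",
"        stop = min(start + step, end)\n",
"        windows.append((start, stop))\n",
"        start = stop\n",
"    return windows\n",
"\n",
"for window_start, window_end in split_time_frame(datetime(2020, 1, 1), datetime(2020, 1, 20)):\n",
"    print(window_start, '->', window_end)\n",
"\n",
"# Usage with the request above (params['limit'] is a string):\n",
"# if may_be_truncated(r.json(), params['limit']):\n",
"#     print('Limit reached; consider splitting the time frame into smaller requests')"
]
},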
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" parameterId stationId value time\n",
"time \n",
"2020-01-28 09:40:00 temp_soil 06188 5.3 2020-01-28 09:40:00\n",
"2020-01-28 09:40:00 wind_speed 06188 3.0 2020-01-28 09:40:00\n",
"2020-01-28 09:40:00 wind_max 06188 5.6 2020-01-28 09:40:00\n",
"2020-01-28 09:40:00 wind_dir 06188 162.0 2020-01-28 09:40:00\n",
"2020-01-28 09:40:00 temp_grass 06188 4.6 2020-01-28 09:40:00\n"
]
}
],
"source": [
"json = r.json() # Extract JSON object\n",
"df = pd.DataFrame(json) # Convert JSON object to a DataFrame\n",
"\n",
"df['time'] = pd.to_datetime(df['timeObserved'], unit='us') # Convert the observation time (microseconds) to datetime\n",
"\n",
"df = df.drop(['_id', 'timeCreated', 'timeObserved'], axis=1) # Delete unused columns\n",
"\n",
"df.index = df['time'] # Set the time as the index\n",
"\n",
"print(df.head()) # Print the first five rows"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the table contains multiple parameters from station *06188*, it is convenient to reshape it so that the index is time and each column represents a unique parameter. A simple method for doing this is to set a multi-index of time and parameter and then unstack, as shown below.\n",
"\n",
"Lastly, the data is visualized."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5,0,'')"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 864x504 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Reshape to one row per time and one column per parameter\n",
"df2 = (df.reset_index(drop=True)\n",
"       .drop_duplicates(subset=['time', 'parameterId']) # unstack requires unique (time, parameterId) pairs\n",
"       .set_index(['time', 'parameterId'])['value']\n",
"       .unstack(level=-1))\n",
"\n",
"plot_params = ['wind_speed', 'humidity', 'temp_dry'] # Choosing which parameters to plot\n",
"\n",
"# Generate plot of data\n",
"ax = df2[plot_params].interpolate().plot(figsize=(12,7), legend=False, fontsize=12, subplots=True)\n",
"ax[0].set_ylabel('Wind speed [m/s]', size=12)\n",
"ax[1].set_ylabel('Humidity [%]', size=12)\n",
"ax[2].set_ylabel(r'Air temperature [$^\\circ$C]', size=12)\n",
"ax[2].set_xlabel('', size=12)\n"
]
},
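{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following minimal sketch shows the multi-index and unstack step in isolation, using a few made-up rows in the same long format. Duplicates are dropped on the (time, parameterId) pair only, because `unstack` raises an error when the same index pair occurs twice. Calling `drop_duplicates()` without a subset would compare only the remaining columns and could discard genuine repeated values. As an optional next step, you can aggregate the 10-minute values to hourly means with `resample`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up long-format rows: one row per (time, parameter) observation, including one duplicate\n",
"long_df = pd.DataFrame({\n",
"    'time': pd.to_datetime(['2020-01-28 09:40', '2020-01-28 09:40', '2020-01-28 09:50',\n",
"                            '2020-01-28 09:50', '2020-01-28 09:50']),\n",
"    'parameterId': ['wind_speed', 'temp_dry', 'wind_speed', 'temp_dry', 'temp_dry'],\n",
"    'value': [3.0, 4.6, 3.4, 4.8, 4.8],\n",
"})\n",
"\n",
"wide = (long_df.drop_duplicates(subset=['time', 'parameterId']) # unstack needs unique index pairs\n",
"               .set_index(['time', 'parameterId'])['value']\n",
"               .unstack(level=-1)) # one column per parameter\n",
"print(wide)\n",
"\n",
"hourly = wide.resample('1h').mean() # hourly means of the 10-minute values\n",
"print(hourly)"
]
},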
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Useful links:\n",
"1. [Station numbers](https://confluence.govcloud.dk/display/FDAPI/Stations)\n",
"2. [Parameters](https://confluence.govcloud.dk/display/FDAPI/Parameters)\n",
"3. [Codes](https://confluence.govcloud.dk/display/FDAPI/Codes)\n",
"4. [FAQ](https://confluence.govcloud.dk/display/FDAPI/FAQ)\n",
"5. [Terms of use](https://confluence.govcloud.dk/display/FDAPI/Terms+of+Use)\n",
"6. [Operational status](https://confluence.govcloud.dk/display/FDAPI/Operational+Status+of+API)\n",
"7. [API uptime](http://status.govcloud.dk/)\n",
"8. [Contact & support](https://confluence.govcloud.dk/pages/viewpage.action?pageId=26476715)\n",
"9. [User creation](https://confluence.govcloud.dk/pages/viewpage.action?pageId=26476690)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"[The Norwegian Meteorological Institute's historical weather data API](https://frost.met.no/)\n",
"\n",
"[Swedish Meteorological and Hydrological Institute's Open Data API](https://opendata.smhi.se/apidocs/metobs/)\n",
"\n",
"*Updated on 27 January 2020 by Adam R. Jensen*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}