Skip to content

Instantly share code, notes, and snippets.

@jamesp
Created May 15, 2018 10:03
Show Gist options
  • Save jamesp/8248397f66b798c5676d1f14564ea715 to your computer and use it in GitHub Desktop.
Save jamesp/8248397f66b798c5676d1f14564ea715 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import xarray as xr\n",
"import numpy as np\n",
"\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Numpy Arrays\n",
"\n",
"Numpy arrays are a very efficient way of working with multi-dimensional data in Python. \n",
"* They allow us to perform vectorised operations very efficiently.\n",
"* The code looks the same whether we are working on a single value, or an entire list."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Python lists are slow!!!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xs = range(1000000)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"square_xs = []\n",
"for x in xs:\n",
" square_xs.append(x**2)\n",
"\n",
"None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time [x**2 for x in xs]\n",
"None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nxs = np.asarray(xs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"nxs**2\n",
"None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"numpy arrays can be multi-dimensional. For example, we can create a 3-D array (x, y, and time) and fill it with data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# create a dataset that varies in time and space\n",
"# choose a sample spacing. Remember the array will have a memory size of nt*nx*ny*nbytes(float64)\n",
"nt = 100\n",
"nx = 256\n",
"ny = 128"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# locations of the samples\n",
"t = np.arange(1, nt+1)\n",
"x = np.linspace(0, np.pi*2, nx, endpoint=False)\n",
"y = np.linspace(-1, 1, ny)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create empty dimensions to make coordinate arrays easier to work with\n",
"tt = t[:, np.newaxis, np.newaxis]\n",
"yy = y[np.newaxis, :, np.newaxis]\n",
"xx = x[np.newaxis, np.newaxis, :]\n",
"xx.shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = np.exp(-(yy/.3)**2)*np.cos(3*(xx)) + np.random.normal(size=(nt,ny,nx))\n",
"print(\"Size of data: %.1f MB\" % (data.nbytes/1e6))\n",
"print(\"Shape of data: %r\" % (data.shape,))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data[0,0,0:3]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# show the field at time=1\n",
"plt.imshow(data[0, :, :])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# mean of the whole field\n",
"data.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# find the mean in time\n",
"plt.plot(data.mean(axis=(1,2)))\n",
"plt.xlabel('time')\n",
"plt.ylabel('mean val')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Surely there must be a better way?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d = xr.DataArray(data, coords=(('time', t), ('y', y), ('x', x)))\n",
"d"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d.sel(time=1).plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d.mean('time').plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Chaining operations\n",
"\n",
"Typically most operations will return a new DataArray, so they can be chained together to select and transform your data.\n",
"\n",
"The order you perform the operations may or may not matter, but typically it is better to do selection operations earlier, to reduce the amount of calculation you are doing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"dmean = d.mean('time')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"d.sel(x=3.14, method='nearest').mean('time').plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All three of these methods are equivalent. Arithmetic operations on xarray DataArrays return DataArrays, so operations can be chained.\n",
"\n",
"The `pipe` method is also very useful. It expects a function that takes your data object, manipulates it, and returns a new DataArray."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(9, 3))\n",
"\n",
"np.sqrt(d**2).mean(('time', 'x')).plot(ax=ax1)\n",
"\n",
"d.pipe(abs).mean(('time', 'x')).plot(ax=ax2)\n",
"\n",
"d.pipe(lambda x: np.sqrt(x**2)).mean(('time', 'x')).plot(ax=ax3)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d.mean('time').pipe(abs).mean('x').plot()\n",
"(np.exp(-(d.y/.3)**2)*2/np.pi).plot(linestyle='--')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DataSets\n",
"\n",
"DataSets are collections of DataArrays with shared coordinates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = xr.open_dataset('/scratch/jp492/gfdl_data/dry_earth/run0034/daily.nc', decode_times=False)\n",
"ds"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.t_surf.mean('time').plot.contourf(levels=22)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.lon"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.ucomp.mean(('time', 'lon')).plot.contourf(levels=22)\n",
"plt.ylim(ds.pfull.max(), ds.pfull.min())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Subsetting a dataset:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"atmosphere = ds[['ucomp', 'vcomp', 'temp']]\n",
"atmosphere\n",
"\n",
"# can also use ds.drop(['a', 'b']) to remove specific DataArrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As for fields, we can perform reduction operations on the whole dataset at one time."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.mean()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.mean([dim for dim in ds.coords if dim != 'time']).temp.plot()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment