neuromusic · August 24, 2017 18:43
diff --git a/xarray+neurophysiology.ipynb b/xarray+neurophysiology.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# how does xarray handle multiple data arrays with different timebases?\n",
    "\n",
    "Often in a neurophysiology experiment, we are sampling from different datastreams simultaneously... for example, we may start sampling frames of calcium activity from a 2P microscope (at 31fps), then start acquisition of some behavioral feature on a different piece of hardware (say, running speed at 50Hz) a few seconds later.\n",
    "\n",
    "There's a technical challenge in syncronizing the datastreams across multiple pieces of hardware, but if we assume that is solved, then we may want to subselect our entire data stream around a given time (say, a stimulus event that it under the experimenter's control).\n",
    "\n",
    "xarray is a python package that lets us annotate multidimensional arrays with labels for each axis... both the name of the axis (the 'dim', such as 'time') and the exact coordinates (e.g. [1.0,2.0,3.0]). Further, these DataArrays can be combined into a DataSet, then sliced along any shared dimensions. While primarily used in the geophysical sciences, it's promising python package for this neurophysiology case & I'm going to see how it does."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## create some physiology data\n",
    "\n",
    "First, let's create a synthetic calcium signal. We'll assume that we've isolated 100 neurons and extraced their calcium traces from the recorded images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray (neuron: 100, time: 1860)>\n",
       "array([[ 0.510455,  0.337288,  0.536991, ...,  0.532675,  0.951841,  0.211775],\n",
       "       [ 0.410921,  0.332649,  0.055494, ...,  0.727138,  0.016597,  0.001816],\n",
       "       [ 0.95599 ,  0.12458 ,  0.548215, ...,  0.376174,  0.225282,  0.25057 ],\n",
       "       ..., \n",
       "       [ 0.482431,  0.236054,  0.890402, ...,  0.811114,  0.171165,  0.325449],\n",
       "       [ 0.210746,  0.365866,  0.58027 , ...,  0.256656,  0.939333,  0.202481],\n",
       "       [ 0.706207,  0.328495,  0.854625, ...,  0.743763,  0.052923,  0.903367]])\n",
       "Coordinates:\n",
       "  * neuron   (neuron) <U6 'roi000' 'roi001' 'roi002' 'roi003' 'roi004' ...\n",
       "  * time     (time) float64 0.0 0.03226 0.06452 0.09677 0.129 0.1613 0.1935 ..."
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ca_Hz = 31.0\n",
    "\n",
    "start_time = 0.0\n",
    "sampling_time = 60.0 # seconds\n",
    "\n",
    "n_neurons = 100\n",
    "\n",
    "\n",
    "calcium = xr.DataArray(\n",
    "    np.random.random((n_neurons,int(sampling_time * ca_Hz))),\n",
    "    dims=('neuron','time'),\n",
    "    coords={\n",
    "        'time':start_time + np.arange(0,sampling_time * ca_Hz)/ca_Hz,\n",
    "        'neuron': ['roi{0:03d}'.format(n) for n in range(n_neurons)]\n",
    "    }\n",
    ")\n",
    "calcium"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## a second simultaneous recording\n",
    "\n",
    "Next, we'll create a running signal. This signal started later, didn't sample as long, and sampled at a different rate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray (time: 2500)>\n",
       "array([ 0.210562,  0.248769,  0.091969, ...,  0.441855,  0.148435,  0.899181])\n",
       "Coordinates:\n",
       "  * time     (time) float64 1.0 1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18 ..."
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "running_Hz = 50\n",
    "start_time = 1.0\n",
    "sampling_time = 50.0\n",
    "\n",
    "running =  xr.DataArray(\n",
    "    np.random.random(int(sampling_time*running_Hz)),\n",
    "    dims=('time'),\n",
    "    coords={'time':start_time + np.arange(0,sampling_time*running_Hz)/running_Hz}\n",
    ")\n",
    "running"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## make a single Dataset out of our two DataArrays\n",
    "\n",
    "OK, we have two DataArrays. Both have a dimension called 'time'. Let's combine these into a single DataSet & see what happens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:  (neuron: 100, time: 4310)\n",
       "Coordinates:\n",
       "  * time     (time) float64 0.0 0.03226 0.06452 0.09677 0.129 0.1613 0.1935 ...\n",
       "  * neuron   (neuron) <U6 'roi000' 'roi001' 'roi002' 'roi003' 'roi004' ...\n",
       "Data variables:\n",
       "    running  (time) float64 nan nan nan nan nan nan nan nan nan nan nan nan ...\n",
       "    calcium  (neuron, time) float64 0.5105 0.3373 0.537 0.6395 0.2549 ..."
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "experiment = xr.Dataset(data_vars={'calcium': calcium,'running':running})\n",
    "experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check that out!!\n",
    "\n",
    "xarray discovered the shared dimension & created a single 'time' coordinate that both arrays are now referenced against. For any times where one DataArray was missing values while the other one had values, it filled it the array with NaNs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'running' (time: 4310)>\n",
       "array([ nan,  nan,  nan, ...,  nan,  nan,  nan])\n",
       "Coordinates:\n",
       "  * time     (time) float64 0.0 0.03226 0.06452 0.09677 0.129 0.1613 0.1935 ..."
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "experiment['running']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## get our original data back out\n",
    "\n",
    "We can get back to our original `running` array, by grabbing the DataArray from the Dataset then calling `.dropna()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.DataArray 'running' (time: 2500)>\n",
       "array([ 0.210562,  0.248769,  0.091969, ...,  0.441855,  0.148435,  0.899181])\n",
       "Coordinates:\n",
       "  * time     (time) float64 1.0 1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18 ..."
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "experiment['running'].dropna(dim='time')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## slice the whole xr.Dataset\n",
    "\n",
    "We can then slice the entire Dataset based on the time array, passing `drop=True` to there `.where` method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<xarray.Dataset>\n",
       "Dimensions:  (neuron: 100, time: 79)\n",
       "Coordinates:\n",
       "  * time     (time) float64 3.02 3.032 3.04 3.06 3.065 3.08 3.097 3.1 3.12 ...\n",
       "  * neuron   (neuron) <U6 'roi000' 'roi001' 'roi002' 'roi003' 'roi004' ...\n",
       "Data variables:\n",
       "    running  (time) float64 0.2339 nan 0.9739 0.1332 nan 0.6031 nan 0.8324 ...\n",
       "    calcium  (neuron, time) float64 nan 0.3882 nan nan 0.4332 nan 0.08029 ..."
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "boolean_time_mask = (experiment['time']>3) & (experiment['time']<4)\n",
    "\n",
    "experiment.where(boolean_time_mask,drop=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# how does xarray handle multiple data arrays with different timebases?\n",
	"\n",
	"Often in a neurophysiology experiment, we are sampling from different datastreams simultaneously... for example, we may start sampling frames of calcium activity from a 2P microscope (at 31fps), then start acquisition of some behavioral feature on a different piece of hardware (say, running speed at 50Hz) a few seconds later.\n",
	"\n",
	"There's a technical challenge in syncronizing the datastreams across multiple pieces of hardware, but if we assume that is solved, then we may want to subselect our entire data stream around a given time (say, a stimulus event that it under the experimenter's control).\n",
	"\n",
	"xarray is a python package that lets us annotate multidimensional arrays with labels for each axis... both the name of the axis (the 'dim', such as 'time') and the exact coordinates (e.g. [1.0,2.0,3.0]). Further, these DataArrays can be combined into a DataSet, then sliced along any shared dimensions. While primarily used in the geophysical sciences, it's promising python package for this neurophysiology case & I'm going to see how it does."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 16,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"import xarray as xr\n",
	"import numpy as np"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## create some physiology data\n",
	"\n",
	"First, let's create a synthetic calcium signal. We'll assume that we've isolated 100 neurons and extraced their calcium traces from the recorded images."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 24,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"<xarray.DataArray (neuron: 100, time: 1860)>\n",
	"array([[ 0.510455, 0.337288, 0.536991, ..., 0.532675, 0.951841, 0.211775],\n",
	" [ 0.410921, 0.332649, 0.055494, ..., 0.727138, 0.016597, 0.001816],\n",
	" [ 0.95599 , 0.12458 , 0.548215, ..., 0.376174, 0.225282, 0.25057 ],\n",
	" ..., \n",
	" [ 0.482431, 0.236054, 0.890402, ..., 0.811114, 0.171165, 0.325449],\n",
	" [ 0.210746, 0.365866, 0.58027 , ..., 0.256656, 0.939333, 0.202481],\n",
	" [ 0.706207, 0.328495, 0.854625, ..., 0.743763, 0.052923, 0.903367]])\n",
	"Coordinates:\n",
	" * neuron (neuron) <U6 'roi000' 'roi001' 'roi002' 'roi003' 'roi004' ...\n",
	" * time (time) float64 0.0 0.03226 0.06452 0.09677 0.129 0.1613 0.1935 ..."
	]
	},
	"execution_count": 24,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"ca_Hz = 31.0\n",
	"\n",
	"start_time = 0.0\n",
	"sampling_time = 60.0 # seconds\n",
	"\n",
	"n_neurons = 100\n",
	"\n",
	"\n",
	"calcium = xr.DataArray(\n",
	" np.random.random((n_neurons,int(sampling_time * ca_Hz))),\n",
	" dims=('neuron','time'),\n",
	" coords={\n",
	" 'time':start_time + np.arange(0,sampling_time * ca_Hz)/ca_Hz,\n",
	" 'neuron': ['roi{0:03d}'.format(n) for n in range(n_neurons)]\n",
	" }\n",
	")\n",
	"calcium"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## a second simultaneous recording\n",
	"\n",
	"Next, we'll create a running signal. This signal started later, didn't sample as long, and sampled at a different rate."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 27,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"<xarray.DataArray (time: 2500)>\n",
	"array([ 0.210562, 0.248769, 0.091969, ..., 0.441855, 0.148435, 0.899181])\n",
	"Coordinates:\n",
	" * time (time) float64 1.0 1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18 ..."
	]
	},
	"execution_count": 27,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"running_Hz = 50\n",
	"start_time = 1.0\n",
	"sampling_time = 50.0\n",
	"\n",
	"running = xr.DataArray(\n",
	" np.random.random(int(sampling_time*running_Hz)),\n",
	" dims=('time'),\n",
	" coords={'time':start_time + np.arange(0,sampling_time*running_Hz)/running_Hz}\n",
	")\n",
	"running"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## make a single Dataset out of our two DataArrays\n",
	"\n",
	"OK, we have two DataArrays. Both have a dimension called 'time'. Let's combine these into a single DataSet & see what happens."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 28,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"<xarray.Dataset>\n",
	"Dimensions: (neuron: 100, time: 4310)\n",
	"Coordinates:\n",
	" * time (time) float64 0.0 0.03226 0.06452 0.09677 0.129 0.1613 0.1935 ...\n",
	" * neuron (neuron) <U6 'roi000' 'roi001' 'roi002' 'roi003' 'roi004' ...\n",
	"Data variables:\n",
	" running (time) float64 nan nan nan nan nan nan nan nan nan nan nan nan ...\n",
	" calcium (neuron, time) float64 0.5105 0.3373 0.537 0.6395 0.2549 ..."
	]
	},
	"execution_count": 28,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"experiment = xr.Dataset(data_vars={'calcium': calcium,'running':running})\n",
	"experiment"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Check that out!!\n",
	"\n",
	"xarray discovered the shared dimension & created a single 'time' coordinate that both arrays are now referenced against. For any times where one DataArray was missing values while the other one had values, it filled it the array with NaNs."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 38,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"<xarray.DataArray 'running' (time: 4310)>\n",
	"array([ nan, nan, nan, ..., nan, nan, nan])\n",
	"Coordinates:\n",
	" * time (time) float64 0.0 0.03226 0.06452 0.09677 0.129 0.1613 0.1935 ..."
	]
	},
	"execution_count": 38,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"experiment['running']"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## get our original data back out\n",
	"\n",
	"We can get back to our original `running` array, by grabbing the DataArray from the Dataset then calling `.dropna()`"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 40,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"<xarray.DataArray 'running' (time: 2500)>\n",
	"array([ 0.210562, 0.248769, 0.091969, ..., 0.441855, 0.148435, 0.899181])\n",
	"Coordinates:\n",
	" * time (time) float64 1.0 1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18 ..."
	]
	},
	"execution_count": 40,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"experiment['running'].dropna(dim='time')"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## slice the whole xr.Dataset\n",
	"\n",
	"We can then slice the entire Dataset based on the time array, passing `drop=True` to there `.where` method"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 44,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"<xarray.Dataset>\n",
	"Dimensions: (neuron: 100, time: 79)\n",
	"Coordinates:\n",
	" * time (time) float64 3.02 3.032 3.04 3.06 3.065 3.08 3.097 3.1 3.12 ...\n",
	" * neuron (neuron) <U6 'roi000' 'roi001' 'roi002' 'roi003' 'roi004' ...\n",
	"Data variables:\n",
	" running (time) float64 0.2339 nan 0.9739 0.1332 nan 0.6031 nan 0.8324 ...\n",
	" calcium (neuron, time) float64 nan 0.3882 nan nan 0.4332 nan 0.08029 ..."
	]
	},
	"execution_count": 44,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"boolean_time_mask = (experiment['time']>3) & (experiment['time']<4)\n",
	"\n",
	"experiment.where(boolean_time_mask,drop=True)"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.5.3"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}