Last active
May 30, 2023 20:07
Notebook demonstrating how to trim data carefully from an NWB file.
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Removing data from an NWB file\n", | |
"\n", | |
"## Introduction\n", | |
"\n", | |
"Removing data from an NWB file is generally NOT recommended. It must be done carefully so as to maintain the validity\n", | |
"and internal consistency of a file. Think of it like data surgery. To remove some data from an NWB object, you may \n", | |
"need to create a clone of that NWB object with data removed, and you may need to disconnect NWB objects that you \n", | |
"do not intend to remove and then reconnect them with appropriate links and references to your cloned NWB object\n", | |
"with data removed. This procedure should be done with manual inspection at every step. Please consider generating\n", | |
"a new NWB file from scratch instead of removing data from an NWB file.\n", | |
"\n", | |
"Due to how the HDF5 storage format works, opening an NWB file in append mode and deleting or removing data from the file\n", | |
"will, in general, not physically delete the data from the file. It just removes the data from the index. As a result,\n", | |
"the file size will not be reduced.\n", | |
"\n", | |
"If you want to permanently delete data from the file and reduce the file size, you must create a new file based on the\n", | |
"original file. You can do this in several ways.\n", | |
"\n", | |
"1. Open the NWB file in PyNWB, modify the `NWBFile` object or its children objects in memory, and then \n", | |
" [export](https://pynwb.readthedocs.io/en/stable/export.html) the modified `NWBFile` object to a new file path. We will\n", | |
" work through that approach below.\n", | |
"\n", | |
"2. Remove (unlink) data from the file using `h5py` or other HDF5 libraries, and then use the \n", | |
" [h5repack](https://portal.hdfgroup.org/display/HDF5/h5repack) tool from the HDF5 library to create a new file without\n", | |
" the unlinked data. This method has a key disadvantage in that there are fewer checks that the modified file is \n", | |
" a valid NWB file. This method will not be described here.\n", | |
"\n", | |
"## Download a test NWB file\n", | |
"\n", | |
"First, download an example NWB file from DANDI. We will use a 1.7 GB NWB file from the Credit Assignment project, a\n", | |
"part of the Allen Institute for Brain Science's OpenScope project. In your browser:\n", | |
"\n", | |
"1. Go to https://dandiarchive.org/dandiset/000037 \n", | |
"2. In the top-right of the page, click Files\n", | |
"3. Click the first folder \"sub-408021\"\n", | |
"4. Click the download icon for the first file \"sub-408021_ses-758519303_behavior+image+ophys.nwb\". It may take some\n", | |
" time to download this file, depending on your internet connection.\n", | |
"\n", | |
"## Install PyNWB\n", | |
"\n", | |
"Then, you will need to install the latest version of PyNWB from PyPI or conda-forge,\n", | |
"ideally in a clean virtual environment. Here, let's use `mamba` to create a new conda environment called \"nwb-remove\"\n", | |
"with `jupyterlab` installed so that we can run this notebook. Then let's use `pip` to install `pynwb` from PyPI.\n", | |
"\n", | |
"In a terminal or command prompt with [mambaforge](https://github.com/conda-forge/miniforge#mambaforge) installed, run:\n", | |
"\n", | |
"```bash\n", | |
"mamba create --name nwb-remove --yes jupyterlab\n", | |
"mamba activate nwb-remove\n", | |
"pip install pynwb\n", | |
"```\n", | |
"\n", | |
"PyNWB is now installed, so we can use it to open and manipulate the file.\n", | |
"\n", | |
"## Open the NWB file\n", | |
"\n", | |
"Open this notebook, change the `file_path` variable in the cell below to be the location of your downloaded\n", | |
"NWB file, and run the cell to read the file into an `NWBFile` object in memory." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"root pynwb.file.NWBFile at 0x6067191056\n", | |
"Fields:\n", | |
" devices: {\n", | |
" 2p_microscope <class 'pynwb.device.Device'>\n", | |
" }\n", | |
" file_create_date: [datetime.datetime(2022, 9, 25, 4, 53, 2, 714938, tzinfo=tzoffset(None, -25200))]\n", | |
" identifier: 758519303_with_stim\n", | |
" imaging_planes: {\n", | |
" ImagingPlane <class 'pynwb.ophys.ImagingPlane'>\n", | |
" }\n", | |
" institution: Allen Institute for Brain Science\n", | |
" intervals: {\n", | |
" trials <class 'pynwb.epoch.TimeIntervals'>\n", | |
" }\n", | |
" processing: {\n", | |
" behavior <class 'pynwb.base.ProcessingModule'>,\n", | |
" ophys <class 'pynwb.base.ProcessingModule'>\n", | |
" }\n", | |
" session_description: Allen Institute OpenScope dataset\n", | |
" session_id: 758519303\n", | |
" session_start_time: 2018-09-26 17:29:17.502000-07:00\n", | |
" stimulus: {\n", | |
" gabors <class 'pynwb.image.IndexSeries'>,\n", | |
" grayscreen <class 'pynwb.image.IndexSeries'>,\n", | |
" visflow_left <class 'pynwb.image.IndexSeries'>,\n", | |
" visflow_right <class 'pynwb.image.IndexSeries'>\n", | |
" }\n", | |
" stimulus_template: {\n", | |
" gabors <class 'pynwb.image.ImageSeries'>,\n", | |
" grayscreen <class 'pynwb.image.ImageSeries'>,\n", | |
" visflow_left <class 'pynwb.image.ImageSeries'>,\n", | |
" visflow_right <class 'pynwb.image.ImageSeries'>\n", | |
" }\n", | |
" subject: subject pynwb.file.Subject at 0x6067083088\n", | |
"Fields:\n", | |
" age: P95D\n", | |
" genotype: Cux2-CreERT2;Camk2a-tTA;Ai93\n", | |
" sex: M\n", | |
" species: Mus musculus\n", | |
" subject_id: 408021\n", | |
"\n", | |
" timestamps_reference_time: 2018-09-26 17:29:17.502000-07:00\n", | |
" trials: trials <class 'pynwb.epoch.TimeIntervals'>\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"from pynwb import NWBHDF5IO\n", | |
"\n", | |
"# CHANGE THIS PATH to the location of the downloaded test NWB file\n", | |
"file_path = \"/Users/rly/Documents/NWB_Data/dandisets/000037/sub-408021/sub-408021_ses-758519303_behavior+image+ophys.nwb\"\n", | |
"\n", | |
"read_io = NWBHDF5IO(file_path, mode=\"r\")\n", | |
"nwbfile = read_io.read()\n", | |
"print(nwbfile)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The NWB file contains a processing module called \"ophys\". Within that is a `DfOverF` object named \"DfOverF\". And\n", | |
"within that is an `RoiResponseSeries` object named \"RoiResponseSeries\". This object contains the dF/F traces for each\n", | |
"region of interest (ROI)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"RoiResponseSeries pynwb.ophys.RoiResponseSeries at 0x6061288080\n", | |
"Fields:\n", | |
" comments: no comments\n", | |
" conversion: 1.0\n", | |
" data: <HDF5 dataset \"data\": shape (126741, 96), type \"<f8\">\n", | |
" description: ROI traces\n", | |
" interval: 1\n", | |
" offset: 0.0\n", | |
" resolution: -1.0\n", | |
" rois: rois <class 'hdmf.common.table.DynamicTableRegion'>\n", | |
" timestamp_link: (\n", | |
" pupil_diameter <class 'pynwb.base.TimeSeries'>\n", | |
" )\n", | |
" timestamps: <HDF5 dataset \"timestamps\": shape (126741,), type \"<f8\">\n", | |
" timestamps_unit: seconds\n", | |
" unit: Normalized fluorescence (A.U.)\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"roi_response_series = nwbfile.processing[\"ophys\"][\"DfOverF\"][\"RoiResponseSeries\"]\n", | |
"print(roi_response_series)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can see that the `data` field of the `RoiResponseSeries` object has shape `(126741, 96)`, which means there are\n", | |
"126,741 samples and 96 ROIs. The `timestamps` field has shape `(126741, )`, which means there is one timestamp for\n", | |
"each sample in the `data` field, and the two fields are correctly aligned.\n", | |
"\n", | |
"We can also see that there is a `timestamp_link` value, which means that the timestamps of the `TimeSeries` object \n", | |
"named \"pupil_diameter\" elsewhere in the file links to this `timestamps` field rather than defining its own `timestamps`\n", | |
"field (this saves on disk space when the timestamps are the same). This linkage means that any changes we make \n", | |
"to the `timestamps` field of the `RoiResponseSeries` object will impact the `timestamps` field of the \n", | |
"\"pupil_diameter\" `TimeSeries` object." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Modify the NWB file: Trim a dataset\n", | |
"\n", | |
"In general, removing data in an NWB file is NOT recommended. It must be done carefully so as to maintain the validity\n", | |
"and internal consistency of a file. For example, using the approaches described here, you might accidentally create \n", | |
"a `TimeSeries` object where the number of samples in the `data` field does not equal the number of elements in the\n", | |
"`timestamps` field. Or you might create a table where the columns have different numbers of rows. Or you might create an\n", | |
"object that is missing a reference to another object or refers to an object in the original file instead of in the\n", | |
"new file. \n", | |
"\n", | |
"We will walk through an example of how to remove data from the NWB file carefully.\n", | |
"\n", | |
"First, we will trim the `data` and `timestamps` fields of the above `RoiResponseSeries` object\n", | |
"so that they represent only 1000 samples in time instead of 126,741 samples.\n", | |
"\n", | |
"By default, datasets in NWB are not resizable. To be resizable, the dataset must have been created with chunked storage\n", | |
"and an appropriate non-None value for the `maxshape` argument. \n", | |
"See https://docs.h5py.org/en/stable/high/dataset.html#resizable-datasets for more information. If this argument is set, \n", | |
"then you can trim a dataset by simply calling `resize` on the dataset with a new, smaller shape, e.g.,\n", | |
"```python\n", | |
"roi_response_series.data.resize((1000, 96))\n", | |
"```\n", | |
"Note that this will not reduce the file size unless `h5repack` is used on the NWB file.\n", | |
"\n", | |
"Because most datasets in NWB are not resizable, we will have to take a different, less convenient approach. We will\n", | |
"have to:\n", | |
"1. Remove the original `RoiResponseSeries` object from its parent container, the `DfOverF` object.\n", | |
"2. Create a new `RoiResponseSeries` object with a subset of the data from the original `RoiResponseSeries` object and\n", | |
" keep everything else the same.\n", | |
"3. Add the new `RoiResponseSeries` object to that `DfOverF` object." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Children: (rois hdmf.common.table.DynamicTableRegion at 0x6045457872\n", | |
" Target table: PlaneSegmentation pynwb.ophys.PlaneSegmentation at 0x6045309456\n", | |
",)\n" | |
] | |
} | |
], | |
"source": [ | |
"# first, pop the RoiResponseSeries object from the \"roi_response_series\" dictionary\n", | |
"# field in the DfOverF object\n", | |
"dfoverf = nwbfile.processing[\"ophys\"][\"DfOverF\"]\n", | |
"dfoverf.roi_response_series.pop(roi_response_series.name)\n", | |
"\n", | |
"# to fully unlink the RoiResponseSeries object from all other objects in the file,\n", | |
"# reset the parent property to be None\n", | |
"roi_response_series.reset_parent()\n", | |
"\n", | |
"# also reset the parent property for any child objects of the RoiResponseSeries.\n", | |
"# in this case, there is a DynamicTableRegion that links the RoiResponseSeries object\n", | |
"# with rows of a PlaneSegmentationTable elsewhere in the file.\n", | |
"print(\"Children:\", roi_response_series.children)\n", | |
"for child in roi_response_series.children:\n", | |
" child.reset_parent()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If you wanted simply to remove the `RoiResponseSeries` object, you could skip the next cell." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"RoiResponseSeries pynwb.ophys.RoiResponseSeries at 0x6065409232\n", | |
"Fields:\n", | |
" comments: no comments\n", | |
" conversion: 1.0\n", | |
" data: [[ 0.22540055 0.04358731 0.15931772 ... 0.14367128 0.11825969\n", | |
" 0.24960337]\n", | |
" [ 0.20273186 0.10390094 0.13995919 ... 0.29989742 0.\n", | |
" 0.26306016]\n", | |
" [ 0.31186248 -0.00094157 0.19112528 ... 0.18533189 0.06014981\n", | |
" 0.22809416]\n", | |
" ...\n", | |
" [-0.04699584 0.0452316 0.36256709 ... 0.27613118 0.06358456\n", | |
" 0.00733844]\n", | |
" [-0.01746068 0.03006853 0.2434425 ... 0.29808228 0.08803363\n", | |
" -0.01777625]\n", | |
" [ 0.03073724 0.0248613 0.29369005 ... 0.07373342 -0.02438294\n", | |
" 0.07851377]]\n", | |
" description: ROI traces. This data has been trimmed.\n", | |
" interval: 1\n", | |
" offset: 0.0\n", | |
" resolution: -1.0\n", | |
" rois: rois <class 'hdmf.common.table.DynamicTableRegion'>\n", | |
" timestamps: [ 8.79461 8.82786 8.8611 8.89435 8.92759 8.96084 8.99408 9.02733\n", | |
" 9.06057 9.09382 9.12706 9.16031 9.19355 9.2268 9.26004 9.29329\n", | |
" 9.32653 9.35978 9.39302 9.42627 9.45951 9.49276 9.526 9.55925\n", | |
" 9.59249 9.62574 9.65898 9.69223 9.72547 9.75872 9.79196 9.82521\n", | |
" 9.85845 9.8917 9.92494 9.95819 9.99143 10.02468 10.05792 10.09117\n", | |
" 10.12441 10.15766 10.1909 10.22415 10.25739 10.29064 10.32388 10.35713\n", | |
" 10.39037 10.42362 10.45686 10.49011 10.52335 10.5566 10.58984 10.62309\n", | |
" 10.65633 10.68958 10.72282 10.75607 10.78931 10.82256 10.8558 10.88905\n", | |
" 10.92229 10.95554 10.98878 11.02203 11.05527 11.08852 11.12176 11.15501\n", | |
" 11.18825 11.2215 11.25474 11.28799 11.32123 11.35448 11.38772 11.42096\n", | |
" 11.45421 11.48745 11.5207 11.55394 11.58719 11.62043 11.65368 11.68692\n", | |
" 11.72017 11.75341 11.78666 11.8199 11.85315 11.88639 11.91964 11.95288\n", | |
" 11.98613 12.01937 12.05262 12.08586 12.11911 12.15235 12.1856 12.21884\n", | |
" 12.25209 12.28533 12.31858 12.35182 12.38507 12.41831 12.45156 12.4848\n", | |
" 12.51805 12.55129 12.58454 12.61778 12.65103 12.68427 12.71752 12.75076\n", | |
" 12.78401 12.81725 12.8505 12.88374 12.91699 12.95023 12.98348 13.01672\n", | |
" 13.04997 13.08321 13.11646 13.1497 13.18295 13.21619 13.24944 13.28268\n", | |
" 13.31593 13.34917 13.38242 13.41566 13.44891 13.48215 13.5154 13.54864\n", | |
" 13.58189 13.61513 13.64838 13.68162 13.71487 13.74811 13.78136 13.8146\n", | |
" 13.84785 13.88109 13.91434 13.94758 13.98083 14.01407 14.04732 14.08056\n", | |
" 14.11381 14.14705 14.1803 14.21354 14.24679 14.28003 14.31328 14.34652\n", | |
" 14.37977 14.41301 14.44626 14.4795 14.51275 14.54599 14.57924 14.61248\n", | |
" 14.64573 14.67897 14.71222 14.74546 14.77871 14.81195 14.8452 14.87844\n", | |
" 14.91169 14.94493 14.97818 15.01142 15.04467 15.07791 15.11116 15.1444\n", | |
" 15.17765 15.21089 15.24414 15.27738 15.31063 15.34387 15.37712 15.41036\n", | |
" 15.44361 15.47685 15.5101 15.54334 15.57659 15.60983 15.64307 15.67632\n", | |
" 15.70956 15.74281 15.77605 15.8093 15.84254 15.87579 15.90903 15.94228\n", | |
" 15.97552 16.00877 16.04201 16.07526 16.1085 16.14175 16.17499 16.20824\n", | |
" 16.24148 16.27473 16.30797 16.34122 16.37446 16.40771 16.44095 16.4742\n", | |
" 16.50744 16.54069 16.57393 16.60718 16.64042 16.67367 16.70691 16.74016\n", | |
" 16.7734 16.80665 16.83989 16.87314 16.90638 16.93963 16.97287 17.00612\n", | |
" 17.03936 17.07261 17.10585 17.1391 17.17234 17.20559 17.23883 17.27208\n", | |
" 17.30532 17.33857 17.37181 17.40506 17.4383 17.47155 17.50479 17.53804\n", | |
" 17.57128 17.60453 17.63777 17.67102 17.70426 17.73751 17.77075 17.804\n", | |
" 17.83724 17.87049 17.90373 17.93698 17.97022 18.00347 18.03671 18.06996\n", | |
" 18.1032 18.13645 18.16969 18.20294 18.23618 18.26943 18.30267 18.33592\n", | |
" 18.36916 18.40241 18.43565 18.4689 18.50214 18.53539 18.56863 18.60188\n", | |
" 18.63512 18.66837 18.70161 18.73486 18.7681 18.80135 18.83459 18.86784\n", | |
" 18.90108 18.93433 18.96757 19.00082 19.03406 19.06731 19.10055 19.1338\n", | |
" 19.16704 19.20029 19.23354 19.26678 19.30003 19.33327 19.36652 19.39976\n", | |
" 19.43301 19.46625 19.4995 19.53274 19.56599 19.59923 19.63248 19.66572\n", | |
" 19.69897 19.73221 19.76546 19.7987 19.83195 19.86519 19.89844 19.93168\n", | |
" 19.96493 19.99817 20.03142 20.06466 20.09791 20.13115 20.1644 20.19764\n", | |
" 20.23089 20.26413 20.29738 20.33062 20.36387 20.39711 20.43036 20.4636\n", | |
" 20.49685 20.53009 20.56334 20.59658 20.62983 20.66307 20.69632 20.72956\n", | |
" 20.76281 20.79605 20.8293 20.86254 20.89579 20.92903 20.96228 20.99552\n", | |
" 21.02877 21.06201 21.09526 21.1285 21.16175 21.19499 21.22824 21.26148\n", | |
" 21.29473 21.32797 21.36122 21.39446 21.42771 21.46095 21.4942 21.52744\n", | |
" 21.56069 21.59393 21.62718 21.66042 21.69367 21.72691 21.76016 21.7934\n", | |
" 21.82665 21.85989 21.89314 21.92638 21.95963 21.99287 22.02612 22.05936\n", | |
" 22.09261 22.12585 22.1591 22.19234 22.22559 22.25883 22.29208 22.32532\n", | |
" 22.35857 22.39181 22.42506 22.4583 22.49155 22.52479 22.55804 22.59128\n", | |
" 22.62453 22.65777 22.69102 22.72426 22.75751 22.79075 22.824 22.85724\n", | |
" 22.89049 22.92373 22.95698 22.99022 23.02347 23.05671 23.08996 23.12321\n", | |
" 23.15645 23.1897 23.22294 23.25619 23.28943 23.32268 23.35592 23.38917\n", | |
" 23.42241 23.45566 23.4889 23.52215 23.55539 23.58864 23.62188 23.65513\n", | |
" 23.68837 23.72162 23.75486 23.78811 23.82135 23.8546 23.88784 23.92109\n", | |
" 23.95433 23.98758 24.02082 24.05407 24.08731 24.12056 24.1538 24.18705\n", | |
" 24.22029 24.25354 24.28678 24.32003 24.35327 24.38652 24.41976 24.45301\n", | |
" 24.48625 24.5195 24.55274 24.58599 24.61923 24.65248 24.68572 24.71897\n", | |
" 24.75221 24.78546 24.8187 24.85195 24.88519 24.91844 24.95168 24.98493\n", | |
" 25.01817 25.05142 25.08466 25.11791 25.15115 25.1844 25.21764 25.25089\n", | |
" 25.28414 25.31738 25.35063 25.38387 25.41712 25.45036 25.48361 25.51685\n", | |
" 25.5501 25.58334 25.61659 25.64983 25.68308 25.71632 25.74957 25.78281\n", | |
" 25.81606 25.8493 25.88255 25.91579 25.94904 25.98228 26.01553 26.04877\n", | |
" 26.08202 26.11526 26.14851 26.18175 26.215 26.24824 26.28149 26.31473\n", | |
" 26.34798 26.38122 26.41447 26.44771 26.48096 26.5142 26.54745 26.58069\n", | |
" 26.61394 26.64718 26.68043 26.71367 26.74692 26.78016 26.81341 26.84665\n", | |
" 26.8799 26.91314 26.94639 26.97964 27.01288 27.04613 27.07937 27.11262\n", | |
" 27.14586 27.17911 27.21235 27.2456 27.27884 27.31209 27.34533 27.37858\n", | |
" 27.41182 27.44507 27.47831 27.51156 27.5448 27.57805 27.61129 27.64454\n", | |
" 27.67778 27.71103 27.74427 27.77752 27.81076 27.84401 27.87725 27.9105\n", | |
" 27.94374 27.97699 28.01023 28.04348 28.07672 28.10997 28.14321 28.17646\n", | |
" 28.2097 28.24295 28.27619 28.30944 28.34268 28.37593 28.40917 28.44242\n", | |
" 28.47567 28.50891 28.54216 28.5754 28.60865 28.64189 28.67514 28.70838\n", | |
" 28.74163 28.77487 28.80812 28.84136 28.87461 28.90785 28.9411 28.97434\n", | |
" 29.00759 29.04083 29.07408 29.10732 29.14057 29.17381 29.20706 29.2403\n", | |
" 29.27355 29.30679 29.34004 29.37328 29.40653 29.43977 29.47302 29.50626\n", | |
" 29.53951 29.57275 29.606 29.63924 29.67249 29.70573 29.73898 29.77223\n", | |
" 29.80547 29.83872 29.87196 29.90521 29.93845 29.9717 30.00494 30.03819\n", | |
" 30.07143 30.10468 30.13792 30.17117 30.20441 30.23766 30.2709 30.30415\n", | |
" 30.33739 30.37064 30.40388 30.43713 30.47037 30.50362 30.53686 30.57011\n", | |
" 30.60335 30.6366 30.66984 30.70309 30.73633 30.76958 30.80282 30.83607\n", | |
" 30.86931 30.90256 30.93581 30.96905 31.0023 31.03554 31.06879 31.10203\n", | |
" 31.13528 31.16852 31.20177 31.23501 31.26826 31.3015 31.33475 31.36799\n", | |
" 31.40124 31.43448 31.46773 31.50097 31.53422 31.56746 31.60071 31.63395\n", | |
" 31.6672 31.70044 31.73369 31.76693 31.80018 31.83342 31.86667 31.89991\n", | |
" 31.93316 31.9664 31.99965 32.0329 32.06614 32.09939 32.13263 32.16588\n", | |
" 32.19912 32.23237 32.26561 32.29886 32.3321 32.36535 32.39859 32.43184\n", | |
" 32.46508 32.49833 32.53157 32.56482 32.59806 32.63131 32.66455 32.6978\n", | |
" 32.73104 32.76429 32.79753 32.83078 32.86402 32.89727 32.93051 32.96376\n", | |
" 32.99701 33.03025 33.0635 33.09674 33.12999 33.16323 33.19648 33.22972\n", | |
" 33.26297 33.29621 33.32946 33.3627 33.39595 33.42919 33.46244 33.49568\n", | |
" 33.52893 33.56217 33.59542 33.62866 33.66191 33.69515 33.7284 33.76164\n", | |
" 33.79489 33.82813 33.86138 33.89462 33.92787 33.96112 33.99436 34.02761\n", | |
" 34.06085 34.0941 34.12734 34.16059 34.19383 34.22708 34.26032 34.29357\n", | |
" 34.32681 34.36006 34.3933 34.42655 34.45979 34.49304 34.52628 34.55953\n", | |
" 34.59277 34.62602 34.65926 34.69251 34.72575 34.759 34.79224 34.82549\n", | |
" 34.85874 34.89198 34.92523 34.95847 34.99172 35.02496 35.05821 35.09145\n", | |
" 35.1247 35.15794 35.19119 35.22443 35.25768 35.29092 35.32417 35.35741\n", | |
" 35.39066 35.4239 35.45715 35.49039 35.52364 35.55688 35.59013 35.62337\n", | |
" 35.65662 35.68986 35.72311 35.75636 35.7896 35.82285 35.85609 35.88934\n", | |
" 35.92258 35.95583 35.98907 36.02232 36.05556 36.08881 36.12205 36.1553\n", | |
" 36.18854 36.22179 36.25503 36.28828 36.32152 36.35477 36.38801 36.42126\n", | |
" 36.4545 36.48775 36.52099 36.55424 36.58749 36.62073 36.65398 36.68722\n", | |
" 36.72047 36.75371 36.78696 36.8202 36.85345 36.88669 36.91994 36.95318\n", | |
" 36.98643 37.01967 37.05292 37.08616 37.11941 37.15265 37.1859 37.21914\n", | |
" 37.25239 37.28563 37.31888 37.35212 37.38537 37.41862 37.45186 37.48511\n", | |
" 37.51835 37.5516 37.58484 37.61809 37.65133 37.68458 37.71782 37.75107\n", | |
" 37.78431 37.81756 37.8508 37.88405 37.91729 37.95054 37.98378 38.01703\n", | |
" 38.05027 38.08352 38.11676 38.15001 38.18326 38.2165 38.24975 38.28299\n", | |
" 38.31624 38.34948 38.38273 38.41597 38.44922 38.48246 38.51571 38.54895\n", | |
" 38.5822 38.61544 38.64869 38.68193 38.71518 38.74842 38.78167 38.81491\n", | |
" 38.84816 38.8814 38.91465 38.9479 38.98114 39.01439 39.04763 39.08088\n", | |
" 39.11412 39.14737 39.18061 39.21386 39.2471 39.28035 39.31359 39.34684\n", | |
" 39.38008 39.41333 39.44657 39.47982 39.51306 39.54631 39.57955 39.6128\n", | |
" 39.64605 39.67929 39.71254 39.74578 39.77903 39.81227 39.84552 39.87876\n", | |
" 39.91201 39.94525 39.9785 40.01174 40.04499 40.07823 40.11148 40.14472\n", | |
" 40.17797 40.21121 40.24446 40.2777 40.31095 40.3442 40.37744 40.41069\n", | |
" 40.44393 40.47718 40.51042 40.54367 40.57691 40.61016 40.6434 40.67665\n", | |
" 40.70989 40.74314 40.77638 40.80963 40.84287 40.87612 40.90936 40.94261\n", | |
" 40.97585 41.0091 41.04235 41.07559 41.10884 41.14208 41.17533 41.20857\n", | |
" 41.24182 41.27506 41.30831 41.34155 41.3748 41.40804 41.44129 41.47453\n", | |
" 41.50778 41.54102 41.57427 41.60751 41.64076 41.674 41.70725 41.7405\n", | |
" 41.77374 41.80699 41.84023 41.87348 41.90672 41.93997 41.97321 42.00646]\n", | |
" timestamps_unit: seconds\n", | |
" unit: Normalized fluorescence (A.U.)" | |
] | |
}, | |
"execution_count": 4, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"from pynwb.ophys import RoiResponseSeries\n", | |
"\n", | |
"max_samples = 1000\n", | |
"\n", | |
"# next, create a new RoiResponseSeries object with only 1000 samples in time, a modified description,\n", | |
"# and all other fields the same.\n", | |
"# you have to list all the fields that you want to copy.\n", | |
"new_series = RoiResponseSeries(\n", | |
" name=roi_response_series.name,\n", | |
" description=roi_response_series.description + \". This data has been trimmed.\",\n", | |
" unit=roi_response_series.unit,\n", | |
" rois=roi_response_series.rois,\n", | |
" data=roi_response_series.data[0:max_samples,:], # only copy the first max_samples samples\n", | |
" timestamps=roi_response_series.timestamps[0:max_samples], # only copy the first max_samples samples\n", | |
")\n", | |
"\n", | |
"dfoverf.add_roi_response_series(new_series)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Important: Other objects in the NWB file or in a different NWB file might link or reference the `RoiResponseSeries` \n", | |
"being modified here. It is currently not possible to find all such links and references. You will have to validate\n", | |
"the new NWB file and manually inspect it to ensure that all data are valid and correct.\n", | |
"\n", | |
"The `timestamps_link` field described earlier contains other objects that link to the `timestamps` field of\n", | |
"this object and need to be updated. We can either maintain that link, in which case, the `data` field of the\n", | |
"linking objects must also be reduced to 1000 samples so that the `data` field and `timestamps` field have the same\n", | |
"number of samples, or we can replace the link with a copy of the original values with all 126,741 samples. \n", | |
"Below, we will replace the `timestamps` field with a copy of the original timestamps, following the three\n", | |
"steps described above: remove, create, and add." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{pupil_diameter pynwb.base.TimeSeries at 0x6057947728\n", | |
"Fields:\n", | |
" comments: no comments\n", | |
" conversion: 1.0\n", | |
" data: <HDF5 dataset \"data\": shape (126741,), type \"<f8\">\n", | |
" description: Diameter of the mouse pupil (right) facing the stimulus presentation screen.\n", | |
" interval: 1\n", | |
" offset: 0.0\n", | |
" resolution: -1.0\n", | |
" timestamps: RoiResponseSeries pynwb.ophys.RoiResponseSeries at 0x6061288080\n", | |
"Fields:\n", | |
" comments: no comments\n", | |
" conversion: 1.0\n", | |
" data: <HDF5 dataset \"data\": shape (126741, 96), type \"<f8\">\n", | |
" description: ROI traces\n", | |
" interval: 1\n", | |
" offset: 0.0\n", | |
" resolution: -1.0\n", | |
" rois: rois <class 'hdmf.common.table.DynamicTableRegion'>\n", | |
" timestamp_link: (\n", | |
" pupil_diameter <class 'pynwb.base.TimeSeries'>\n", | |
" )\n", | |
" timestamps: <HDF5 dataset \"timestamps\": shape (126741,), type \"<f8\">\n", | |
" timestamps_unit: seconds\n", | |
" unit: Normalized fluorescence (A.U.)\n", | |
"\n", | |
" timestamps_unit: seconds\n", | |
" unit: mm\n", | |
"}\n", | |
"PupilTracking pynwb.behavior.PupilTracking at 0x6065983760\n", | |
"Fields:\n", | |
" time_series: {\n", | |
" pupil_diameter <class 'pynwb.base.TimeSeries'>\n", | |
" }\n", | |
"\n", | |
"Children: ()\n" | |
] | |
} | |
], | |
"source": [ | |
"print(roi_response_series.timestamp_link)\n", | |
"# note: the below code assumes that you know that the object being modified is a TimeSeries\n", | |
"\n", | |
"pupil_diameter_series = list(roi_response_series.timestamp_link)[0]\n", | |
"pd_series_parent = pupil_diameter_series.parent\n", | |
"print(pd_series_parent)\n", | |
"\n", | |
"# remove the pupil_diameter TimeSeries from its parent container and fully unlink it\n", | |
"pd_series_parent.time_series.pop(pupil_diameter_series.name)\n", | |
"pupil_diameter_series.reset_parent()\n", | |
"\n", | |
"# reset the parent property of any children\n", | |
"print(\"Children:\", pupil_diameter_series.children)\n", | |
"for child in pupil_diameter_series.children:\n", | |
" child.reset_parent()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"pupil_diameter pynwb.base.TimeSeries at 0x6065555536\n", | |
"Fields:\n", | |
" comments: no comments\n", | |
" conversion: 1.0\n", | |
" data: <HDF5 dataset \"data\": shape (126741,), type \"<f8\">\n", | |
" description: Diameter of the mouse pupil (right) facing the stimulus presentation screen.\n", | |
" interval: 1\n", | |
" offset: 0.0\n", | |
" resolution: -1.0\n", | |
" timestamps: [ 8.79461 8.82786 8.8611 ... 4222.35324 4222.38648 4222.41973]\n", | |
" timestamps_unit: seconds\n", | |
" unit: mm" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"from pynwb import TimeSeries\n", | |
"\n", | |
"# create a new TimeSeries object with a copy of the original data\n", | |
"# by calling pupil_diameter_series.timestamps[:], the timestamps values\n", | |
"# are loaded into a numpy array in memory and are no longer linked to\n", | |
"# a field from another time series.\n", | |
"new_pupil_diameter_series = TimeSeries(\n", | |
" name=pupil_diameter_series.name,\n", | |
" data=pupil_diameter_series.data,\n", | |
" timestamps=pupil_diameter_series.timestamps[:],\n", | |
" description=pupil_diameter_series.description,\n", | |
" unit=pupil_diameter_series.unit,\n", | |
")\n", | |
"\n", | |
"# add the new \"pupil_diameter\" TimeSeries object to the parent of the original\n", | |
"# \"pupil_diameter\" TimeSeries object\n", | |
"pd_series_parent.add_timeseries(new_pupil_diameter_series)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# finally, create a new IO object for exporting the modified in-memory NWBFile object\n", | |
"# to a new file path\n", | |
"export_path = \"./trimmed_nwbfile.nwb\"\n", | |
"with NWBHDF5IO(export_path, mode=\"w\") as export_io:\n", | |
" export_io.export(src_io=read_io, nwbfile=nwbfile)\n", | |
"\n", | |
"read_io.close()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's verify that our new file is valid, contains a trimmed version of the `RoiResponseSeries`,\n", | |
"and a version of the \"pupil_diameter\" `TimeSeries` with the full timestamps values without links." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Validation errors: []\n", | |
"RoiResponseSeries pynwb.ophys.RoiResponseSeries at 0x6067188368\n", | |
"Fields:\n", | |
" comments: no comments\n", | |
" conversion: 1.0\n", | |
" data: <HDF5 dataset \"data\": shape (1000, 96), type \"<f8\">\n", | |
" description: ROI traces. This data has been trimmed.\n", | |
" interval: 1\n", | |
" offset: 0.0\n", | |
" resolution: -1.0\n", | |
" rois: rois <class 'hdmf.common.table.DynamicTableRegion'>\n", | |
" timestamps: <HDF5 dataset \"timestamps\": shape (1000,), type \"<f8\">\n", | |
" timestamps_unit: seconds\n", | |
" unit: Normalized fluorescence (A.U.)\n", | |
"\n", | |
"pupil_diameter pynwb.base.TimeSeries at 0x6127889232\n", | |
"Fields:\n", | |
" comments: no comments\n", | |
" conversion: 1.0\n", | |
" data: <HDF5 dataset \"data\": shape (126741,), type \"<f8\">\n", | |
" description: Diameter of the mouse pupil (right) facing the stimulus presentation screen.\n", | |
" interval: 1\n", | |
" offset: 0.0\n", | |
" resolution: -1.0\n", | |
" timestamps: <HDF5 dataset \"timestamps\": shape (126741,), type \"<f8\">\n", | |
" timestamps_unit: seconds\n", | |
" unit: mm\n", | |
"\n", | |
"<HDF5 file \"trimmed_nwbfile.nwb\" (mode r)>\n" | |
] | |
} | |
], | |
"source": [ | |
"from pynwb import validate\n", | |
"\n", | |
"with NWBHDF5IO(export_path, mode=\"r\") as read_export_io:\n", | |
" errors = validate(read_export_io)\n", | |
" print(\"Validation errors:\", errors) # this should be an empty list\n", | |
"\n", | |
" read_nwbfile = read_export_io.read()\n", | |
" read_roi_response_series = read_nwbfile.processing[\"ophys\"][\"DfOverF\"][\"RoiResponseSeries\"]\n", | |
" print(read_roi_response_series)\n", | |
" # the \"data\" and \"timestamps\" fields should have 1000 values in the first dimension\n", | |
" # and the description should end with \"This data has been trimmed.\"\n", | |
" read_pupil_diameter_series = read_nwbfile.processing[\"behavior\"][\"PupilTracking\"][\"pupil_diameter\"]\n", | |
" print(read_pupil_diameter_series)\n", | |
" print(read_pupil_diameter_series.timestamps.file)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Conclusion\n", | |
"\n", | |
"Again, removing data from an NWB file in this manner is generally NOT recommended, and if done, should be done\n", | |
"carefully. Think of it like data surgery. \n", | |
"\n", | |
"The NWB dev team understands that the methods described here are far from an elegant solution, and the team\n", | |
"is considering ways to improve this workflow.\n", | |
"\n", | |
"Please contact the NWB team by email, slack, or on the\n", | |
"[help desk](https://github.com/NeurodataWithoutBorders/helpdesk/discussions) if you have any questions." | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.10.9" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 4 | |
} |
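
A supplementary sketch for the notebook above: the trimming section warns that it is easy to end up with a `TimeSeries` whose `data` and `timestamps` fields have different numbers of samples. One way to catch this before exporting might be a small length check like the one below. The helper names `first_dim_lengths` and `find_misaligned` are hypothetical (they are not part of PyNWB or h5py), and plain Python lists stand in for the NWB datasets. The only assumption is that each field supports `len()`, as lists, NumPy arrays, and h5py datasets do.

```python
def first_dim_lengths(**fields):
    """Return {field name: length along the first axis} for array-like fields."""
    return {name: len(values) for name, values in fields.items()}


def find_misaligned(**fields):
    """Return the lengths dict if the fields disagree in length, else an empty dict."""
    lengths = first_dim_lengths(**fields)
    return lengths if len(set(lengths.values())) > 1 else {}


# Stand-in values (assumption): with PyNWB these would be, for example,
# the data and timestamps fields of a trimmed series.
data = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 samples x 2 ROIs
timestamps = [0.0, 0.1, 0.2]                 # 3 timestamps

# Aligned fields produce an empty result.
assert find_misaligned(data=data, timestamps=timestamps) == {}

# Trimming only one of the two fields is detected.
print(find_misaligned(data=data[:2], timestamps=timestamps))
# prints {'data': 2, 'timestamps': 3}
```

With the objects from the notebook, a call such as `find_misaligned(data=new_series.data, timestamps=new_series.timestamps)` would be expected to return an empty dictionary once both fields have been trimmed to the same length. This check only covers sample counts; it does not replace running `validate` and inspecting the exported file.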