{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:65aa7ff8ea053a7d34e0d7496f676d92274ad6a7602765834e8e441d7a1d2f7b" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Working with HDF5 files in Python\n", | |
"\n", | |
"## 1. Introduction\n", | |
"\n", | |
"Hierarchical Data Format (HDF) is a set of open-source file formats (HDF4, HDF5), with an accompanying library, designed to store and organize large amounts of numerical data. It was originally developed at NCSA.\n", | |
"\n", | |
"In Python, you can use the **h5py** package to read and edit HDF5 files. For installation issues, please consult the [h5py build instructions](http://docs.h5py.org/en/2.3/build.html). If you are new to Python, you can install [Anaconda](http://continuum.io/downloads), which contains this package and many other commonly used packages.\n", | |
"\n", | |
"\n", | |
"An HDF5 file works much like a file system that stores data. It has only two kinds of objects, the **group** and the **dataset**. Groups work like the folders in a file system, while datasets store different types of data, such as NumPy arrays. \n", | |
"\n", | |
"Datasets are addressed inside the HDF5 file with paths similar to those of a regular file system: `/Folder/SubFolder/DataName`.\n", | |
"\n", | |
"## 2. HDF5 in Python\n", | |
"\n", | |
"Let us assume that h5py is already installed on your computer. We will now see how to work with the h5py module. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import numpy as np\n", | |
"import h5py" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 1 | |
}, | |
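{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can create an HDF5 file object with the `h5py.File()` function. The mode can be given as \"r\" (read-only), \"w\" (create the file, truncating it if it exists), or \"a\" (read/write, creating the file if needed). In the h5py 2.x releases the default is \"a\", but h5py 3.0 and later default to \"r\", so it is safest to pass the mode explicitly." | |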
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"myfile = h5py.File(\"ex1.hdf5\", \"a\")" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 2.1 Creating groups\n", | |
"\n", | |
"So far, we have only created an empty HDF5 file, `myfile`. We need to add some elements to it. For example, we can use the `myfile.create_group()` method to create a new group (or \"folder\"). " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"myfile.create_group(\"grp1\")" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 3, | |
"text": [ | |
"<HDF5 group \"/grp1\" (0 members)>" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"You can also assign the newly created group to a variable, so that you can keep working with it." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"group2=myfile.create_group(\"grp2\")" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"For a group object, you can use the `keys()` method to get the names of the objects it contains." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"myfile.keys()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 5, | |
"text": [ | |
"[u'grp1', u'grp2']" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Moreover, we can create a subgroup by calling the same method on `group2`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1=group2.create_group(\"subgroup1\")\n", | |
"group2.keys()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 6, | |
"text": [ | |
"[u'subgroup1']" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 2.2 Creating data\n", | |
"\n", | |
"Now it is time to store some data in the group. We can create a dataset by assignment, just as we would add an entry to a dictionary in Python. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1[\"data1\"]=np.arange(0,10)\n", | |
"s1[\"data1\"]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 7, | |
"text": [ | |
"<HDF5 dataset \"data1\": shape (10,), type \"<i4\">" | |
] | |
} | |
], | |
"prompt_number": 7 | |
}, | |
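{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The stored data can be read back as a NumPy array by indexing the dataset with `[()]`. Older tutorials use the `.value` attribute for this, but it is deprecated and was removed in h5py 3.0. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1[\"data1\"][()]" | |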
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 8, | |
"text": [ | |
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note that the dataset object can be used directly in calculations. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"np.sum(s1[\"data1\"])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 9, | |
"text": [ | |
"45" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1[\"data1\"][2]==2" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 10, | |
"text": [ | |
"True" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can also use the `create_dataset()` method to create a new dataset, either empty with a given shape and data type, or initialized from existing data. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1.create_dataset(\"data2\", (3, 5), dtype=np.int32)  # np.int was removed from recent NumPy releases" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 11, | |
"text": [ | |
"<HDF5 dataset \"data2\": shape (3, 5), type \"<i4\">" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1[\"data2\"][()]  # read the whole dataset as a NumPy array (.value was removed in h5py 3.0)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
"array([[0, 0, 0, 0, 0],\n", | |
" [0, 0, 0, 0, 0],\n", | |
" [0, 0, 0, 0, 0]])" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1.create_dataset(\"data3\", data=np.arange(15))\n", | |
"s1[\"data3\"][()]  # read the whole dataset as a NumPy array (.value was removed in h5py 3.0)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 13, | |
"text": [ | |
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])" | |
] | |
} | |
], | |
"prompt_number": 13 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### 2.3 Deleting groups\n", | |
"\n", | |
"You can use the `del` keyword to delete a group or a dataset. Here we list the contents of `s1` and then remove the dataset `data3` from it." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"s1.keys()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 14, | |
"text": [ | |
"[u'data1', u'data2', u'data3']" | |
] | |
} | |
], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"del s1[\"data3\"]\n", | |
"s1.keys()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 15, | |
"text": [ | |
"[u'data1', u'data2']" | |
] | |
} | |
], | |
"prompt_number": 15 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 3. Save as CSV file\n", | |
"\n", | |
"If you want to save a dataset from the HDF5 file as a CSV file, you can use the **csv** module from the Python standard library. For example, we create a 5 by 5 matrix under `s1`, and then use `csv.writer()` and its `.writerows()` method to write the matrix to a CSV file. " | |
] | |
}, | |
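{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import csv\n", | |
"\n", | |
"s1[\"data4\"] = np.random.rand(5, 5)\n", | |
"\n", | |
"# Python 3: open in text mode with newline='' so that csv controls the line endings\n", | |
"# (the file() builtin used in Python 2 no longer exists)\n", | |
"with open('csv_test.csv', 'w', newline='') as csvfile:\n", | |
"    writer = csv.writer(csvfile)\n", | |
"    # read the dataset into memory and write one CSV row per matrix row\n", | |
"    writer.writerows(s1[\"data4\"][()])" | |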
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 16 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"myfile.close()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 17 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 4. Reference\n", | |
"\n", | |
"- [**h5py.org**](http://docs.h5py.org/en/2.3/index.html)\n", | |
"\n", | |
"- [CSV package in Python](https://docs.python.org/2/library/csv.html)" | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |