Created
June 23, 2014 14:56
Blaze Quickstart
{ | |
"metadata": { | |
"name": "" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"Blaze Example" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This quickstart shows some simple ways to get started creating\n", | |
"and manipulating Blaze arrays. To run these examples, import blaze as\n", | |
"follows.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import blaze\n", | |
"from blaze import array\n", | |
"from datashape import dshape" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Blaze Arrays" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To create simple Blaze arrays, you can construct them from nested lists.\n", | |
"Blaze will deduce the dimensionality and data type to use.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"array(3.14)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 2, | |
"text": [ | |
"array(3.14,\n", | |
" dshape='float64')" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"array([[1, 2], [3, 4]])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 3, | |
"text": [ | |
"array([[1, 2],\n", | |
" [3, 4]],\n", | |
" dshape='2 * 2 * int32')" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"You can override the data type by providing the dshape parameter.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"array([[1, 2], [3, 4]], dshape='float64')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 4, | |
"text": [ | |
"array([[ 1., 2.],\n", | |
" [ 3., 4.]],\n", | |
" dshape='2 * 2 * float64')" | |
] | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Blaze has a slightly more general data model than NumPy; for example, it\n", | |
"supports variable-sized arrays.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"array([[1], [2, 3, 4], [5, 6]])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 5, | |
"text": [ | |
"array([[ 1],\n", | |
" [ 2, 3, 4],\n", | |
" [ 5, 6]],\n", | |
" dshape='3 * var * int32')" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Its support for strings includes variable-sized strings as well.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"array([['test', 'one', 'two', 'three'], ['a', 'braca', 'dabra']])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 6, | |
"text": [ | |
"array([[u'test', u'one', u'two', u'three'],\n", | |
" [u'a', u'braca', u'dabra']],\n", | |
" dshape='2 * var * string')" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Simple Calculations" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Blaze supports ufuncs and arithmetic similarly to NumPy.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"a = array([1, 2, 3])\n", | |
"blaze.sin(a) + 1" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 7, | |
"text": [ | |
"array([ 1.84147098, 1.90929743, 1.14112001],\n", | |
" dshape='3 * float64')" | |
] | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"blaze.sum(3 * a)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 8, | |
"text": [ | |
"array(18,\n", | |
" dshape='int32')" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Iterators" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Unlike NumPy, Blaze can construct arrays directly from iterators,\n", | |
"automatically deducing the dimensions and type just like it does for\n", | |
"lists.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from blaze import array\n", | |
"alst = [1, 2, 3]\n", | |
"array(iter(alst))" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 9, | |
"text": [ | |
"array([1, 2, 3],\n", | |
" dshape='3 * int32')" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"array([j-i for j in range(1,4)] for i in range(1,4))" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 10, | |
"text": [ | |
"array([[ 0, 1, 2],\n", | |
" [-1, 0, 1],\n", | |
" [-2, -1, 0]],\n", | |
" dshape='3 * 3 * int32')" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from random import randrange\n", | |
"array((randrange(10) for i in range(randrange(5))) for j in range(4))" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 11, | |
"text": [ | |
"array([[ 5],\n", | |
" [ 1, 4, 5, 1],\n", | |
" [ 9, 9],\n", | |
" [ 0, 1, 4, 8]],\n", | |
" dshape='4 * var * int32')" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Disk-Backed Arrays" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Blaze can currently use the BLZ and HDF5 formats for storing compressed,\n", | |
"chunked arrays on disk. These can be used through the data descriptors:\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"dd = blaze.BLZ_DDesc('foo.blz', mode='w')\n", | |
"a = blaze.array([[1,2],[3,4]], '2 * 2 * int32', ddesc=dd)\n", | |
"a" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
"array([[1, 2],\n", | |
" [3, 4]],\n", | |
" dshape='2 * 2 * int32')" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The dataset is now stored persistently on disk. Later, in another\n", | |
"Python session, we can access it again:\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"dd = blaze.BLZ_DDesc('foo.blz', mode='r')\n", | |
"b = blaze.array(dd)\n", | |
"b" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 13, | |
"text": [ | |
"array([[1, 2],\n", | |
" [3, 4]],\n", | |
" dshape='2 * 2 * int32')" | |
] | |
} | |
], | |
"prompt_number": 13 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The contents of the original array were recovered completely.\n", | |
"Finally, we can delete the array from disk:\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"dd.remove()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This removes the dataset from disk, and it cannot be restored\n", | |
"afterwards, so if you love your data, be careful with this one.\n", | |
"\n" | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
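The dshape strings in the notebook outputs above, such as `2 * 2 * int32` and `3 * var * int32`, follow a pattern that can be illustrated in plain Python. The sketch below is not Blaze's or datashape's actual inference code: `infer_dims` and `dims_string` are hypothetical helpers written for this illustration, and the element type is passed in by hand rather than deduced. It only shows how a fixed size could be reported when all sublists at a level have the same length, and `var` otherwise.

```python
def infer_dims(data):
    """Return dimension labels for nested lists, e.g. [3, 'var'].

    Hypothetical helper for illustration only; not part of Blaze.
    """
    if not isinstance(data, list):
        return []  # a scalar has zero dimensions
    # Dimensions of every child, computed recursively.
    child_dims = [infer_dims(item) for item in data]
    if not child_dims:
        return [len(data)]  # empty list: one dimension of size 0
    depth = len(child_dims[0])
    if any(len(dims) != depth for dims in child_dims):
        raise ValueError("children have inconsistent nesting depth")
    merged = []
    for level in range(depth):
        sizes = {dims[level] for dims in child_dims}
        # One shared size stays fixed; differing sizes become 'var'.
        merged.append(sizes.pop() if len(sizes) == 1 else "var")
    return [len(data)] + merged


def dims_string(data, dtype):
    """Join the inferred dimensions and a caller-supplied dtype name."""
    return " * ".join([str(d) for d in infer_dims(data)] + [dtype])


print(dims_string(3.14, "float64"))                        # float64
print(dims_string([[1, 2], [3, 4]], "int32"))              # 2 * 2 * int32
print(dims_string([[1], [2, 3, 4], [5, 6]], "int32"))      # 3 * var * int32
```

The three printed strings match the dshapes the notebook reports for the same inputs. Generators, as used in the Iterators section, would first have to be materialized into lists before this sketch could measure them.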