@cpcloud
cpcloud / trunc_repr_bug
Last active August 29, 2015 14:01
trunc repr bug
{
"metadata": {
"name": "",
"signature": "sha256:1428f1fc4d07eeb621c648b728c7feb5f0210656024476c109e86964cb744ea9"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
@cpcloud
cpcloud / repr-issue.json
Created May 20, 2014 21:06
notebook repr issue
{
"metadata": {
"name": "",
"signature": "sha256:1efaafda434deed4c4582e9da622b8c45fff909cb71c32dbc68a4422b72d3e43"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
@cpcloud
cpcloud / nate_image.ipynb
Created June 21, 2014 19:40
hack cell image segmentation
@cpcloud
cpcloud / pysource_check_readline
Last active August 29, 2015 14:05
Build Python 2.7.8 from source
wget https://www.python.org/ftp/python/2.7.8/Python-2.7.8.tar.xz
tar xvf Python-2.7.8.tar.xz
cd Python-2.7.8
./configure --enable-shared --enable-ipv6 --enable-unicode=ucs4 --prefix=/usr
make -j "$(nproc)"
# locate the compiled readline extension to confirm it was built
find . -name 'readline.so'
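The `find` step above appears to check whether the readline extension was compiled. As a hedged alternative, the same check can be sketched from Python itself by running a small script with the freshly built interpreter. The helper name `has_module` is hypothetical and not part of the original gist; the code is written to work on both Python 2.7 and Python 3.

```python
from __future__ import print_function


def has_module(name):
    """Return True if the module `name` can be imported, False otherwise.

    `has_module` is an illustrative helper, not part of the original gist.
    """
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Reports whether this interpreter was built with readline support.
    print("readline available:", has_module("readline"))
```

Because the build was configured with `--enable-shared`, the uninstalled interpreter may need the build directory on the shared library search path before it will start.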
# currently:
diamonds[(diamonds.cut == 'Ideal') | (diamonds.cut == 'Premium')][['cut', 'price']].sort_values('price', ascending=False).head(10)
# ideally:
diamonds[diamonds.cut.isin(['Ideal', 'Premium'])][['cut', 'price']].sort_values('price', ascending=False).head(10)
@cpcloud
cpcloud / scipy-fu.md
Last active August 29, 2015 14:17
blaze + odo SciPy 2015 abstract

Blaze + Odo: Shapeshifting on fire

Brief Description

Blaze separates expressions from computation. Odo moves complex data resources from point A to point B. Together they smooth over many of the complexities of computing with large data warehouse technologies like Redshift, Impala and HDFS. These libraries were designed with PyData in mind, so they play well with pandas, numpy, and a host of other foundational libraries. We show examples of each in action and discuss the design behind each library.
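The idea of separating expressions from computation can be sketched in plain Python. This is a toy illustration only and does not use or reproduce Blaze's actual API: the names `Field`, `Compare`, `compute_rows`, and `compute_columns` are all hypothetical. One expression object is built once and then evaluated against two differently shaped data sources.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Field:
    """A named column; comparing it builds an expression instead of a result."""
    name: str

    def __gt__(self, value):
        return Compare(self, operator.gt, value)


@dataclass(frozen=True)
class Compare:
    """An abstract 'field OP value' filter with no data attached."""
    field: Field
    op: Callable[[Any, Any], bool]
    value: Any


def compute_rows(expr, rows):
    """Backend 1: evaluate the filter against a list of dict records."""
    return [r for r in rows if expr.op(r[expr.field.name], expr.value)]


def compute_columns(expr, columns):
    """Backend 2: evaluate the same filter against a dict of column lists."""
    mask = [expr.op(v, expr.value) for v in columns[expr.field.name]]
    return {
        key: [v for v, keep in zip(values, mask) if keep]
        for key, values in columns.items()
    }


# The expression is written once, independently of any data source.
expr = Field("price") > 300

rows = [{"cut": "Ideal", "price": 326}, {"cut": "Good", "price": 280}]
columns = {"cut": ["Ideal", "Good"], "price": [326, 280]}

print(compute_rows(expr, rows))
print(compute_columns(expr, columns))
```

Adding another data source in this sketch means writing one more `compute_*` function, while the expression itself stays untouched.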

Blaze

Blaze lets us write down abstract expressions and then run them against a data source. This separates computation from data, so the details of the data source's API are mostly hidden. Blaze is also pluggable: users can easily write their own backends, which lets other communities hook into the PyData ecosystem. It is well integrated with other PyData projects such as numba. We discuss the design of blaze, show off a few backends and sh

@cpcloud
cpcloud / daskit.py
Last active August 29, 2015 14:26
Do vs Bag + Do
#!/usr/bin/env python
"""
Dask version of
https://hdfgroup.org/wp/2015/04/putting-some-spark-into-hdf-eos/
"""
from __future__ import print_function, division
import os
@cpcloud
cpcloud / arrow-build.md
Last active March 13, 2024 20:29
Arrow build instructions

Building arrow, parquet-cpp, and pyarrow

Prerequisites

  • conda
  • Boost (>= 1.54)
  • A recent-ish C/C++ compiler (4.9?)

Create a Conda environment

In [19]: df = pd.DataFrame({'a':[1,2,3],'b':[1.0,None,3.0]}, index=list('abc'))
In [20]: t = pa.Table.from_pandas(df)
In [21]: t.column(2).to_pandas()
Out[21]:
0 a
1 b
2 c
Name: _index_level_0, dtype: object