pandas notes for reference

Pandas

Summary

Python Package
Panel Data System
Data Analysis Library
Data Structures:
Series
DataFrame
Panel (deprecated in pandas 0.20, removed in 0.25)
Built on top of NumPy
Foundation for real-world data analysis in Python
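
As a minimal sketch of the two data structures still in use, the snippet below builds a Series and a DataFrame and shows that their values are backed by NumPy arrays. The labels and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([1.5, 2.5, 3.5], index=['a', 'b', 'c'])

# A DataFrame is a two-dimensional table of labeled columns.
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]},
                  index=['a', 'b', 'c'])

# Each column of a DataFrame is itself a Series.
col = df['x']

# The underlying values are exposed as NumPy arrays.
arr = s.to_numpy()
print(type(col).__name__, type(arr).__name__)
```

Both objects share the same index labels here, so s['b'] and df.loc['b'] refer to the same row label.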

Functionality

Relational and label-based data management
Compatibilities:
Time Series Data (Ordered, Unordered)
Arbitrary matrix data with row and column labels (homogeneous or heterogeneous)
Tabular data with heterogeneously typed columns
Various observational and statistical datasets
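
The kinds of data listed above can be illustrated roughly as follows. This is only a sketch with invented values: one table with differently typed columns, and one small time series whose dates are deliberately out of order.

```python
import pandas as pd

# Tabular data with heterogeneously typed columns.
table = pd.DataFrame({
    'name': ['alpha', 'beta', 'gamma'],
    'count': [1, 2, 3],
    'ratio': [0.5, 1.5, 2.5],
})
# dtype.kind is 'i' for integer columns and 'f' for float columns.
kinds = {col: table[col].dtype.kind for col in table.columns}

# Time series data indexed by unordered dates.
dates = pd.to_datetime(['2020-01-03', '2020-01-01', '2020-01-02'])
ts = pd.Series([30, 10, 20], index=dates)

# Label-based lookup works regardless of order; sorting restores it.
first_day = ts.loc[pd.Timestamp('2020-01-01')]
ordered = ts.sort_index()
```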

R Style Library

R style data handling
Perform fast joins and merges
Read data from various sources
Write data to various formats
Operations:
Handling missing data
Merging and joining datasets
Reshaping and pivoting
Group By engine
Size mutability
Convert index data
Robust I/O tools
Time Series
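
Several of the operations listed above can be sketched in one short pass. The CSV text, column names, and values below are assumptions made up for the example; an in-memory string stands in for a real data source.

```python
import io
import pandas as pd

# Read data from a source (an in-memory CSV with one missing value).
csv_text = (
    "city,year,sales\n"
    "Oslo,2020,10\n"
    "Oslo,2021,\n"
    "Rome,2020,7\n"
    "Rome,2021,9\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Handling missing data: count the gaps, then fill them with 0.
n_missing = int(df['sales'].isna().sum())
filled = df.fillna({'sales': 0})

# Group By engine: total sales per city.
totals = filled.groupby('city')['sales'].sum()

# Reshaping and pivoting: one row per city, one column per year.
wide = filled.pivot(index='city', columns='year', values='sales')

# Size mutability: a column can be added to an existing DataFrame.
filled['double'] = filled['sales'] * 2

# Write data to a format (CSV text in this case).
out = filled.to_csv(index=False)
```

Filling with 0 is just one choice; dropping the row with dropna() or interpolating would be equally valid depending on the data.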

Execute Test Suite

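Execute unit tests to verify pandas is working with $ nosetests pandas (older releases; current releases use pytest, e.g. $ pytest --pyargs pandas, or pd.test() from Python)
nose extends Python's unittest framework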

Required Dependencies

setuptools
NumPy (1.7.1 or greater)
pytz (time zone support)
python-dateutil (1.5 or higher)

Recommended Dependencies

numexpr
bottleneck

Optional

SciPy
Cython
matplotlib
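
One way to see which of the packages above are present in the current environment is to look them up without importing them. This is a rough sketch; the is_installed helper and the grouping dictionary are illustrative names, not part of pandas, and version requirements are not checked.

```python
import importlib.util

# Import names for the packages listed above
# (python-dateutil is imported as "dateutil").
groups = {
    'required': ['setuptools', 'numpy', 'pytz', 'dateutil'],
    'recommended': ['numexpr', 'bottleneck'],
    'optional': ['scipy', 'Cython', 'matplotlib'],
}

def is_installed(module_name):
    """Return True if the top-level module can be located."""
    return importlib.util.find_spec(module_name) is not None

status = {group: {name: is_installed(name) for name in names}
          for group, names in groups.items()}

for group, found in status.items():
    for name, ok in found.items():
        print(f"{group:12} {name:12} {'found' if ok else 'missing'}")
```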

Examples

merge performs SQL-like joins between DataFrames
Get help with help('pandas.merge')
The only mandatory parameters are the two DataFrames to merge (the default is an inner join)
Merge operations (each merge below joins on the key column or columns):

import pandas as pd

# create dataframes
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']})

df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']})

# default inner join
pd.merge(df1, df2, on='key')

# same as df2, but with key 'K2' replaced by 'K4'
df3 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']})

# left join: keeps every row of df1; K2 has no match in df3, so its C and D are NaN
pd.merge(df1, df3, on='key', how='left')

# dataframes with 2 key columns
dfleft = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K1'],
    'key2': ['K0', 'K1', 'K2', 'K3'],
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']})

dfright = pd.DataFrame({'key1': ['K0', 'K0', 'K2', 'K1'],
    'key2': ['K0', 'K1', 'K1', 'K2'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']})

# merge on multiple keys
pd.merge(dfleft, dfright, on=['key1','key2'])
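
To make the effect of the how parameter concrete, the sketch below rebuilds trimmed-down copies of the frames above (fewer value columns, same keys) and compares the join types. The indicator=True option adds a _merge column showing which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3']})
df3 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3']})

# Row counts per join type: K2 exists only in df1, K4 only in df3.
sizes = {how: len(pd.merge(df1, df3, on='key', how=how))
         for how in ['inner', 'left', 'right', 'outer']}

# indicator=True records the origin of each row in a _merge column.
outer = pd.merge(df1, df3, on='key', how='outer', indicator=True)
left_only_keys = outer.loc[outer['_merge'] == 'left_only', 'key'].tolist()

# In a left join, unmatched rows get NaN in the right-hand columns.
left = pd.merge(df1, df3, on='key', how='left')
k2_has_c = left.loc[left['key'] == 'K2', 'C'].notna().any()

# Multi-key merge: a row matches only if both key1 and key2 agree.
dfleft = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K1'],
                       'key2': ['K0', 'K1', 'K2', 'K3'],
                       'A': ['A0', 'A1', 'A2', 'A3']})
dfright = pd.DataFrame({'key1': ['K0', 'K0', 'K2', 'K1'],
                        'key2': ['K0', 'K1', 'K1', 'K2'],
                        'C': ['C0', 'C1', 'C2', 'C3']})
both = pd.merge(dfleft, dfright, on=['key1', 'key2'])
```

The inner join drops both K2 and K4, while the outer join keeps them and fills the gaps with NaN.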
