@jarhoads
Created December 19, 2018 15:43
pandas notes for reference

Pandas

Summary

  • Python Package
  • Panel Data System
  • Data Analysis Library
  • Data Structures:
    Series
    DataFrame
    Panel (deprecated in pandas 0.20 and removed in 0.25)
  • Built on top of NumPy
  • Foundation for real-world data analysis in Python
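The two core structures can be sketched briefly. The variable names and values below are made up for illustration; only `pd.Series`, `pd.DataFrame`, `.loc`, and `.values` are pandas API.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array.
prices = pd.Series([1.5, 2.0, 3.25], index=['a', 'b', 'c'], name='price')

# A DataFrame is a two-dimensional table of columns that share one index.
frame = pd.DataFrame(
    {'price': [1.5, 2.0, 3.25], 'qty': [10, 20, 30]},
    index=['a', 'b', 'c'],
)

# Values are looked up by label rather than by position.
print(prices['b'])
print(frame.loc['c', 'qty'])

# Each column is backed by a NumPy array.
print(isinstance(frame['price'].values, np.ndarray))
```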

Functionality

  • Relational and label-based data management
  • Compatibilities:
    Time Series Data (Ordered, Unordered)
    Arbitrary matrix data with row and column labels (homogeneous or heterogeneous types)
  • Tabular data that contains heterogeneously typed columns
  • Various observational and statistical datasets
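As a rough illustration of heterogeneously typed columns and label-based access, the sketch below builds a small table of invented observations with a date index. The column names and values are assumptions, not data from these notes.

```python
import pandas as pd

# Hypothetical observations: each column keeps its own dtype.
obs = pd.DataFrame(
    {
        'station': ['north', 'south', 'north'],
        'temp_c': [21.5, 19.0, 22.1],
        'valid': [True, False, True],
    },
    index=pd.to_datetime(['2018-12-01', '2018-12-02', '2018-12-03']),
)

# One dtype per column: text, floating point, and boolean.
print(obs.dtypes)

# Label-based lookup: a row label (a timestamp) and a column label.
temp_dec_2 = obs.loc[pd.Timestamp('2018-12-02'), 'temp_c']

# Relational-style filtering: mean temperature for one station.
north_mean = obs.loc[obs['station'] == 'north', 'temp_c'].mean()
print(temp_dec_2, north_mean)
```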

R Style Library

  • R-style data handling
  • Perform fast joins and merges
  • Read data from various sources
  • Write data to various formats
  • Operations:
    Handling missing data
    Merging and joining datasets
    Reshaping and pivoting
    Group By engine
    Size mutability
    Convert index data
    Robust I/O tools
    Time Series
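Several of the operations listed above can be sketched on one small table. The `sales` data is invented for illustration, and the CSV round trip uses an in-memory buffer instead of a real file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sales records with one missing revenue value.
sales = pd.DataFrame({
    'region': ['east', 'east', 'west', 'west'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'revenue': [100.0, np.nan, 80.0, 120.0],
})

# Handling missing data: replace NaN in 'revenue' with 0.
filled = sales.fillna({'revenue': 0.0})

# Group By engine: total revenue per region.
totals = filled.groupby('region')['revenue'].sum()

# Reshaping and pivoting: regions become rows, quarters become columns.
wide = filled.pivot(index='region', columns='quarter', values='revenue')

# Size mutability: a new column can be added to an existing frame.
filled['revenue_k'] = filled['revenue'] / 1000

# I/O tools: write to CSV and read it back through an in-memory buffer.
buffer = io.StringIO()
filled.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(totals)
print(wide)
print(restored.shape)
```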

Execute Test Suite

  • Execute unit tests to verify pandas is working with $ pytest --pyargs pandas (releases before 0.20 used $ nosetests pandas)
  • nose extends Python's unittest framework; pandas 0.20 and later use pytest instead

Required Dependencies

  • setuptools
  • NumPy (1.7.1 or greater)
  • pytz (time zone support)
  • python-dateutil (1.5 or higher)

Recommended Dependencies

  • numexpr
  • bottleneck

Optional

  • SciPy
  • Cython
  • matplotlib

Examples

  • merge performs SQL-like joins between DataFrames
  • Get help with help('pandas.merge')
  • The only mandatory parameters are the two DataFrames to merge (the default is an inner join)
  • Merge operations (each merge below joins on the key column or columns):
import pandas as pd

# create dataframes
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']})

df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']})

# default inner join
pd.merge(df1, df2, on='key')

# same as df2, but with key 'K4' in place of 'K2'
df3 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']})

# left join: keeps every row of df1; 'K2' has no match in df3, so C and D are NaN
pd.merge(df1, df3, on='key', how='left')

# dataframes with 2 key columns
dfleft = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K1'],
    'key2': ['K0', 'K1', 'K2', 'K3'],
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']})

dfright = pd.DataFrame({'key1': ['K0', 'K0', 'K2', 'K1'],
    'key2': ['K0', 'K1', 'K1', 'K2'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']})

# merge on multiple keys: rows match only when both key1 and key2 agree
pd.merge(dfleft, dfright, on=['key1','key2'])
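To compare the join types side by side, the sketch below repeats the merge with each value of `how` on trimmed-down copies of `df1` and `df3` (one value column each, an assumption made to keep the output short). It also uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Trimmed copies of df1 and df3 from above: 'K2' exists only on the left,
# 'K4' only on the right.
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3']})
df3 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3']})

# Row counts differ by join type because of the unmatched keys.
counts = {}
for how in ('inner', 'left', 'right', 'outer'):
    merged = pd.merge(df1, df3, on='key', how=how)
    counts[how] = len(merged)
print(counts)

# indicator=True labels each row 'both', 'left_only', or 'right_only'.
outer = pd.merge(df1, df3, on='key', how='outer', indicator=True)
origin = outer.set_index('key')['_merge'].astype(str).to_dict()
print(origin)
```

The inner join keeps only keys present in both frames, while the outer join keeps every key and fills the missing side with NaN.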