Summary
- Python Package
- Panel Data System
- Data Analysis Library
- Data Structures:
  - Series
  - DataFrame
  - Panel
- Structured on top of NumPy
- Foundation for real-world data analysis in Python
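As a minimal sketch of the first two data structures listed above, the following builds a Series and a DataFrame and shows that the underlying values are NumPy arrays. The labels and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array (sample values are hypothetical).
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table whose columns may have different types.
df = pd.DataFrame({"name": ["x", "y", "z"], "value": [10, 20, 30]})

# The values of a numeric Series are exposed as a NumPy array.
values = s.values

print(type(values))
print(s["b"])      # label-based lookup
print(df.dtypes)   # one dtype per column
```

Label-based lookup (`s["b"]`) is what distinguishes these structures from a plain NumPy array, which is indexed by position only.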
Functionality
- Relational and label-based data management
- Compatible kinds of data:
  - Time series data (ordered or unordered)
  - Arbitrary matrix data with row and column labels (homogeneously or heterogeneously typed)
  - Tabular data with heterogeneously typed columns
- Various observational and statistical datasets
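The kinds of data listed above can be illustrated with a short sketch. It assumes three made-up daily observations and a small made-up table; none of these values come from the original notes.

```python
import pandas as pd

# Hypothetical sample data: three daily observations.
dates = pd.date_range("2020-01-01", periods=3, freq="D")
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# An unordered time series: each value stays attached to its timestamp,
# so sorting the index restores chronological order.
shuffled = ts.iloc[[2, 0, 1]]
restored = shuffled.sort_index()

# Tabular data with heterogeneously typed columns (text, float, boolean).
table = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "temp_c": [4.5, 19.0],
    "sampled": [True, False],
})

print(restored)
print(table.dtypes)
```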
R-Style Library
- R-style data handling
- Perform fast joins and merges
- Read data from various sources
- Write data to various formats
- Operations:
  - Handling missing data
  - Merging and joining datasets
  - Reshaping and pivoting
  - Group By engine
  - Size mutability
  - Index data conversion
  - Robust I/O tools
  - Time series
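A few of the operations listed above can be sketched on a small table. The `sales` data and its column names are hypothetical and chosen only to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original notes.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "amount": [100.0, np.nan, 80.0, 120.0],
})

# Handling missing data: replace the missing amount with 0.
filled = sales.fillna({"amount": 0.0})

# Group By: total amount per region.
totals = filled.groupby("region")["amount"].sum()

# Reshaping and pivoting: one row per region, one column per quarter.
wide = filled.pivot(index="region", columns="quarter", values="amount")

# Size mutability: add a column to an existing DataFrame.
filled["double"] = filled["amount"] * 2

print(totals)
print(wide)
```

Filling with 0 is only one possible policy for missing values; dropping the rows with `dropna()` is another common choice, and which one is appropriate depends on the dataset.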
Execute Test Suite
- Run the unit tests with nose to verify that pandas is working:
  $ nosetests pandas
- nose extends Python's unit testing framework
Required Dependencies
- setuptools
- NumPy (1.7.1 or greater)
- pytz (time zone support)
- python-dateutil (1.5 or higher)
Recommended Dependencies
- numexpr
- bottleneck
Optional Dependencies
- SciPy
- Cython
- matplotlib
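One way to see which of the recommended and optional packages above are present is to ask Python's import machinery, without actually importing them. This is a rough sketch: the helper name `installed` is made up here, and the import names (for example `scipy` for SciPy) are the usual module names rather than something stated in these notes.

```python
import importlib.util


def installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Import names assumed for the packages listed above.
optional = ["numexpr", "bottleneck", "scipy", "Cython", "matplotlib"]

status = {name: installed(name) for name in optional}
for name, ok in status.items():
    print(name, "installed" if ok else "missing")
```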
Examples
merge
- Performs SQL-like joins between DataFrames
- Get help with help('pandas.merge')
- The only mandatory parameters are the two DataFrames to merge (the default is an inner join)
- Merge operations (the examples below merge on the key column):
import pandas as pd
# create two DataFrames that share the same keys
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
# default inner join: every key matches, so all four rows are kept
pd.merge(df1, df2, on='key')
# same as df2, except that key 'K2' is replaced by 'K4'
df3 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
# left join: keeps every row of df1; 'K2' has no match in df3, so C and D are NaN there
pd.merge(df1, df3, on='key', how='left')
# DataFrames with two key columns
dfleft = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K1'],
                       'key2': ['K0', 'K1', 'K2', 'K3'],
                       'A': ['A0', 'A1', 'A2', 'A3'],
                       'B': ['B0', 'B1', 'B2', 'B3']})
dfright = pd.DataFrame({'key1': ['K0', 'K0', 'K2', 'K1'],
                        'key2': ['K0', 'K1', 'K1', 'K2'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']})
# merge on multiple keys: keeps the three rows whose (key1, key2) pairs match
pd.merge(dfleft, dfright, on=['key1', 'key2'])
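To make the difference between the join types more concrete, the sketch below merges two small frames three ways and checks which keys survive. The frames `left` and `right` are made up for this illustration; `how` and `indicator` are documented parameters of `pd.merge`.

```python
import pandas as pd

# Hypothetical frames: 'K2' exists only on the left, 'K4' only on the right.
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K4'], 'C': ['C0', 'C1', 'C4']})

# Inner join (the default): only keys present in both frames.
inner = pd.merge(left, right, on='key')

# Left join: every key from the left frame; unmatched rows get NaN in C.
left_join = pd.merge(left, right, on='key', how='left')

# Outer join: keys from either frame; indicator=True adds a '_merge'
# column that records where each row came from.
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

print(inner)
print(left_join)
print(outer)
```

The `_merge` column takes the values `both`, `left_only`, and `right_only`, which can help when checking why rows appear or disappear after a join.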