Leverage pandas' powerful data manipulation engine to get the most out of your data. Drill into the data that really matters by extracting, filtering, and transforming data from DataFrames. The pandas library has many techniques that make this process efficient and intuitive. You will learn how to tidy, rearrange, and restructure your data by pivoting or melting and stacking or unstacking DataFrames.
Led by Team Anaconda, Data Science Consultant at Lander Analytics
Index, slice, filter, and transform DataFrames using a variety of datasets, ranging from 2012 US election data for the state of Pennsylvania to Pittsburgh weather data.
- Indexing using square brackets
- Indexing using column attribute and row label
- Indexing using the .loc and .iloc accessors
- Selecting only some columns
- Slicing DataFrames
- Slicing and indexing a Series
- Series versus 1-column DataFrame
- Filtering DataFrames
- Filtering with a Boolean Series
- Transforming DataFrames
- DataFrame vectorized methods
- NumPy vectorized functions
- Working with string values
df['salt']['Jan']
df.eggs['Mar']
df.loc['May', 'spam']
df.iloc[4, 2]
df[['salt','eggs']]
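The indexing styles above can be tried side by side on a small frame. This is a minimal sketch: the monthly index and the `eggs`, `salt`, and `spam` columns follow the snippets above, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the numbers are invented for illustration.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, 30.0, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

by_brackets = df["salt"]["Jan"]    # column first, then row label
by_attribute = df.eggs["Mar"]      # column as attribute, then row label
by_label = df.loc["May", "spam"]   # row label, column label
by_position = df.iloc[4, 2]        # row position, column position (same cell as above)
subset = df[["salt", "eggs"]]      # list of columns returns a DataFrame

print(by_brackets, by_attribute, by_label, by_position)
print(subset)
```

In this frame, `df.loc['May', 'spam']` and `df.iloc[4, 2]` address the same cell, once by label and once by position.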
In [3]: type(df['eggs'])
Out[3]: pandas.core.series.Series
df['eggs'][1:4] # Part of the eggs column
df.loc[:, 'eggs':'salt'] # All rows, some columns
df.loc['Jan':'Apr',:] # Some rows, all columns
df.iloc[2:5, 1:] # A block from middle of the DataFrame
df.loc['Jan':'May', ['eggs', 'spam']]
df.iloc[[0,4,5], 0:2]
In [16]: type(df[['eggs']])
Out[16]: pandas.core.frame.DataFrame
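A short sketch of the slicing rules used above, again on hypothetical data. Label slices with `.loc` include both endpoints, while positional slices with `.iloc` exclude the stop position. Single brackets return a Series and double brackets return a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the numbers are invented for illustration.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, 30.0, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

rows_by_label = df.loc["Jan":"Apr", :]       # label slice: 'Apr' is included
cols_by_label = df.loc[:, "eggs":"salt"]     # label slice over columns
block = df.iloc[2:5, 1:]                     # positional slice: row 5 is excluded
mixed = df.loc["Jan":"May", ["eggs", "spam"]]  # label slice plus column list
picked = df.iloc[[0, 4, 5], 0:2]             # position list plus positional slice

as_series = df["eggs"]      # single brackets: Series
as_frame = df[["eggs"]]     # double brackets: one-column DataFrame
part = df["eggs"].iloc[1:4]  # positional slice of the Series

print(type(as_series).__name__, type(as_frame).__name__)
```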
df.salt > 60
df[df.salt > 60]
df[(df.salt >= 50) & (df.eggs < 200)] # Both conditions
df[(df.salt >= 50) | (df.eggs < 200)] # Either condition
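The filters above all work by building a Boolean Series and passing it to the DataFrame. A minimal sketch with hypothetical values shows the mask explicitly. The parentheses around each comparison are required because `&` and `|` bind more tightly than the comparison operators.

```python
import pandas as pd

# Hypothetical sample data; the numbers are invented for illustration.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, 30.0, 60.0],
    },
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

mask = df.salt > 60   # Boolean Series aligned with the rows of df
salty = df[mask]      # keeps only rows where the mask is True

both = df[(df.salt >= 50) & (df.eggs < 200)]    # element-wise AND
either = df[(df.salt >= 50) | (df.eggs < 200)]  # element-wise OR

print(mask)
print(salty)
```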
df2.loc[:, df2.all()] # Select columns with all nonzeros
df2.loc[:, df2.any()] # Select columns with any nonzeros
df.loc[:, df.isnull().any()] # Select columns with any NaNs
df.loc[:, df.notnull().all()] # Select columns without NaNs
df.dropna(how='any')
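Column selection with `all()`, `any()`, and the null checks can be sketched on two tiny frames. The frames, column names, and values below are hypothetical. `df2` contains zeros and `df_nan` contains one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with zeros; names and values are invented.
df2 = pd.DataFrame({"eggs": [47, 110, 221], "bacon": [0, 0, 5], "tofu": [0, 0, 0]})

all_nonzero = df2.loc[:, df2.all()]  # columns where every entry is nonzero
any_nonzero = df2.loc[:, df2.any()]  # columns with at least one nonzero entry

# Hypothetical frame with one missing value.
df_nan = pd.DataFrame({"eggs": [47, 110, 221], "salt": [12.0, np.nan, 89.0]})

with_nans = df_nan.loc[:, df_nan.isnull().any()]      # columns containing a NaN
without_nans = df_nan.loc[:, df_nan.notnull().all()]  # columns free of NaNs
complete_rows = df_nan.dropna(how="any")              # drop rows containing a NaN

print(list(all_nonzero.columns), list(any_nonzero.columns))
```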
df.loc[df.salt > 55, 'eggs'] += 5 # Modify eggs where salt > 55
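Conditional updates of this kind are usually safest as a single `.loc` call with a row mask and a column label, since a chained selection like `df.eggs[mask]` may operate on a temporary copy. A minimal sketch, with hypothetical values:

```python
import pandas as pd

# Hypothetical sample data; the numbers are invented for illustration.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12.0, 50.0, 89.0]},
    index=["Jan", "Feb", "Mar"],
)

# One indexing operation: rows where salt > 55, column 'eggs'.
df.loc[df.salt > 55, "eggs"] += 5

print(df)
```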
df.floordiv(12) # Convert to dozens unit
np.floor_divide(df, 12) # Convert to dozens unit
df.apply(lambda n: n//12)
def dozens(n): return n//12
df.apply(dozens)
df['dozens_of_eggs'] = df.eggs.floordiv(12)
df['salty_eggs'] = df.salt + df.dozens_of_eggs
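The three "dozens" conversions above should give the same result. The sketch below checks that on a small hypothetical frame, then adds the derived columns. The DataFrame method and the NumPy function are vectorized, whereas `apply` calls the Python function once per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the numbers are invented for illustration.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12, 50, 89]},
    index=["Jan", "Feb", "Mar"],
)

via_method = df.floordiv(12)             # DataFrame vectorized method
via_numpy = np.floor_divide(df, 12)      # NumPy vectorized function
via_apply = df.apply(lambda n: n // 12)  # function applied column by column

# New columns computed from existing ones.
df["dozens_of_eggs"] = df.eggs.floordiv(12)
df["salty_eggs"] = df.salt + df.dozens_of_eggs

print(df)
```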
df.index = df.index.str.upper()
df.index = df.index.map(str.lower)
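String values in an index can be transformed either with the `.str` accessor or by mapping a function over the labels. A minimal sketch, using a hypothetical month index:

```python
import pandas as pd

# Hypothetical sample data; the numbers are invented for illustration.
df = pd.DataFrame({"eggs": [47, 110, 221]}, index=["Jan", "Feb", "Mar"])

df.index = df.index.str.upper()     # vectorized string method on the index
upper_labels = list(df.index)

df.index = df.index.map(str.lower)  # map a plain Python function over the labels
lower_labels = list(df.index)

print(upper_labels, lower_labels)
```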
Learn advanced indexing techniques with MultiIndexes, or hierarchical indexes, and how to interact with and extract data from them.
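As a rough illustration of a hierarchical index, the sketch below builds a two-level index from hypothetical `state` and `month` columns and extracts data at each level. The frame, its column names, and its values are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sales data; names and numbers are invented for illustration.
sales = (
    pd.DataFrame(
        {
            "state": ["CA", "CA", "NY", "NY"],
            "month": [1, 2, 1, 2],
            "eggs": [47, 110, 221, 77],
        }
    )
    .set_index(["state", "month"])  # two-level (hierarchical) index
    .sort_index()                   # sorting keeps slicing well defined
)

ca_rows = sales.loc["CA"]                       # all rows for one outer label
one_cell = sales.loc[("NY", 2), "eggs"]         # tuple addresses both levels
month_one = sales.loc[(slice(None), 1), :]      # month 1 for every state

print(sales)
```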
Reshape your DataFrames using techniques such as pivoting, melting, stacking, and unstacking -- techniques to tidy and rearrange your data into the optimal format for data analysis.
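The four reshaping operations can be sketched as round trips on a small long-format table. The `weekday`, `city`, and `visitors` names and the values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data; names and numbers are invented.
long_df = pd.DataFrame(
    {
        "weekday": ["Sun", "Sun", "Mon", "Mon"],
        "city": ["Austin", "Dallas", "Austin", "Dallas"],
        "visitors": [139, 237, 326, 456],
    }
)

# Pivot: one row per weekday, one column per city.
wide = long_df.pivot(index="weekday", columns="city", values="visitors")

# Melt: back to long format, one row per (weekday, city) pair.
melted = wide.reset_index().melt(id_vars="weekday", value_name="visitors")

# Stack: move the column labels into an inner index level.
stacked = wide.stack()

# Unstack: move the inner index level back out to columns.
unstacked = stacked.unstack()

print(wide)
```

Pivoting and unstacking widen the table, while melting and stacking lengthen it.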
Identify and split DataFrames by groups or categories for further aggregation or analysis. Transform and filter your data, and detect outliers and impute missing values.
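A minimal sketch of the group-wise operations named above, on a hypothetical frame with a `group` key and a `value` column. It covers aggregation, imputing missing values with the group mean, a within-group z-score as one possible basis for outlier detection, and filtering whole groups.

```python
import numpy as np
import pandas as pd

# Hypothetical data; names and numbers are invented for illustration.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b", "b", "b"],
        "value": [1.0, 2.0, np.nan, 10.0, 12.0, 50.0],
    }
)

grouped = df.groupby("group")["value"]

# Aggregate: one number per group (NaNs are skipped by default).
means = grouped.mean()

# Impute: fill each missing value with the mean of its own group.
filled = grouped.transform(lambda s: s.fillna(s.mean()))

# Standardize within each group; large absolute z-scores may flag outliers.
zscores = grouped.transform(lambda s: (s - s.mean()) / s.std())

# Filter: keep only groups whose values sum to more than 10.
big_groups = df.groupby("group").filter(lambda g: g["value"].sum() > 10)

print(means)
```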
Work with data recorded from the Summer Olympic games that goes as far back as 1896! Pivot, unstack, group, slice, and reshape your data as you explore this dataset and uncover some truly fascinating insights.