
@misho-kr
Last active October 11, 2020 10:37
Summary of "Manipulating DataFrames with pandas" course on Datacamp

Leverage pandas' powerful data manipulation engine to get the most out of your data. Drill into the data that really matters by extracting, filtering, and transforming data from DataFrames. The pandas library has many techniques that make this process efficient and intuitive. You will learn how to tidy, rearrange, and restructure your data by pivoting or melting and stacking or unstacking DataFrames.

Led by Team Anaconda, Data Science Consultant at Lander Analytics

Extracting and transforming data

Index, slice, filter, and transform DataFrames using a variety of datasets, ranging from 2012 US election data for the state of Pennsylvania to Pittsburgh weather data.

  • Indexing using
    • square brackets
    • column attribute and row label
    • .loc and .iloc accessors
    • selecting only some columns
  • Slicing DataFrames
    • Slicing and indexing a Series
    • Series versus 1-column DataFrame
  • Filtering DataFrames
    • Filtering with a Boolean Series
  • Transforming DataFrames
    • DataFrame vectorized methods
    • NumPy vectorized functions
  • Working with string values
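df['salt']['Jan']         # Column 'salt', then row 'Jan'
df.eggs['Mar']            # Column as an attribute, then row 'Mar'
df.loc['May', 'spam']     # Row label and column label
df.iloc[4, 2]             # Row position and column position
df[['salt','eggs']]       # Only some columns, as a DataFrame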

In [3]: type(df['eggs'])
Out[3]: pandas.core.series.Series
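Here is a minimal, self-contained sketch of the indexing calls above. The `df` is a hypothetical six-month sales table with made-up values, since the course loads its own data file. The column order eggs, salt, spam is assumed so that `.loc['May', 'spam']` and `.iloc[4, 2]` land on the same cell.

```python
import pandas as pd

# Hypothetical toy data: monthly sales of three products (values are made up)
df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77, 132, 205],
     'salt': [12, 50, 89, 87, 60, 60],
     'spam': [17, 31, 72, 20, 52, 55]},
    index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
)

a = df['salt']['Jan']      # column, then row label (chained)
b = df.eggs['Mar']         # column attribute, then row label
c = df.loc['May', 'spam']  # label-based: row label, column label
d = df.iloc[4, 2]          # position-based: 5th row, 3rd column

# 'May' is row position 4 and 'spam' is column position 2,
# so .loc and .iloc reach the same cell here
print(a, b, c, d)  # 12 221 52 52

# A list of names gives a DataFrame; a single name gives a Series
print(type(df[['salt', 'eggs']]).__name__)  # DataFrame
print(type(df['eggs']).__name__)            # Series
```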

df['eggs'][1:4]           # Part of the eggs column
df.loc[:, 'eggs':'salt']  # All rows, some columns
df.loc['Jan':'Apr',:]     # Some rows, all columns
df.iloc[2:5, 1:]          # A block from middle of the DataFrame
df.loc['Jan':'May', ['eggs', 'spam']]
df.iloc[[0,4,5], 0:2]

In [16]: type(df[['eggs']])
Out[16]: pandas.core.frame.DataFrame
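The sketch below contrasts the two slicing rules on the same hypothetical data. Label slices with `.loc` include the end label, while positional slices with `.iloc` exclude the end position. It also shows that single brackets give a Series and double brackets give a one-column DataFrame.

```python
import pandas as pd

# Hypothetical toy data (values are made up)
df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77, 132, 205],
     'salt': [12, 50, 89, 87, 60, 60],
     'spam': [17, 31, 72, 20, 52, 55]},
    index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
)

by_label = df.loc['Jan':'Apr', :]  # end label 'Apr' is INCLUDED
by_position = df.iloc[0:4, :]      # end position 4 is EXCLUDED
print(len(by_label), len(by_position))  # 4 4
print(by_label.index.tolist())          # ['Jan', 'Feb', 'Mar', 'Apr']

# Column slice by label, 'eggs' through 'salt' inclusive
print(df.loc[:, 'eggs':'salt'].columns.tolist())  # ['eggs', 'salt']

# Single vs double brackets: Series vs one-column DataFrame
s = df['eggs']
f = df[['eggs']]
print(s.ndim, f.ndim, f.shape)  # 1 2 (6, 1)
```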
df.salt > 60           # Boolean Series, one value per row
df[df.salt > 60]       # Only the rows where it is True
df[(df.salt >= 50) & (df.eggs < 200)] # Both conditions
df[(df.salt >= 50) | (df.eggs < 200)] # Either condition
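This sketch shows Boolean filtering on the same hypothetical table. Each comparison needs its own parentheses because Python's `&` and `|` bind more tightly than comparison operators.

```python
import pandas as pd

# Hypothetical toy data (values are made up)
df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77, 132, 205],
     'salt': [12, 50, 89, 87, 60, 60],
     'spam': [17, 31, 72, 20, 52, 55]},
    index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
)

mask = df.salt > 60        # Boolean Series aligned on the row index
print(mask.tolist())       # [False, False, True, True, False, False]
high_salt = df[mask]       # keeps only the rows where mask is True

both = df[(df.salt >= 50) & (df.eggs < 200)]    # both conditions
either = df[(df.salt >= 50) | (df.eggs < 200)]  # either condition
print(both.index.tolist())  # ['Feb', 'Apr', 'May']
print(len(either))          # 6
```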

df2.loc[:, df2.all()]         # Select columns with all nonzeros
df2.loc[:, df2.any()]         # Select columns with any nonzeros
df.loc[:, df.isnull().any()]  # Select columns with any NaNs
df.loc[:, df.notnull().all()] # Select columns without NaNs
df.dropna(how='any')          # Drop rows with any NaNs
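The next sketch runs the column-selection and `dropna` calls above on two small hypothetical frames. `df2` contains zeros and `df` contains one NaN. In these calls `.all()`, `.any()` and `.isnull()` pick columns, whereas `dropna(how='any')` drops rows.

```python
import numpy as np
import pandas as pd

months = ['Jan', 'Feb', 'Mar']
# Hypothetical frames (values are made up):
# df2 has an all-zero and a partly-zero column, df has one missing value
df2 = pd.DataFrame({'eggs': [47, 110, 221], 'zeros': [0, 0, 0],
                    'mixed': [0, 5, 0]}, index=months)
df = pd.DataFrame({'eggs': [47, 110, 221], 'salt': [12.0, np.nan, 89.0]},
                  index=months)

# .all()/.any() reduce each column to one Boolean; .loc uses it to pick columns
print(df2.loc[:, df2.all()].columns.tolist())          # ['eggs']
print(df2.loc[:, df2.any()].columns.tolist())          # ['eggs', 'mixed']
print(df.loc[:, df.isnull().any()].columns.tolist())   # ['salt']
print(df.loc[:, df.notnull().all()].columns.tolist())  # ['eggs']

# dropna(how='any') removes every ROW containing at least one NaN
print(df.dropna(how='any').index.tolist())             # ['Jan', 'Mar']
```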

df.loc[df.salt > 55, 'eggs'] += 5  # Add 5 to eggs where salt > 55 (avoids chained assignment)
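This sketch writes a conditional update as a single `.loc` assignment, which pandas recommends over chained indexing such as `df.eggs[...] += 5`. The chained form may modify a temporary object rather than `df`, and it never modifies `df` under pandas' Copy-on-Write mode. The data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data (values are made up)
df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77, 132, 205],
     'salt': [12, 50, 89, 87, 60, 60]},
    index=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
)

# Row mask and column label in one .loc call, so the write goes into df itself
df.loc[df.salt > 55, 'eggs'] += 5
print(df['eggs'].tolist())  # [47, 110, 226, 82, 137, 210]
```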

import numpy as np

df.floordiv(12)              # Convert to dozens unit
np.floor_divide(df, 12)      # Same result with a NumPy ufunc
df.apply(lambda n: n//12)    # Same result, applied column by column

def dozens(n):
    return n//12             # Floor division by 12

df.apply(dozens)             # Same as the lambda version
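Here is a quick check, on hypothetical data, that the four "convert to dozens" variants agree. `df.apply` calls the function once per column, so `n` is a whole Series and `n // 12` is still vectorized.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (values are made up)
df = pd.DataFrame({'eggs': [47, 110, 221], 'spam': [17, 31, 72]},
                  index=['Jan', 'Feb', 'Mar'])

def dozens(n):
    """Whole dozens in n (floor division by 12); n is a column Series here."""
    return n // 12

r1 = df.floordiv(12)               # DataFrame method
r2 = np.floor_divide(df, 12)       # NumPy ufunc, returns a DataFrame
r3 = df.apply(lambda n: n // 12)   # lambda applied per column
r4 = df.apply(dozens)              # named function applied per column

print(r1.equals(r2), r1.equals(r3), r1.equals(r4))  # True True True
print(r1['eggs'].tolist())                          # [3, 9, 18]
```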

df['dozens_of_eggs'] = df.eggs.floordiv(12)     # New column from an existing one
df['salty_eggs'] = df.salt + df.dozens_of_eggs  # New column from two columns
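The sketch below adds derived columns to hypothetical data. It also shows `DataFrame.assign`, which is not in the original notes. `assign` returns a new frame instead of modifying `df`, and later keyword arguments can refer to columns created by earlier ones.

```python
import pandas as pd

# Hypothetical toy data (values are made up)
base = pd.DataFrame({'eggs': [47, 110, 221], 'salt': [12, 50, 89]},
                    index=['Jan', 'Feb', 'Mar'])

# In-place column creation, as in the notes
df = base.copy()
df['dozens_of_eggs'] = df.eggs.floordiv(12)
df['salty_eggs'] = df.salt + df.dozens_of_eggs

# Same result without mutating base
via_assign = base.assign(
    dozens_of_eggs=lambda d: d.eggs // 12,
    salty_eggs=lambda d: d.salt + d.dozens_of_eggs,  # uses the column just created
)
print(df['salty_eggs'].tolist())  # [15, 59, 107]
print(via_assign.equals(df))      # True
```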

df.index = df.index.str.upper()     # Vectorized string method on the index
df.index = df.index.map(str.lower)  # Apply any function to each label
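This sketch renames index labels with the `.str` accessor and with `.map`, using a hypothetical frame indexed by month abbreviations.

```python
import pandas as pd

# Hypothetical frame with month abbreviations as row labels
df = pd.DataFrame({'eggs': [47, 110, 221]}, index=['Jan', 'Feb', 'Mar'])

# .str exposes vectorized string methods on an Index (or Series) of strings
df.index = df.index.str.upper()
upper = df.index.tolist()
print(upper)  # ['JAN', 'FEB', 'MAR']

# .map calls an arbitrary Python function once per label
df.index = df.index.map(str.lower)
lower = df.index.tolist()
print(lower)  # ['jan', 'feb', 'mar']
```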

Advanced indexing

Learn advanced indexing with MultiIndexes, or hierarchical indexes, and how to interact with and extract data from them.
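Here is a minimal MultiIndex sketch. It uses hypothetical sales records keyed by state and month, with made-up names and values. It selects by the outer level, by a full tuple, by the inner level with `pd.IndexSlice`, and by cross-section with `.xs`. Slicing on inner levels expects a sorted index, hence `sort_index()`.

```python
import pandas as pd

# Hypothetical sales records (values are made up)
sales = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY', 'TX', 'TX'],
    'month': [1, 2, 1, 2, 1, 2],
    'eggs': [47, 110, 221, 77, 132, 205],
    'salt': [12, 50, 89, 87, 60, 60],
})

# Promote two columns to a hierarchical index and sort it
sales = sales.set_index(['state', 'month']).sort_index()

ny = sales.loc['NY']                       # outer level: all months for NY
one = sales.loc[('NY', 1), 'eggs']         # full tuple: a single value
print(one)                                 # 221

idx = pd.IndexSlice
month2 = sales.loc[idx[:, 2], :]           # inner level: month 2, every state
print(month2['eggs'].tolist())             # [110, 77, 205]

jan = sales.xs(1, level='month')           # cross-section on the inner level
print(jan['eggs'].tolist())                # [47, 221, 132]
```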

Rearranging and reshaping data

Reshape your DataFrames using techniques such as pivoting, melting, stacking, and unstacking, which tidy and rearrange your data into the format best suited to analysis.
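This sketch applies the four reshaping operations to a hypothetical table of visitors per weekday and city. `pivot` and `melt` convert between long and wide formats. `stack` and `unstack` move a level of labels between the columns and the row index.

```python
import pandas as pd

# Hypothetical long-format data: one row per (weekday, city) pair (values made up)
long = pd.DataFrame({
    'weekday': ['Mon', 'Mon', 'Tue', 'Tue'],
    'city': ['Austin', 'Dallas', 'Austin', 'Dallas'],
    'visitors': [139, 237, 326, 456],
})

# pivot: long -> wide, one column per distinct city
wide = long.pivot(index='weekday', columns='city', values='visitors')
print(wide)

# melt: wide -> long; reset_index turns 'weekday' back into an ordinary column
back = wide.reset_index().melt(id_vars='weekday', var_name='city',
                               value_name='visitors')
print(back)

# stack moves the column labels into the row index (giving a Series);
# unstack moves that index level back out into columns
stacked = wide.stack()
print(stacked)
print(stacked.unstack('city').equals(wide))  # True
```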

Grouping data

Identify and split DataFrames by groups or categories for further aggregation or analysis. Transform and filter your data, and detect outliers and impute missing values.
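The sketch below shows the three kinds of group operation on hypothetical passenger data:

  • aggregation, which returns one row per group
  • filtering, which keeps or drops whole groups
  • transformation, which returns a result aligned with the original rows

Here the transformation imputes missing ages with the group median and computes within-group z-scores as a simple outlier screen.

```python
import numpy as np
import pandas as pd

# Hypothetical passenger records; two ages are missing (values are made up)
df = pd.DataFrame({
    'pclass': [1, 1, 1, 2, 2, 2],
    'age': [38.0, np.nan, 54.0, 20.0, 30.0, np.nan],
    'fare': [71.0, 53.0, 52.0, 13.0, 10.5, 21.0],
})

# Aggregation: one output row per group
fare_stats = df.groupby('pclass')['fare'].agg(['mean', 'max'])
print(fare_stats)

# Filtering: keep every row of the groups whose mean fare exceeds 20
pricey = df.groupby('pclass').filter(lambda g: g['fare'].mean() > 20)
print(pricey.index.tolist())  # [0, 1, 2]

# Transformation: output has the input's shape, so it can become a column.
# Impute each missing age with the median age of its own class.
df['age_filled'] = df.groupby('pclass')['age'].transform(
    lambda s: s.fillna(s.median()))
print(df['age_filled'].tolist())  # [38.0, 46.0, 54.0, 20.0, 30.0, 25.0]

# Within-group z-scores: large absolute values flag possible outliers
df['fare_z'] = df.groupby('pclass')['fare'].transform(
    lambda s: (s - s.mean()) / s.std())
```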

Bringing it all together

Work with data recorded from the Summer Olympic Games, going back as far as 1896! Pivot, unstack, group, slice, and reshape your data as you explore this dataset and uncover some truly fascinating insights.
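Here is a small sketch of the slice, group, and unstack pattern for medal counts. The rows and the column names `Edition`, `NOC` and `Medal` are hypothetical stand-ins, not the course's actual dataset.

```python
import pandas as pd

# Hypothetical medal records (rows and column names are made up)
medals = pd.DataFrame({
    'Edition': [1896, 1896, 1896, 1900, 1900, 1900],
    'NOC': ['USA', 'USA', 'GRE', 'FRA', 'USA', 'FRA'],
    'Medal': ['Gold', 'Silver', 'Gold', 'Gold', 'Bronze', 'Silver'],
})

# Slice: only the 1900 Games
games_1900 = medals[medals.Edition == 1900]

# Group by country and medal type, count rows, then unstack Medal into columns
counts = medals.groupby(['NOC', 'Medal']).size().unstack(fill_value=0)
print(counts)

# Rank countries by total medals
totals = counts.sum(axis=1).sort_values(ascending=False)
print(totals.index.tolist())  # ['USA', 'FRA', 'GRE']
```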
