Pandas Fundamentals and Advanced

Axis in Dataframes

Axis 0 runs along the rows, top to bottom (one feature across all samples).
Axis 1 runs along the columns, left to right (one sample across all features).
Remember this with a 2x3 matrix: axis 0 has length 2 and axis 1 has length 3.
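
As a quick check of the axis convention, here is a minimal sketch on a hypothetical 2x3 frame (the column names `a`, `b`, `c` are made up for illustration). Reducing along axis 0 collapses the rows and leaves one value per column; reducing along axis 1 collapses the columns and leaves one value per row.

```python
import pandas as pd

# Hypothetical 2x3 frame: 2 samples (rows), 3 features (columns).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "c"])

# shape is (length of axis 0, length of axis 1)
n_rows, n_cols = df.shape

# axis=0: walk down the rows, giving one result per column (feature)
col_sums = df.sum(axis=0)

# axis=1: walk across the columns, giving one result per row (sample)
row_sums = df.sum(axis=1)

print(n_rows, n_cols)
print(col_sums)
print(row_sums)
```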

Groupby

Performs aggregation on one or more columns.
If more than one column is used as the grouping key, the result is indexed by a MultiIndex.

 df.groupby( columns ).colname.aggfunc()
 
 columns : column name or list of column names to group by
 colname : column which is to be aggregated
 aggfunc : method to perform aggregation ( first, last, max, min, sum, mean, median etc)
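
A small sketch of both cases, using a made-up sales table (the column names `city`, `year` and `sales` are assumptions for illustration only). Grouping by one column gives a plain index; grouping by two gives a MultiIndex.

```python
import pandas as pd

# Hypothetical data, not from any real dataset.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2020],
    "sales": [10, 20, 30, 40],
})

# Group by a single column, then aggregate the "sales" column.
by_city = df.groupby("city")["sales"].sum()

# Group by two columns: the result is indexed by a (city, year) MultiIndex.
by_city_year = df.groupby(["city", "year"])["sales"].mean()

print(by_city)
print(by_city_year)
```

The bracket form `["sales"]` is used here instead of `.sales`; both select the column, but brackets also work for names that are not valid Python identifiers.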

apply

Applies a function along an axis, either axis 0 or axis 1.
The function receives a Series (one column for axis 0, one row for axis 1) and returns a Series or a single value.

 df.apply( lambda r: r/r.sum()*100, axis=1 ) # each value as a percentage of its row (sample) total
 df.apply( lambda c: c/c.sum()*100, axis=0 ) # each value as a percentage of its column (feature) total
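
To see what the two calls produce, here is a minimal sketch on a hypothetical two-sample frame (the labels `s1`, `s2`, `math`, `art` are invented for illustration). With `axis=1` every row sums to 100; with `axis=0` every column sums to 100.

```python
import pandas as pd

# Hypothetical scores: rows are samples, columns are features.
df = pd.DataFrame({"math": [30, 10], "art": [70, 30]}, index=["s1", "s2"])

# axis=1: the lambda receives one row at a time.
row_pct = df.apply(lambda r: r / r.sum() * 100, axis=1)

# axis=0: the lambda receives one column at a time.
col_pct = df.apply(lambda c: c / c.sum() * 100, axis=0)

print(row_pct)
print(col_pct)
```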

crosstab

Compute a simple cross-tabulation of two (or more) factors.
By default computes a frequency table of the factors unless an array of values and an aggregation function are passed

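 pd.crosstab( index, columns, values=None, aggfunc=None )
 
 index : series/list of series to group by in the rows
 columns : series/list of series to group by in the columns
 values : optional, array-like of values to aggregate; pass it by keyword
 aggfunc : optional, if specified then values is also required; pass it by keyword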
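
A short sketch of both modes, on a made-up table (the columns `gender`, `smoker` and `bill` are hypothetical). Without `values` the result is a frequency table; with `values` and `aggfunc` each cell holds the aggregate for that combination.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "smoker": ["yes", "no", "no", "no", "yes"],
    "bill": [10.0, 20.0, 30.0, 40.0, 60.0],
})

# Default: count how often each (gender, smoker) pair occurs.
counts = pd.crosstab(df["gender"], df["smoker"])

# With values and aggfunc: mean bill per (gender, smoker) pair.
# Pairs that never occur come out as NaN.
mean_bill = pd.crosstab(df["gender"], df["smoker"],
                        values=df["bill"], aggfunc="mean")

print(counts)
print(mean_bill)
```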

pivot_table

Create a spreadsheet-style pivot table as a DataFrame.
The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame

pd.pivot_table( data, index=None, columns=None, values=None, aggfunc="mean" )

data : dataframe
index : optional, column name or list of column names to use as the row index
values : optional, column(s) to aggregate; if not specified, the remaining numeric columns are aggregated
columns : optional, column name or list of column names to use as the column headers
aggfunc : aggregate function or list of aggregate functions, defaults to mean
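
A minimal sketch, assuming a hypothetical table with `region`, `product` and `sales` columns. Keyword arguments are used throughout so the call does not depend on parameter order.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["x", "y", "x", "x"],
    "sales": [1, 2, 3, 5],
})

# Rows come from "region", column headers from "product",
# and each cell is the sum of "sales" for that pair.
table = pd.pivot_table(df, index="region", columns="product",
                       values="sales", aggfunc="sum")

print(table)
```

Combinations that do not appear in the data (here region `S` with product `y`) are left as NaN unless `fill_value` is passed.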

melt

“Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.

pd.melt(data,id_vars=["col_name"],value_vars=["col_name_2","col_name_3","col_name_4"],var_name="Category",value_name="Score")

data : dataframe
id_vars : optional, columns to keep as identifier variables in the final table
value_vars : optional, columns to melt (ideally of the same type); defaults to all columns not listed in id_vars
var_name : optional, defaults to 'frame.columns.name' or 'variable', name of the new column holding the melted column names
value_name : optional, defaults to "value", name of the value column
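
Here is a small sketch of the wide-to-long reshape, using an invented frame (the column names `name`, `math` and `art` are placeholders). Each melted column contributes one row per original row.

```python
import pandas as pd

# Hypothetical wide table: one row per person, one column per subject.
wide = pd.DataFrame({
    "name": ["ann", "bob"],
    "math": [90, 80],
    "art": [70, 60],
})

# Keep "name" as the identifier; turn the subject columns into rows.
long = pd.melt(wide, id_vars=["name"], value_vars=["math", "art"],
               var_name="Category", value_name="Score")

print(long)
```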

stack (in multicolumn table)

The stack method moves the last (innermost) column level into the index.

     df.stack(level=-1,dropna=True)
     
     level : optional, defaults to -1 (last level); column level(s) to move into the index,
             given as a level number, a level name,
             or a list of these
     dropna : optional, defaults to True, drop rows that contain only missing values
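
A minimal sketch on a hypothetical frame with two column levels (the level names `year` and `quarter` are made up). Calling `stack()` with the default level moves `quarter` out of the columns and into the index. Recent pandas 2.x releases may print a FutureWarning about the stack implementation; the result for this data is the same either way.

```python
import pandas as pd

# Hypothetical two-level columns: (year, quarter).
cols = pd.MultiIndex.from_product(
    [["2020", "2021"], ["q1", "q2"]], names=["year", "quarter"]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=["a", "b"], columns=cols)

# Default level=-1: the innermost column level ("quarter") becomes
# the innermost index level; "year" stays in the columns.
stacked = df.stack()

print(stacked)
```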

unstack (in multiindex table)

The unstack method moves the last (innermost) index level into the columns.

     df.unstack(level=-1,fill_value=None)
     
     level : optional, defaults to -1 (last level); index level(s) to move into the columns,
             given as a level number, a level name,
             or a list of these
     fill_value : optional, defaults to None, value used to replace the NaN entries created by missing combinations
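
A short sketch on a hypothetical frame with a two-level index (the names `row`, `quarter` and `sales` are invented). The combination `("b", "q2")` is deliberately missing, so it shows up as NaN unless a fill value is supplied.

```python
import pandas as pd

# Hypothetical two-level index: (row, quarter); ("b", "q2") is absent.
idx = pd.MultiIndex.from_tuples(
    [("a", "q1"), ("a", "q2"), ("b", "q1")], names=["row", "quarter"]
)
df = pd.DataFrame({"sales": [1, 2, 3]}, index=idx)

# Default level=-1: "quarter" moves from the index into the columns.
wide = df.unstack()

# Same reshape, but missing combinations are filled with 0.
filled = df.unstack(fill_value=0)

print(wide)
print(filled)
```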
