- Axis 0 runs down the rows (one feature across all samples) ↓
- Axis 1 runs across the columns (one sample across all features) →
- Remember this with a 2x3 matrix: axis 0 has length 2 and axis 1 has length 3
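A minimal sketch of the axis convention, assuming a hypothetical 2x3 frame whose column names `f1`, `f2`, `f3` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: 2 samples (rows) and 3 features (columns).
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["f1", "f2", "f3"])

# shape is (length of axis 0, length of axis 1)
n_rows, n_cols = df.shape

# axis=0 collapses the rows, leaving one value per feature (column)
col_sums = df.sum(axis=0)

# axis=1 collapses the columns, leaving one value per sample (row)
row_sums = df.sum(axis=1)
```

Reducing along axis 0 leaves a result indexed by column name; reducing along axis 1 leaves a result indexed by row label.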
- Perform aggregation on one or more columns
- If more than one column is used for grouping, the result is indexed by a MultiIndex
- df.groupby(columns).colname.aggfunc()
  - columns: list of columns to group by
  - colname: column to be aggregated
  - aggfunc: aggregation method (first, last, max, min, sum, mean, median, etc.)
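The groupby pattern can be sketched as follows. The frame and its column names (`city`, `year`, `sales`) are hypothetical, and the bracket selection `["sales"]` is used in place of attribute access, which is equivalent for valid column names:

```python
import pandas as pd

# Hypothetical sales records, made up for illustration.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2020],
    "sales": [10, 20, 30, 40],
})

# Group by a single column: the result is indexed by city.
by_city = df.groupby("city")["sales"].sum()

# Group by two columns: the result is indexed by a (city, year) MultiIndex.
by_city_year = df.groupby(["city", "year"])["sales"].mean()
```

Individual groups in the second result are addressed with a tuple, for example `by_city_year.loc[("B", 2020)]`.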
- Apply a function along an axis, either axis 0 or axis 1
- The function takes a Series and returns either a Series or a single value
- df.apply(lambda r: r / r.sum() * 100, axis=1)  # percentage distribution within each sample
- df.apply(lambda c: c / c.sum() * 100, axis=0)  # percentage distribution within each feature across all samples
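A small sketch of both calls on a hypothetical 2x2 frame (the labels `s1`, `s2`, `f1`, `f2` are made up), plus one function that returns a single value per column:

```python
import pandas as pd

# Hypothetical data: rows are samples, columns are features.
df = pd.DataFrame({"f1": [1, 3], "f2": [3, 1]}, index=["s1", "s2"])

# axis=1: the lambda receives one row at a time, so each row sums to 100.
row_pct = df.apply(lambda r: r / r.sum() * 100, axis=1)

# axis=0: the lambda receives one column at a time, so each column sums to 100.
col_pct = df.apply(lambda c: c / c.sum() * 100, axis=0)

# Returning a scalar instead of a Series yields one value per column.
col_range = df.apply(lambda c: c.max() - c.min(), axis=0)
```

When the function returns a Series, the result is a DataFrame; when it returns a scalar, the result is a Series.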
- Compute a simple cross-tabulation of two (or more) factors.
- By default computes a frequency table of the factors unless an array of values and an aggregation function are passed
- pd.crosstab(index, columns, values=values, aggfunc=aggfunc)
  - index: Series or list of Series (row groups)
  - columns: Series or list of Series (column groups)
  - values: optional, array-like of values to aggregate
  - aggfunc: optional; if specified, values is also required
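As an illustration, the sketch below builds a frequency table and then an aggregated table. The frame and its column names (`gender`, `smoker`, `bill`) are hypothetical:

```python
import pandas as pd

# Hypothetical survey data, made up for illustration.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "smoker": ["yes", "no", "no", "no", "yes"],
    "bill": [10.0, 20.0, 40.0, 60.0, 30.0],
})

# Default behaviour: count how often each (gender, smoker) pair occurs.
counts = pd.crosstab(df["gender"], df["smoker"])

# With values and aggfunc: aggregate bill within each (gender, smoker) cell.
mean_bill = pd.crosstab(df["gender"], df["smoker"],
                        values=df["bill"], aggfunc="mean")
```

Combinations that never occur show up as 0 in the frequency table and as NaN in the aggregated table.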
- Create a spreadsheet-style pivot table as a DataFrame.
- The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
- pd.pivot_table(data, values, index, columns, aggfunc)
  - data: DataFrame
  - values: optional, column(s) to aggregate; if not specified, the remaining numeric columns are used
  - index: optional, column(s) to use as the row index
  - columns: column name or list of column names
  - aggfunc: aggregation function or list of aggregation functions
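A minimal pivot table sketch, using keyword arguments so the order of parameters does not matter. The frame and its column names (`region`, `product`, `units`) are hypothetical:

```python
import pandas as pd

# Hypothetical order data, made up for illustration.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["x", "y", "x", "x"],
    "units": [1, 2, 3, 5],
})

# One aggregation function: rows are regions, columns are products.
table = pd.pivot_table(df, index="region", columns="product",
                       values="units", aggfunc="sum")

# A list of functions adds a level, so the columns become a MultiIndex.
multi = pd.pivot_table(df, index="region", columns="product",
                       values="units", aggfunc=["sum", "mean"])
```

In the second table a cell is addressed by a tuple on the column side, for example `multi.loc["S", ("mean", "x")]`.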
- “Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.
- pd.melt(data, id_vars=["col_name"], value_vars=["col_name_2", "col_name_3", "col_name_4"], var_name="Category", value_name="Score")
  - data: DataFrame
  - id_vars: optional, identifier columns to keep as-is in the final table
  - value_vars: columns to melt (should be of the same type)
  - var_name: optional, name of the new variable column; defaults to frame.columns.name or 'variable'
  - value_name: optional, name of the value column; defaults to 'value'
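The wide-to-long conversion can be sketched on a hypothetical score table (the column names `name`, `math`, `physics` are made up), reusing the `Category` and `Score` names from the call above:

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject.
wide = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "math": [90, 70],
    "physics": [80, 60],
})

# Long table: one row per (student, subject) pair.
long = pd.melt(wide, id_vars=["name"], value_vars=["math", "physics"],
               var_name="Category", value_name="Score")
```

Each melted column contributes one row per original row, so two subjects and two students give four rows.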
- The stack method moves the innermost column level into the index
- df.stack(level=-1, dropna=True)
  - level: optional, level(s) to move from the columns to the index, given as a position, a name, or a list of either; defaults to -1 (the last level)
  - dropna: optional, defaults to True; drops rows in the result that contain only missing values
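A short sketch of stacking a hypothetical score table (labels made up for illustration). It relies on the default arguments only, since the handling of `dropna` may differ between pandas versions:

```python
import pandas as pd

# Hypothetical wide table indexed by student name.
df = pd.DataFrame({"math": [90, 70], "physics": [80, 60]},
                  index=["Ann", "Bob"])

# The column labels become the inner level of the row index,
# so the result is a Series indexed by (student, subject).
stacked = df.stack()
```

A single value is then looked up with a tuple, for example `stacked.loc[("Bob", "physics")]`.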
- The unstack method moves the innermost index level into the column names
- df.unstack(level=-1, fill_value=None)
  - level: optional, index level(s) to move to the columns, given as a position, a name, or a list of either; defaults to -1 (the last level)
  - fill_value: optional, defaults to None; value used to replace NaN produced by missing combinations
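To illustrate unstacking, here is a minimal sketch on a hypothetical Series with a two-level index (the level names `name` and `subject` are made up), where one combination is deliberately missing:

```python
import pandas as pd

# Hypothetical long data: Bob has no physics score.
idx = pd.MultiIndex.from_tuples(
    [("Ann", "math"), ("Ann", "physics"), ("Bob", "math")],
    names=["name", "subject"],
)
s = pd.Series([90, 80, 70], index=idx)

# The last index level (subject) becomes the columns; the gap is NaN.
wide = s.unstack()

# fill_value replaces the gap instead of leaving NaN.
filled = s.unstack(fill_value=0)
```

Without `fill_value` the missing combination forces the result to hold NaN, which typically turns integer data into floats.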