@adityajn105
Last active December 4, 2018 05:47
Pandas Fundamentals and Advanced

Axis in Dataframes

  1. Axis 0 runs down the rows ( a particular feature across all samples ), i.e. vertically
  2. Axis 1 runs across the columns ( a particular sample across all features ), i.e. horizontally --->
  3. Remember this with a 2x3 matrix: axis 0 has length 2 and axis 1 has length 3
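
A minimal sketch of the axis convention, using a made-up 2x3 frame (the column names `a`, `b`, `c` are only illustrative). Reducing along axis 0 collapses the rows and leaves one value per column; reducing along axis 1 collapses the columns and leaves one value per row.

```python
import pandas as pd

# Hypothetical 2x3 frame: 2 samples (rows), 3 features (columns).
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

shape = df.shape            # (length of axis 0, length of axis 1)
col_sums = df.sum(axis=0)   # walks down the rows: one sum per column
row_sums = df.sum(axis=1)   # walks across the columns: one sum per row

print(shape)
print(col_sums)
print(row_sums)
```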

Groupby

  1. Performs aggregation over groups defined by one or more columns
  2. If more than one column is used for grouping, the result is indexed by a MultiIndex
  3.  df.groupby( columns ).colname.aggfunc()
     
     columns : list of columns
     colname : column which is to be aggregated
     aggfunc : method to perform aggregation ( first, last, max, min, sum, mean, median etc)
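
A small sketch of the pattern above, on invented data (the `city`, `year` and `sales` columns are assumptions for illustration). Grouping by one column gives a flat index; grouping by two gives a MultiIndex.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2017, 2018, 2017, 2017],
    "sales": [10, 20, 30, 40],
})

# One grouping column: result is indexed by city.
by_city = df.groupby("city")["sales"].sum()

# Two grouping columns: result is indexed by a (city, year) MultiIndex.
by_city_year = df.groupby(["city", "year"])["sales"].mean()

print(by_city)
print(by_city_year)
```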
    

apply

  1. Applies a function along an axis, either axis 0 or axis 1
  2. The function receives a Series (one column for axis=0, one row for axis=1) and returns either a Series or a single value
  3.  df.apply( lambda r: r/r.sum()*100, axis=1 ) #takes percentage distribution within a sample
     df.apply( lambda c: c/c.sum()*100, axis=0 ) #takes percentage distribution within a feature for all samples
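
The two calls above can be tried on a tiny frame; the data and the names `math`, `art`, `s1`, `s2` are made up for this sketch. The last call shows the other case, where the function returns a single value per column.

```python
import pandas as pd

# Hypothetical scores: rows are samples, columns are features.
df = pd.DataFrame({"math": [30, 10], "art": [70, 30]}, index=["s1", "s2"])

# axis=1: the lambda receives each row, so every row sums to 100.
row_pct = df.apply(lambda r: r / r.sum() * 100, axis=1)

# axis=0: the lambda receives each column, so every column sums to 100.
col_pct = df.apply(lambda c: c / c.sum() * 100, axis=0)

# Returning a scalar collapses each column to one value (here, its range).
col_range = df.apply(lambda c: c.max() - c.min(), axis=0)

print(row_pct)
print(col_pct)
print(col_range)
```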
    

crosstab

  1. Compute a simple cross-tabulation of two (or more) factors.
  2. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed
  3.  pd.crosstab( index, columns, values, aggfunc )
     
     index = series/list of series
     columns = series/list of series 
     values: optional, array-like
     aggfunc: optional, if specified then values also required
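
As a rough illustration, the sketch below builds a frequency table and then an aggregated table from the same factors. The `gender`, `dept` and `salary` columns are invented for the example.

```python
import pandas as pd

# Hypothetical employee records.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "dept": ["hr", "hr", "it", "it", "it"],
    "salary": [50, 60, 70, 80, 90],
})

# Default behaviour: count how often each (gender, dept) pair occurs.
counts = pd.crosstab(df["gender"], df["dept"])

# With values and aggfunc: aggregate salary within each pair instead.
mean_salary = pd.crosstab(
    df["gender"], df["dept"], values=df["salary"], aggfunc="mean"
)

print(counts)
print(mean_salary)
```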
    

pivot_table

  1. Create a spreadsheet-style pivot table as a DataFrame.
  2. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame
  3. pd.pivot_table( data, values, index, columns, aggfunc )
    
    data: dataframe
    values: optional, column(s) to aggregate, if not specified all remaining numeric columns are used
    index : optional, column name or list of column names to use as the row index
    columns: optional, column name or list of column names whose values become the result's columns
    aggfunc: aggregate func or list of aggregate functions, defaults to mean
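
A short sketch with invented data (`city`, `year`, `sales` are assumed names). Passing the arguments by keyword avoids depending on their positional order. With a list of aggregation functions, the result columns become a MultiIndex.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2017, 2018, 2017, 2018],
    "sales": [10, 20, 30, 40],
})

# One row per city, one column per year, cells hold summed sales.
table = pd.pivot_table(
    df, index="city", columns="year", values="sales", aggfunc="sum"
)

# A list of functions adds a column level holding the function name.
multi = pd.pivot_table(
    df, index="city", columns="year", values="sales", aggfunc=["sum", "mean"]
)

print(table)
print(multi)
```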
    

melt

  1. “Unpivots” a DataFrame from wide format to long format, optionally leaving identifier variables set.
  2. pd.melt(data,id_vars=["col_name"],value_vars=["col_name_2","col_name_3","col_name_4"],var_name="Category",value_name="Score")
    
    data : dataframe
    id_vars : optional, columns to keep as identifier variables (they are not melted)
    value_vars : optional, columns to melt (should be of same type), defaults to all columns not in id_vars
    var_name : optional, defaults to 'frame.columns.name' or 'variable', name of melted new column
    value_name : optional, defaults to "value", name of value column
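
A minimal sketch of the call above on a made-up wide table (the `name`, `math` and `art` columns are assumptions). The second call omits `var_name` and `value_name` to show the default column names.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject.
wide = pd.DataFrame({
    "name": ["ann", "bob"],
    "math": [90, 80],
    "art": [70, 60],
})

# Long format: one row per (student, subject) pair.
long = pd.melt(
    wide,
    id_vars=["name"],
    value_vars=["math", "art"],
    var_name="Category",
    value_name="Score",
)

# Without var_name/value_name the new columns get the default names.
long_default = pd.melt(wide, id_vars=["name"])

print(long)
print(long_default.columns.tolist())
```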
    

stack (in multicolumn table)

  1. stack method turns the last (innermost) column level into the innermost index level
  2.      df.stack(level=-1,dropna=True)
         
         level : optional, defaults to -1=last level, level(s) to stack from column to index
                 or column level name
                 or list of column level names
         dropna : optional, defaults to True, drop rows of the result that contain only missing values
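
A small sketch on a frame with two column levels; the level names `subject` and `exam` and the data are invented. It calls `stack()` with default arguments only, since how `dropna` is handled differs between pandas versions.

```python
import pandas as pd

# Hypothetical two-level columns: (subject, exam).
columns = pd.MultiIndex.from_product(
    [["math", "art"], ["mid", "final"]], names=["subject", "exam"]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=["s1", "s2"], columns=columns)

# The innermost column level ("exam") moves into the row index.
stacked = df.stack()

print(stacked)
```

The result has one row per (sample, exam) pair and one column per subject.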
    

unstack (in multiindex table)

  1. unstack method turns the last (innermost) index level into column labels.
  2.      df.unstack(level=-1,fill_value=None)
         
         level : optional, defaults to -1=last index level, index level(s) to unstack into columns
                 or index level name
                 or list of index level names
         fill_value : optional, defaults to None, value used to replace the NaN produced by missing combinations
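
A brief sketch of unstacking and of filling the resulting gaps. It uses a Series for brevity, since `Series.unstack` accepts the same `level` and `fill_value` arguments as the DataFrame method; the index level names `city` and `year` and the data are invented.

```python
import pandas as pd

# Hypothetical series indexed by (city, year); (B, 2018) is missing on purpose.
index = pd.MultiIndex.from_tuples(
    [("A", 2017), ("A", 2018), ("B", 2017)], names=["city", "year"]
)
s = pd.Series([10, 20, 30], index=index, name="sales")

wide = s.unstack()                  # innermost level (year) becomes columns
filled = s.unstack(fill_value=0)    # missing combinations become 0
by_city = s.unstack(level="city")   # unstack a named level instead

print(wide)
print(filled)
print(by_city)
```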
    