

Aayush Shah (AayushSameerShah)

🤨
Hmm...
View GitHub Profile
@AayushSameerShah
AayushSameerShah / replace_multiple_in_string.py
Created June 4, 2021 10:20
Use re.sub to avoid chaining "string".replace().replace().replace()
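import re
pattern = r"foo|bar"  # hypothetical example: every substring to remove, joined with "|"
def remove(str_):
    # One regex pass replaces every match of `pattern` with ''
    return re.sub(pattern, '', str_)
df.column.apply(remove)  # `df` and `column` stand for your own DataFrame and column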
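A self-contained sketch of the same idea, assuming the goal is to strip or swap several literal substrings. `strip_all` and `replace_many` are hypothetical helper names, not part of the original gist. `re.escape` keeps characters such as `$` or `.` from being read as regex syntax.

```python
import re


def strip_all(text, unwanted):
    """Remove every substring in `unwanted` with one re.sub call instead of chained .replace()."""
    # Escape each literal, then join with "|" so the regex matches any of them
    pattern = "|".join(re.escape(s) for s in unwanted)
    return re.sub(pattern, "", text)


def replace_many(text, mapping):
    """Replace each key of `mapping` with its value in a single pass."""
    # Longest keys first so that, e.g., "ab" wins over "a" at the same position
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The callable looks up the replacement for whichever key matched
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Same result as "$1,234.50".replace("$", "").replace(",", "")
print(strip_all("$1,234.50", ["$", ","]))  # 1234.50
# A single pass can swap words, which chained .replace() calls cannot
print(replace_many("cat chases dog", {"cat": "dog", "dog": "cat"}))  # dog chases cat
```

On a pandas column, `df["column"].str.replace(pattern, "", regex=True)` applies the same pattern without a Python-level `apply`.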
AayushSameerShah / diverging_bar.py
Created June 4, 2021 07:10
A diverging bar plot. There is no built-in for it, but it is SO SIMPLE to build yourself with vlines/hlines
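import matplotlib.pyplot as plt
# `df` is assumed to be a numeric Series: each bar runs from the mean to the value
# For vertical bars
plt.vlines(x=df.index, ymin=df.mean(), ymax=df.values, alpha=0.4, linewidth=5)
# For horizontal bars
plt.hlines(y=df.index, xmin=df.mean(), xmax=df.values, alpha=0.4, linewidth=5)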
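A fuller sketch of a horizontal diverging bar chart, assuming `series` is a numeric pandas Series. Colouring bars by which side of the mean they fall on is an addition, not something the gist does, and `diverging_hbar` is a hypothetical helper name. Bars are drawn at integer positions and labelled afterwards, which keeps string indexes simple.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def diverging_hbar(series, ax=None):
    """Draw one horizontal bar per value, starting at the series mean and ending at the value."""
    if ax is None:
        _, ax = plt.subplots()
    centre = series.mean()
    positions = np.arange(len(series))
    # Red below the mean, green at or above it (an added touch, not in the gist)
    colors = ["tab:red" if v < centre else "tab:green" for v in series.to_numpy()]
    lines = ax.hlines(y=positions, xmin=centre, xmax=series.to_numpy(),
                      colors=colors, alpha=0.4, linewidth=5)
    # Label the integer bar positions with the original index labels
    ax.set_yticks(positions)
    ax.set_yticklabels(series.index)
    ax.axvline(centre, color="grey", linewidth=1)  # reference line at the mean
    return lines


s = pd.Series([3.0, -1.0, 4.0, 2.0], index=["a", "b", "c", "d"])
lines = diverging_hbar(s)
```

Each segment returned by `lines.get_segments()` runs from `(mean, position)` to `(value, position)`, which is exactly the "diverge from the mean" shape.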
AayushSameerShah / plt_savefig_cut.py
Created June 2, 2021 09:55
Stop savefig from cutting off labels and titles
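import matplotlib.pyplot as plt
plt.savefig('myfile.png', bbox_inches="tight")  # shrink/grow the saved area to fit every artist, so nothing is clipped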
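To see what `bbox_inches="tight"` changes, the sketch below saves the same figure twice into memory and reads the pixel size from the PNG header. The oversized y-label is there on purpose to push content past the default layout; `png_size` is a hypothetical helper.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def png_size(fig, **savefig_kwargs):
    """Save `fig` to an in-memory PNG and return its (width, height) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", **savefig_kwargs)
    data = buf.getvalue()
    # PNG layout: 8-byte signature, then the IHDR chunk, whose data starts
    # with width and height as big-endian unsigned 32-bit integers
    return struct.unpack(">II", data[16:24])


fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
ax.plot([0, 1], [0, 1])
ax.set_ylabel("a deliberately long y-axis label", fontsize=20)

default_size = png_size(fig)                     # full canvas: 4 in x 3 in at 100 dpi
tight_size = png_size(fig, bbox_inches="tight")  # canvas fitted to the drawn artists
print(default_size, tight_size)
```

`fig.tight_layout()` is a related but different fix: it rearranges the axes to fit inside the existing canvas instead of changing the size of the saved image.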
AayushSameerShah / change_levels_hierarchy_index.py
Created May 31, 2021 12:32
Shows how to control the order of index levels when you stack or unstack a DataFrame
# Examples can be found in Pandas Book (Notebooks) 3 Section > 2. Combining, Merging
# Very manual (a bit of a hack)
df.stack(level=[0, 1]).unstack(level=[1, 2, 3])
# ↑ List the desired levels in the desired order (both stack and unstack are needed)
# Proper way (more efficient)
df.unstack().swaplevel(2, 0, axis=1).sort_index(level=0, axis=1)
# ↑ swaplevel does the reordering. The sort is optional, but it groups the columns into a cleaner frame
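A toy example of the swaplevel route, assuming a frame with a two-level row index and two-level columns (the data and level names are made up). After `unstack()`, the former inner row level becomes the innermost column level (position 2), so `swaplevel(2, 0, axis=1)` moves it to the front.

```python
import numpy as np
import pandas as pd

# Made-up data: 2-level rows (year, quarter) and 2-level columns (metric, region)
rows = pd.MultiIndex.from_product([["2020", "2021"], ["Q1", "Q2"]], names=["year", "quarter"])
cols = pd.MultiIndex.from_product([["sales", "cost"], ["EU", "US"]], names=["metric", "region"])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

wide = df.unstack()  # "quarter" becomes the innermost (position 2) column level
print(list(wide.columns.names))  # ['metric', 'region', 'quarter']

# Swap positions 2 and 0, then sort so equal "quarter" labels sit together
reordered = wide.swaplevel(2, 0, axis=1).sort_index(level=0, axis=1)
print(list(reordered.columns.names))  # ['quarter', 'region', 'metric']
```

Without the `sort_index` call the columns keep their unstacked order, so Q1 and Q2 columns stay interleaved; the data is the same either way.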
'''While cleaning data, you may want to replace one
DataFrame's values at certain indices with another
DataFrame's values (usually cleaned ones), matched
on the index. This MIGHT help.
2 solutions.'''
# 1. Great: works almost every time
df.col_to_change.update(df.col_new_value)
# Works because both are Series: update aligns on the index and skips NaN in the new values
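A runnable sketch of the same idea using `DataFrame.update` on the whole frame, with made-up data. It is swapped in here because, under pandas Copy-on-Write (the default from pandas 3.0), updating through the chained `df.col_to_change` may not write back to `df`; calling `update` on the frame itself avoids that. Like `Series.update`, it aligns on the index and ignores NaN in the new values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: -1.0 marks values that were later cleaned
df = pd.DataFrame({"price": [10.0, -1.0, 30.0, -1.0], "qty": [1, 2, 3, 4]})
cleaned = pd.DataFrame({"price": [20.0, np.nan]}, index=[1, 3])

# In place; aligns on index and column labels. NaN in `cleaned` leaves row 3 untouched,
# and "qty" is left alone because `cleaned` has no such column
df.update(cleaned)
print(df["price"].tolist())  # [10.0, 20.0, 30.0, -1.0]
```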
AayushSameerShah / Get_values_multiindex.py
Created May 28, 2021 15:11
This is gonna be an AMAZING time saver. To get all labels from one level of a MultiIndex, use this (don't write a for loop as you did!)
# AMAZING!
df.columns.get_level_values(0)
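A small made-up example: `get_level_values` accepts either a level position or a level name, and returns an Index with one label per column, so duplicates are kept until you call `.unique()`.

```python
import pandas as pd

# Made-up frame with two-level columns
cols = pd.MultiIndex.from_tuples(
    [("sales", "EU"), ("sales", "US"), ("cost", "EU")], names=["metric", "region"]
)
df = pd.DataFrame([[1, 2, 3]], columns=cols)

outer = df.columns.get_level_values(0)         # by position
inner = df.columns.get_level_values("region")  # by level name also works
loop_version = [col[0] for col in df.columns]  # what the for loop would build

print(outer.tolist())           # ['sales', 'sales', 'cost']
print(inner.unique().tolist())  # ['EU', 'US']
```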
AayushSameerShah / categorize_multiple2.py
Created May 27, 2021 11:19
The pandas book approach to overlapping (multi-label) categories
'''Example data
Name Genre
0 TENET Action|Thriller
1 MEMENTO Crime|Thriller|Action
2 AVENGERS Children's
'''
# SPOILER ALERT: this is the UNDERLYING method. Just use df.Genre.str.get_dummies("|") for the same result (more on this later)
# Step 1: Get the unique genres
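The remaining steps of the manual approach can be sketched as below, using the example data from the docstring. This illustrates the idea and is not necessarily the book's exact code; `genre_dummies` is a hypothetical helper. It also compares against `str.get_dummies("|")`, which produces the same indicators with its columns sorted alphabetically.

```python
import numpy as np
import pandas as pd


def genre_dummies(genre_col, sep="|"):
    """Build 0/1 indicator columns for multi-label strings such as "Action|Thriller"."""
    # Step 1: collect the unique genres, keeping first-appearance order
    all_genres = []
    for value in genre_col:
        all_genres.extend(value.split(sep))
    genres = list(dict.fromkeys(all_genres))
    # Step 2: start from an all-zero matrix with one column per genre
    dummies = pd.DataFrame(np.zeros((len(genre_col), len(genres)), dtype=int),
                           index=genre_col.index, columns=genres)
    # Step 3: for each row, set the columns of its genres to 1
    for i, value in enumerate(genre_col):
        positions = dummies.columns.get_indexer(value.split(sep))
        dummies.iloc[i, positions] = 1
    return dummies


movies = pd.DataFrame({
    "Name": ["TENET", "MEMENTO", "AVENGERS"],
    "Genre": ["Action|Thriller", "Crime|Thriller|Action", "Children's"],
})
manual = genre_dummies(movies.Genre)
builtin = movies.Genre.str.get_dummies("|")  # same indicators, columns sorted alphabetically
```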
AayushSameerShah / style_on_condition.py
Last active May 31, 2021 11:58
The HACK to highlight cells in pandas based on a condition
import numpy as np
# Concise way
df[(np.abs(df) > 2).any(axis=1)].style.map(lambda x: "background: yellow" if np.abs(x) > 2 else "")
# ↑ Styler.map (named applymap before pandas 2.1) styles one cell at a time,
# which is equivalent to the code below
# More verbose, but gives more control
df[(np.abs(df) > 2).any(axis=1)].style.apply(lambda x: ["background: yellow" if np.abs(value) > 2 else "" for value in x])
# ↑ apply passes a whole column at a time, so the list comprehension iterates over its values
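A sketch that moves the condition into one function and hands the whole frame to `Styler.apply` with `axis=None`, so the threshold lives in a single place. `outlier_css` and `highlight_outliers` are hypothetical names, and `threshold=2.0` just mirrors the snippet. Using `df.style` needs the optional jinja2 package; the CSS-building function itself does not.

```python
import numpy as np
import pandas as pd


def outlier_css(frame, threshold=2.0):
    """Return a frame of CSS strings shaped like `frame`: yellow where |value| > threshold."""
    mask = np.abs(frame) > threshold
    return pd.DataFrame(np.where(mask, "background: yellow", ""),
                        index=frame.index, columns=frame.columns)


def highlight_outliers(df, threshold=2.0):
    """Keep rows with at least one outlier and style the outlier cells (needs jinja2)."""
    rows = df[(np.abs(df) > threshold).any(axis=1)]
    # axis=None passes the whole frame to outlier_css in a single call
    return rows.style.apply(outlier_css, axis=None, threshold=threshold)
```

In a notebook, `highlight_outliers(df)` as the last expression of a cell renders the styled table.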
AayushSameerShah / WigglyLinePLT.py
Created May 26, 2021 13:35
How to plot wiggly (hand-drawn, xkcd-style) lines in matplotlib
import numpy as np
import matplotlib.pyplot as plt
plt.xkcd()  # <-- This does the job: turns on xkcd (hand-drawn) styling for everything plotted afterwards
plt.figure()
plt.plot(np.linspace(0.7, 1.42, 100), [0.7] * 100)
plt.show()
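`plt.xkcd()` also works as a context manager, which limits the hand-drawn style to one block instead of leaving it on for the rest of the session. A sketch, assuming a non-interactive backend; `wiggly_line_png` is a hypothetical name. Missing comic fonts only produce font warnings, the wiggle itself comes from the `path.sketch` setting.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def wiggly_line_png():
    """Draw the flat line in xkcd style; return (png_bytes, sketch setting used)."""
    with plt.xkcd():  # xkcd rcParams are active only inside this block
        fig, ax = plt.subplots()
        ax.plot(np.linspace(0.7, 1.42, 100), [0.7] * 100)
        sketch = plt.rcParams["path.sketch"]  # (scale, length, randomness) that makes lines wiggle
        buf = io.BytesIO()
        fig.savefig(buf, format="png")  # render while the style is still active
    plt.close(fig)
    return buf.getvalue(), sketch


png, sketch = wiggly_line_png()
print(sketch)                       # set while inside the with-block
print(plt.rcParams["path.sketch"])  # restored to the default afterwards
```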
AayushSameerShah / Change_in_date.py
Created May 23, 2021 06:21
This helps tweak dates and times in existing data (here: shift each date forward by 2 years)
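# dateCol must hold datetimes; returns a new Series (assign it back), and Feb 29 raises ValueError if the new year is not a leap year
DF.dateCol.apply(lambda x: x.replace(year=x.year + 2))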
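A vectorised alternative, assuming the column already has a datetime64 dtype: adding `pd.DateOffset(years=2)` shifts every date at once and clips Feb 29 to Feb 28 in non-leap years, where `Timestamp.replace(year=...)` raises `ValueError`. The sample dates are made up.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2019-03-15", "2020-02-29"]))

# Vectorised: no Python-level apply; Feb 29 becomes Feb 28 when the target year is not a leap year
shifted = dates + pd.DateOffset(years=2)
print(shifted.dt.strftime("%Y-%m-%d").tolist())  # ['2021-03-15', '2022-02-28']

# The replace() approach fails on the same leap-day value
try:
    pd.Timestamp("2020-02-29").replace(year=2022)
    replace_error = None
except ValueError as err:
    replace_error = err
print(replace_error)
```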