@AayushSameerShah
Created May 27, 2021 11:19
The pandas book's method for dealing with overlapping (multi-valued) categories
'''Example data
Name Genre
0 TENET Action|Thriller
1 MEMENTO Crime|Thriller|Action
2 AVENGERS Children's
'''
import numpy as np
import pandas as pd
# SPOILER ALERT: this is the UNDERLYING, manual method. df.Genre.str.get_dummies("|") gives the same result in one line (more on this later).
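# Step 1: Collect the unique genres
gens = []
for gen in df.Genre:
    gens.extend(gen.split("|"))
# Wrap the list in a Series: newer pandas versions warn when pd.unique gets a plain list
gens = pd.unique(pd.Series(gens))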
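# Step 2: Build an all-zeros DataFrame with one row per movie and one column per genre
# (np.zeros takes the shape as a single tuple; columns= belongs to DataFrame)
zero_one = pd.DataFrame(np.zeros((len(df), len(gens))), columns=gens)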
# Step 3 (MAIN): use .get_indexer to find the column positions of each movie's genres
for i, gen in enumerate(df.Genre):
    indices = zero_one.columns.get_indexer(gen.split("|"))
    zero_one.iloc[i, indices] = 1  # mark those genres for row i
# DONE!
'''Now the MORE ON THIS LATER part:
    df.Genre.str.get_dummies("|")
gives the same 0/1 table in a single line (its columns come out sorted alphabetically).
Use that in practice, and work through the steps above if you want to learn the internals.
'''
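To check the claim that both routes agree, here is a minimal, self-contained sketch. It assumes a small frame built by hand from the example data at the top (the `df` construction and the `same` flag are illustrative additions, not part of the original snippet), and it uses an integer dtype for the zeros so the values are directly comparable.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the example data at the top of the gist.
df = pd.DataFrame({
    "Name": ["TENET", "MEMENTO", "AVENGERS"],
    "Genre": ["Action|Thriller", "Crime|Thriller|Action", "Children's"],
})

# Manual route: collect the unique genres, then fill a 0/1 table row by row.
gens = []
for gen in df.Genre:
    gens.extend(gen.split("|"))
gens = pd.unique(pd.Series(gens))

zero_one = pd.DataFrame(
    np.zeros((len(df), len(gens)), dtype="int64"), columns=gens
)
for i, gen in enumerate(df.Genre):
    indices = zero_one.columns.get_indexer(gen.split("|"))
    zero_one.iloc[i, indices] = 1

# One-line route: split on "|" and encode in a single call.
dummies = df.Genre.str.get_dummies("|")

# The manual table keeps genres in first-seen order, while get_dummies sorts
# them, so reorder the manual columns before comparing the values.
same = (zero_one[dummies.columns].to_numpy() == dummies.to_numpy()).all()
```

If `same` is true, the two tables differ only in column order, which is why the one-liner is the practical choice.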
@AayushSameerShah
You can see my own version for dealing with overlapping categorical values here:
https://gist.github.com/AayushSameerShah/58e09fd89833f467dc462ba0807bf733