@dpoulopoulos
Last active February 26, 2020 09:42
Load the Wikipedia Movie Plots dataset and the English first names dataset.
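import pandas as pd

# load the Wikipedia Movie Plots dataset
df = pd.read_csv('wiki_plots.csv')
# load the English first names dataset into a single 'names' column;
# header=0 treats the file's first line as a header row and replaces it
names_df = pd.read_csv('first_names.all.txt', names=['names'], header=0)
# keep only the relevant columns
df = df[['Title', 'Plot']]
# randomly sample 50% of the movies (no random_state, so the rows differ between runs)
df = df.sample(frac=.5)
# preview the first five rows
df.head()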
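The `names=['names'], header=0` combination used for the names file is easy to misread, so here is a minimal sketch of how it behaves. The in-memory `sample` text is a hypothetical stand-in for `first_names.all.txt` and assumes one name per line with no header row; the real file's layout is not shown in the gist, so check it before choosing between the two calls.

```python
import io

import pandas as pd

# Hypothetical stand-in for first_names.all.txt (assumed: one name per line, no header row).
sample = "aaron\nabby\nadam\n"

# As in the gist: header=0 marks the first line as a header row,
# and names=['names'] replaces that header with the column name 'names'.
with_header0 = pd.read_csv(io.StringIO(sample), names=['names'], header=0)

# Alternative: header=None tells pandas the file has no header,
# so every line is kept as data.
with_none = pd.read_csv(io.StringIO(sample), names=['names'], header=None)

print(len(with_header0), len(with_none))
```

If the names file does start with a header line, the gist's `header=0` is the right choice. If it does not, `header=None` avoids silently dropping the first name.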