@MLWhiz
Created February 9, 2019 08:05
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Always start with these features. They work (almost) every time!
tfv = TfidfVectorizer(dtype=np.float32, min_df=3, max_features=None,
                      strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}',
                      ngram_range=(1, 3), use_idf=True, smooth_idf=True, sublinear_tf=True,
                      stop_words='english')
# Fit TF-IDF on both the training and test text (a semi-supervised trick: only the
# raw text is used, never the labels), so the vocabulary and IDF weights cover both sets.
tfv.fit(list(train_df.cleaned_text.values) + list(test_df.cleaned_text.values))
xtrain_tfv = tfv.transform(train_df.cleaned_text.values)
# Note: despite the "valid" name, these are the features for test_df.
xvalid_tfv = tfv.transform(test_df.cleaned_text.values)
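
To make the `sublinear_tf` and `smooth_idf` flags above less opaque, here is a minimal pure-Python sketch of the weighting they select, as described in the scikit-learn documentation: term frequency becomes `1 + ln(tf)`, IDF becomes `ln((1 + n) / (1 + df)) + 1`, and each row is L2-normalised (the vectorizer's default `norm='l2'`). The names `tfidf_weights` and `toy_docs` are hypothetical and exist only for illustration. The sketch deliberately ignores `min_df`, n-grams, accent stripping and stop words, so its numbers will not match the vectorizer configured above.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Illustrative only: docs is a list of token lists.

    Returns one {term: weight} dict per document, L2-normalised.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # smooth_idf=True: idf = ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    rows = []
    for doc in docs:
        counts = Counter(doc)
        # sublinear_tf=True: replace the raw count tf with 1 + ln(tf).
        raw = {t: (1.0 + math.log(c)) * idf[t] for t, c in counts.items()}
        # L2-normalise the row; an empty document yields an empty dict.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        rows.append({t: w / norm for t, w in raw.items()} if norm else {})
    return rows


# Hypothetical toy corpus, already tokenised.
toy_docs = [["good", "movie", "good"], ["bad", "movie"]]
weights = tfidf_weights(toy_docs)
```

In the toy corpus, "movie" appears in every document, so its IDF is the minimum value of 1, while "good" and "bad" each appear in only one document and therefore receive larger weights within their rows.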