@MLWhiz
Created February 9, 2019 08:06
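# Always start with these features. They work (almost) every time!
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
# Assumes train_df and test_df are pandas DataFrames with a cleaned_text
# column; train_df also holds the labels in a target column.
hv = HashingVectorizer(dtype=np.float32,
                       strip_accents='unicode', analyzer='word',
                       ngram_range=(1, 4), n_features=2**12,
                       # non_negative=True was removed in scikit-learn 0.21;
                       # alternate_sign=False keeps the features non-negative.
                       alternate_sign=False)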
# HashingVectorizer is stateless: fit() learns no vocabulary, so calling it on
# train + test text does nothing beyond parameter checks and shares no information.
hv.fit(list(train_df.cleaned_text.values) + list(test_df.cleaned_text.values))
xtrain_hv = hv.transform(train_df.cleaned_text.values)
xvalid_hv = hv.transform(test_df.cleaned_text.values)
y_train = train_df.target.values
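Because HashingVectorizer maps each n-gram straight to a column index with a hash function, it never stores a vocabulary, so fitting it on train plus test text cannot act as semi-supervised learning. The sketch below is a quick check of that, assuming a recent scikit-learn. The toy texts and the `make_hv` helper are hypothetical stand-ins for `train_df.cleaned_text` / `test_df.cleaned_text` and the gist's constructor call, and `alternate_sign=False` is used in place of the removed `non_negative=True`.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical toy texts standing in for train_df.cleaned_text and test_df.cleaned_text.
train_texts = ["the movie was great", "the plot was dull and slow"]
test_texts = ["a great cast but a dull plot"]


def make_hv():
    """Hypothetical helper: a HashingVectorizer with the gist's settings,
    using alternate_sign=False instead of the removed non_negative=True."""
    return HashingVectorizer(dtype=np.float32, strip_accents='unicode',
                             analyzer='word', ngram_range=(1, 4),
                             n_features=2**12, alternate_sign=False)


# Vectorizer "fitted" on train + test text, as in the gist.
fitted = make_hv()
fitted.fit(train_texts + test_texts)

# Vectorizer that is never fitted at all.
unfitted = make_hv()

X_fitted = fitted.transform(test_texts)
X_unfitted = unfitted.transform(test_texts)

# No vocabulary is learned, so fit() has no effect on the transformed output.
identical = np.array_equal(X_fitted.toarray(), X_unfitted.toarray())
print("identical output:", identical)               # True
print("shape:", X_unfitted.shape)                   # (1, 4096)
print("all non-negative:", X_unfitted.min() >= 0)   # True with alternate_sign=False
```

Since the output never depends on the data seen during `fit`, `hv.transform` can be applied to each split independently, and new text at prediction time is hashed into the same 4096 columns.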