Presentation by John Burt, Ph.D @ PSU Business Accelerator on 25 Feb 2018 @ 1pm
- CV = cross-validation
- Done mainly using the sklearn library
- CountVectorizer: represents each document as a vector of word-occurrence counts
- Text data > Feature engineering: TfidfVectorizer > Classifier: SGDClassifier > Hyper-parameter tuning: GridSearchCV
- Start with the default parameters
- Specify the params for GridSearchCV to check. GridSearchCV takes an estimator, but you can also pass it other things such as a normalizer or other data pre-processor
- Run GridSearchCV: pass it training data and target data
- GridSearchCV then outputs the best score and best set of params
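The steps above can be sketched as follows; the dataset and the particular param values here are made up for illustration, not the talk's settings:

```python
# Hedged sketch of the GridSearchCV workflow: define a param grid,
# fit on training data, and read off the best score and params.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Params for GridSearchCV to check (keys must match SGDClassifier args)
param_grid = {
    "penalty": ["l2", "l1"],
    "alpha": [1e-4, 1e-3],
}

grid = GridSearchCV(SGDClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)                 # pass training data and target data

print(grid.best_score_)        # best cross-validated accuracy found
print(grid.best_params_)       # best set of params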
- Generate an accuracy heatmap with penalty on the X axis and number of iterations on the Y axis
- A Pipeline is another estimator that can be passed to GridSearchCV
- The pipeline consists of the two objects defined earlier: TfidfVectorizer and SGDClassifier
- Instead of passing the classifier object alone, the pipeline is passed into GridSearchCV
- The pipeline handles the vectorization step internally, so vectorizer params can be tuned as well
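A sketch of the pipeline approach; the toy corpus and param values are my own illustration:

```python
# Hedged sketch: tuning vectorizer and classifier together by passing
# a Pipeline (not the bare classifier) into GridSearchCV.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

docs = ["good movie", "great film", "bad movie", "awful film"] * 10
labels = [1, 1, 0, 0] * 10

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", SGDClassifier(random_state=0)),
])

# Param names are prefixed with the pipeline step name plus "__",
# so both vectorizer and classifier params land in one grid
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [1e-4, 1e-3],
}

grid = GridSearchCV(pipe, param_grid, cv=2)
grid.fit(docs, labels)         # raw text goes in; the pipeline vectorizes it
print(grid.best_params_)
```

Note the `step__param` naming convention: that is how GridSearchCV routes each grid entry to the right pipeline step.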
- No tuning w/ default params: 93.5% acc
- Classifier tuning: 94.3%
- Vectorizer + classifier tuning: 95.3%
- Can't just tune everything blindly; experiment with tuning methods and with which params to tune for better results
- Workflow: find models > test > settle on a model > tune parameters
- Get the example code working, then try different params: both TfidfVectorizer and SGDClassifier have lots of params!
- Implement GridSearchCV to optimize your classifier for last session's Wikipedia toxicity data