@jkuruzovich
Last active November 7, 2018 07:42
Text Mining and Unstructured Data

Class 12: Technology Fundamentals of Business Analytics

Text Mining and Unstructured Data

Class Objective:

The goal of this class is to introduce the basic concepts of text mining.

Readings (To be done before class):

In Class Activities:

Assignment (due the second Wednesday following class by 11:59 PM):

(From tf-idf example)

1. Manually create a new corpus and QUERY_TERMS (for example, sample tweets or longer documents). For your corpus, calculate the tf, idf, and tf-idf for three of the words. Explain the difference between the three measures.
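
The three quantities in item 1 can be sketched as follows. This is a minimal illustration, not the class's tf-idf example: the toy corpus and the function names are hypothetical, and the formulas use one common convention (tf as relative frequency within a document, idf as the natural log of corpus size over the number of documents containing the term). The class example may use a different variant, such as adding 1 to the idf.

```python
import math

# Hypothetical toy corpus for illustration only; build your own for the assignment.
corpus = {
    "a": "the cat sat on the mat",
    "b": "the dog chased the cat",
    "c": "dogs and cats are pets",
}


def tf(term, doc):
    """Term frequency: share of the document's words that equal `term`."""
    words = doc.lower().split()
    return words.count(term) / len(words)


def idf(term, corpus):
    """Inverse document frequency: log(N / number of documents containing `term`)."""
    n_containing = sum(1 for doc in corpus.values() if term in doc.lower().split())
    if n_containing == 0:
        return 0.0  # assumption: treat unseen terms as carrying no weight
    return math.log(len(corpus) / n_containing)


def tf_idf(term, doc, corpus):
    """tf-idf: term frequency weighted by how rare the term is across the corpus."""
    return tf(term, doc) * idf(term, corpus)


for term in ["the", "cat", "mat"]:
    print(term, tf(term, corpus["a"]), idf(term, corpus), tf_idf(term, corpus["a"], corpus))
```

Under these assumptions, tf depends only on the single document, idf depends only on the corpus, and tf-idf combines the two, so a word that is frequent in one document but rare elsewhere scores highest.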

(From Python Example for What's Cooking)
2. Submit the predictions for the trainc and trainc2 models to Kaggle. Provide a screenshot for each. (The output files have already been created; the predictions must be for the test dataset.)
3. Calculate the accuracy for the trainc and trainc2 models for the predictions given using the training dataset.

(From Python Example for Bag of Popcorn)
4. Submit the predictions for the bag of words model and the Word2Vec model to Kaggle. Provide a screenshot for each.
5. Calculate the accuracy for the bag of words model and the Word2Vec model for the predictions given using the training dataset.
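
Items 3 and 5 both ask for accuracy on the training dataset, which is the fraction of predictions that match the known labels. A minimal sketch follows; the `accuracy` helper and the sample labels are hypothetical and do not come from the class notebooks, where the true labels and predictions would come from the training data and the fitted models.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for illustration only.
true_labels = ["italian", "mexican", "indian", "italian"]
predicted = ["italian", "indian", "indian", "italian"]
print(accuracy(true_labels, predicted))
```

Accuracy measured on the same data the model was trained on tends to be optimistic, so it will usually differ from the score Kaggle reports on the test dataset.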


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.