The goal of this class is to investigate basic concepts surrounding text mining.
- The Seven Practice Areas of Text Mining
- The Amazing Power of Word Vectors
- Bag of Words Tutorial
- Word Vectors
- Regression Example in R
- Presentations
- Introduction to Text Mining in Python (links: local, GitHub, slides)
- What's Cooking Example (links: local, GitHub, slides)
- Bag of Popcorn Example (links: local, GitHub, slides)
(From tf-idf example)
1. Manually create a new corpus and QUERY_TERMS (for example, from sample tweets or from longer documents). For your corpus, calculate the tf, idf, and tf-idf for three of the words. Explain the difference between the three measures.
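As a starting point for the calculation, here is a minimal sketch using only the Python standard library. It assumes one common definition (tf as the term count divided by document length, idf as the natural log of the number of documents divided by the number of documents containing the term); the tf-idf example from class may use a different weighting or smoothing, so check its definitions before comparing numbers. The corpus contents and the helper names `tokenize`, `tf`, and `idf` are hypothetical and only for illustration.

```python
import math

# Hypothetical three-document corpus; replace with your own texts.
corpus = {
    "a": "the cat sat on the mat",
    "b": "the dog chased the cat",
    "c": "dogs and cats make good pets",
}
QUERY_TERMS = ["the", "cat", "pets"]


def tokenize(text):
    """Lowercase the text and split on whitespace (no stemming or stop words)."""
    return text.lower().split()


def tf(term, tokens):
    """Term frequency: share of the document's tokens that equal the term."""
    return tokens.count(term) / len(tokens)


def idf(term, tokenized_docs):
    """Inverse document frequency: log(N / number of documents with the term).

    Returns 0.0 for a term that appears in no document (an assumption made
    here to avoid division by zero).
    """
    n_containing = sum(1 for tokens in tokenized_docs if term in tokens)
    if n_containing == 0:
        return 0.0
    return math.log(len(tokenized_docs) / n_containing)


docs = {name: tokenize(text) for name, text in corpus.items()}
all_docs = list(docs.values())

for term in QUERY_TERMS:
    term_idf = idf(term, all_docs)
    for name, tokens in docs.items():
        term_tf = tf(term, tokens)
        print(f"{term!r} in {name}: tf={term_tf:.3f} "
              f"idf={term_idf:.3f} tf-idf={term_tf * term_idf:.3f}")
```

Under these definitions, tf depends only on a single document, idf depends only on how many documents in the corpus contain the term, and tf-idf is their product, so a term that is frequent in one document but rare across the corpus scores highest.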
(From Python Example for What's Cooking)
2. Submit the predictions for the trainc and trainc2 models to Kaggle. Provide a screenshot for each. (The output files have already been created; they must contain predictions for the test dataset.)
3. Calculate the accuracy of the trainc and trainc2 models using their predictions on the training dataset.
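Accuracy here means the fraction of predictions that match the known labels. The sketch below shows that calculation on its own, assuming you already have the true labels and a model's predictions as two equal-length sequences. The function name `accuracy` and the sample labels are made up for illustration; they are not taken from the trainc or trainc2 models.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Made-up labels for illustration only.
actual = ["italian", "mexican", "indian", "italian"]
predicted = ["italian", "mexican", "italian", "italian"]
print(accuracy(actual, predicted))  # 3 of 4 match, so this prints 0.75
```

Accuracy measured on the data a model was trained on is usually optimistic, which is one reason to compare it with the score Kaggle reports for the test dataset.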
(From Python Example for Bag of Popcorn)
4. Submit the predictions for the bag of words model and the Word2Vec model to Kaggle. Provide a screenshot for each.
5. Calculate the accuracy of the bag of words model and the Word2Vec model using their predictions on the training dataset.
This work is licensed under a Creative Commons Attribution 4.0 International License.