Last active
December 9, 2015 12:27
Sentiment Analysis Project Details
All the sentiment analysis data is present in the folder named "senti".
Directory structure:
senti
├── Trainingset_creator
│   ├── README.rst
│   ├── appsid
│   ├── reviews_crawler.py
│   └── settings.py
├── config.py
├── rate_opinion.py
├── reviews_sentiment.py
├── reviews_sentiment_read.py
├── reviews_sentiment_write.py
└── sentiment_mod.py
The main files:
* rate_opinion.py: The main script. It internally calls sentiment_mod and, based on the response of the
  sentiment_mod module, saves the data in the MongoDB database. To run it, enter this in a terminal:
    $ python rate_opinion.py
  This script takes a long time to finish because there are more than 0.2 million apps.
* sentiment_mod.py: Module to get the sentiment. It can be used directly. Usage:
  In a Python console:
    >>> # Call the sentiment method. It returns pos for positive or neg for negative.
    >>> # It may also return neu for neutral. Neutral means none of the words were present in the
    >>> # feature set, for example Hindi words that are not in the feature set.
    >>> import sentiment_mod
    >>> sentiment_mod.sentiment('test text for testing.')
    pos    (or neg)
___________________________________________________________
1. Trainingset_creator:
   This directory is of no use right now. I used the reviews_crawler.py script inside this directory to create
   the training set for sentiment analysis. Now that the sentiment analysis models are already created, this directory is not required.
2. config.py: Configuration for getting data out of and setting data in the MongoDB database.
3. reviews_sentiment.py: Not used.
4. reviews_sentiment_write.py: Trains the classifiers and then pickles them in the pickle directory.
5. reviews_sentiment_read.py: This file's code is similar to 'reviews_sentiment_write.py', but it READS FROM
   THE ALREADY PICKLED FILES in the pickle directory. FIRST RUN 'reviews_sentiment_write.py' and then run this file ONLY
   TO CHECK ACCURACY from the already pickled files. ALL CHANGES MUST BE MADE IN 'reviews_sentiment_write.py'.
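The write-then-read split described in items 4 and 5 can be sketched roughly as below. This is only an illustration under assumptions: the helper names `save_pickle` and `load_pickle`, the `.pickle` file extension, and the dictionary used as a stand-in for a trained classifier are all hypothetical and are not taken from the actual scripts.

```python
import os
import pickle
import tempfile


def save_pickle(obj, directory, name):
    """Serialize obj to <directory>/<name>.pickle and return the path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name + ".pickle")
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


def load_pickle(directory, name):
    """Load a previously pickled object from <directory>/<name>.pickle."""
    path = os.path.join(directory, name + ".pickle")
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained classifier (assumption); the real scripts would
# pickle the fitted models instead of this dictionary.
toy_model = {"good": "pos", "bad": "neg"}

pickle_dir = os.path.join(tempfile.mkdtemp(), "pickle")
saved_path = save_pickle(toy_model, pickle_dir, "toy_model")  # "write" step
restored = load_pickle(pickle_dir, "toy_model")               # "read" step
print(restored == toy_model)
```

Keeping training in one script and loading in another means the slow step runs once, and every later run only pays the cost of reading the files.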
Classifiers used:
1. Naive Bayes classifier
2. Nu-Support vector machine classifier
3. Linear Support vector machine classifier
4. Stochastic gradient descent classifier
5. Multinomial Naive Bayes classifier
6. Bernoulli Naive Bayes classifier
7. Logistic Regression classifier
How the algorithm works:
I am using the scikit-learn package for Python for classification. Each of the classifiers rates a review, and the
review is then given the rating that received the most votes. For example, if classifiers 1, 2, 3 and 7 vote an app's review
as positive and classifiers 4, 5 and 6 rate it as negative, positive is taken as the final result because more than 50% of the votes are for it.
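The voting step can be sketched as a simple majority count. This is a minimal illustration, not the project's actual code: the function name `majority_vote` and the returned vote share are assumptions added for clarity.

```python
from collections import Counter


def majority_vote(votes):
    """Return (label, share): the most common label and its fraction of the votes."""
    counts = Counter(votes)
    label, count = counts.most_common(1)[0]
    return label, count / len(votes)


# Votes from the example above: classifiers 1, 2, 3 and 7 say "pos",
# classifiers 4, 5 and 6 say "neg".
votes = ["pos", "pos", "pos", "neg", "neg", "neg", "pos"]
label, share = majority_vote(votes)
print(label, round(share, 2))  # pos 0.57
```

With seven voters and only two possible labels a tie cannot occur, which is one reason to use an odd number of classifiers.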
Note: Positive.txt, Negative.txt and the pickle directory are missing from the GitHub repo. These files are present on the server.
* pickle: This directory contains the serialized data for all 7 algorithms. So, instead of training and testing the classifiers
  each time we want to test the sentiment of an app's review, the pickled files are read and loaded into memory.
* positive.txt: Contains the positive training reviews. I used this file to train the classifiers.
* Negative.txt: Same as above, for the negative training and testing data.
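How the two review files might feed a feature set, and why a review can come out neutral, can be sketched as follows. Everything here is an assumption made for illustration: the one-review-per-line file layout, the whitespace tokenization, and the helper names `read_reviews`, `build_feature_words` and `find_features` are hypothetical, and the real files are replaced by tiny stand-ins written to a temporary folder.

```python
import os
import tempfile
from collections import Counter


def read_reviews(path, label):
    """Return (words, label) pairs, one per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        return [(line.lower().split(), label) for line in handle if line.strip()]


def build_feature_words(documents, size):
    """Pick up to `size` of the most frequent words across all documents."""
    counts = Counter(word for words, _ in documents for word in words)
    return [word for word, _ in counts.most_common(size)]


def find_features(text, feature_words):
    """Map each feature word to True or False depending on whether text contains it."""
    words = set(text.lower().split())
    return {word: word in words for word in feature_words}


# Tiny stand-in files (assumption); the real review files live on the server.
folder = tempfile.mkdtemp()
pos_path = os.path.join(folder, "positive.txt")
neg_path = os.path.join(folder, "negative.txt")
with open(pos_path, "w", encoding="utf-8") as handle:
    handle.write("great app\nlove this app\n")
with open(neg_path, "w", encoding="utf-8") as handle:
    handle.write("bad app\nhate this app\n")

documents = read_reviews(pos_path, "pos") + read_reviews(neg_path, "neg")
feature_words = build_feature_words(documents, size=10)

# A romanized Hindi phrase: none of its words occur in the feature set,
# which corresponds to the neutral case described earlier.
features = find_features("bahut accha", feature_words)
print(any(features.values()))
```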