Sentence = AT_USER i heard about that contest! congrats girl!!

Feature Vector
==============
'hey', ..., 'heard', 'congrats', ..., 'bombs', 'strange', 'australian', 'women', 'drink', 'head', 'hurts', 'bloodwork'
0 1 1 0 0 0 0 0 0 0 0
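
The mapping from a tweet to a 0/1 vector can be sketched as follows. This is a minimal illustration, not the author's code: `to_feature_vector` and the five-word `feature_list` are hypothetical, and it assumes the features are visited in sorted order.

```python
def to_feature_vector(tweet_words, feature_list):
    """Return a 0/1 vector with one slot per feature, in sorted feature order."""
    present = set(tweet_words)
    return [1 if feature in present else 0 for feature in sorted(feature_list)]


# Hypothetical mini feature list (not the full list used in the example above).
feature_list = ['hey', 'heard', 'congrats', 'bombs', 'strange']
# Tokens of the example sentence after preprocessing (assumed tokenization).
tweet_words = ['heard', 'contest', 'congrats', 'girl']

vector = to_feature_vector(tweet_words, feature_list)
print(sorted(feature_list))  # ['bombs', 'congrats', 'heard', 'hey', 'strange']
print(vector)                # [0, 1, 1, 0, 0]
```

Words that are not in the feature list ('contest', 'girl') are simply ignored, so every tweet yields a vector of the same length.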
def getSVMFeatureVectorAndLabels(tweets, featureList):
    sortedFeatures = sorted(featureList)
    map = {}
    feature_vector = []
    labels = []
    for t in tweets:
        label = 0
        map = {}
        # Initialize empty map
        for w in sortedFeatures:
.*
optimization finished, #iter = 5
nu = 0.176245
obj = -2.643822, rho = 0.164343
nSV = 3, nBSV = 0
*
optimization finished, #iter = 1
nu = 0.254149
obj = -2.541494, rho = 0.000000
nSV = 2, nBSV = 0
import svm
from svmutil import *

# training data
labels = [0, 1, 1, 2]
samples = [[0, 1, 0], [1, 1, 1], [1, 1, 0], [0, 0, 0]]

# SVM params
param = svm_parameter()
param.C = 10
# Max Entropy Classifier
MaxEntClassifier = nltk.classify.maxent.MaxentClassifier.train(
    training_set, 'GIS', trace=3, encoding=None, labels=None,
    sparse=True, gaussian_prior_sigma=0, max_iter=10)
testTweet = 'Congrats @ravikiranj, i heard you wrote a new tech post on sentiment analysis'
processedTestTweet = processTweet(testTweet)
print(MaxEntClassifier.classify(extract_features(getFeatureVector(processedTestTweet))))

Output
======
positive
testTweet = 'I am so badly hurt'
processedTestTweet = processTweet(testTweet)
print(NBClassifier.classify(extract_features(getFeatureVector(processedTestTweet))))

Output
======
positive
# Print the most informative features of the classifier.
# show_most_informative_features() prints the table itself and returns None,
# so it is called directly rather than wrapped in print.
NBClassifier.show_most_informative_features(10)

Output
======
Most Informative Features
       contains(twitter) = False          positi : neutra =      2.3 : 1.0
           contains(car) = False          positi : negati =      2.3 : 1.0
         contains(hurts) = False          positi : negati =      2.3 : 1.0
      contains(articles) = False          positi : neutra =      1.4 : 1.0
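
Each row of that table compares how likely a feature value is under one label versus another. The sketch below reproduces the idea on made-up counts. The counts, the helper `smoothed_prob`, and the add-0.5 smoothing over two possible values (True/False) are assumptions of this sketch, so the numbers will not match the output above.

```python
def smoothed_prob(count, total, bins=2, gamma=0.5):
    """Estimate P(value | label) with additive smoothing (gamma and bins are assumed)."""
    return (count + gamma) / (total + gamma * bins)


# Made-up counts: number of training tweets per label that do NOT contain a word.
absent_counts = {'positive': 9, 'negative': 3}
# Made-up totals: number of training tweets per label.
totals = {'positive': 9, 'negative': 9}

probs = {label: smoothed_prob(absent_counts[label], totals[label]) for label in totals}
high = max(probs, key=probs.get)  # label under which the value is most likely
low = min(probs, key=probs.get)   # label under which the value is least likely
ratio = probs[high] / probs[low]

print('%s : %s = %.1f : 1.0' % (high[:6], low[:6], ratio))  # positi : negati = 2.7 : 1.0
```

A ratio of 2.7 : 1.0 here means that, on these toy counts, the absence of the word is about 2.7 times as likely in a positive tweet as in a negative one.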
# Train the classifier
NBClassifier = nltk.NaiveBayesClassifier.train(training_set)

# Test the classifier
testTweet = 'Congrats @ravikiranj, i heard you wrote a new tech post on sentiment analysis'
processedTestTweet = processTweet(testTweet)
print(NBClassifier.classify(extract_features(getFeatureVector(processedTestTweet))))

Output
======
# Read the tweets one by one and process them
inpTweets = csv.reader(open('data/sampleTweets.csv', 'rb'), delimiter=',', quotechar='|')
stopWords = getStopWordList('data/feature_list/stopwords.txt')
featureList = []

# Get tweet words
tweets = []
for row in inpTweets:
    sentiment = row[0]
    tweet = row[1]
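
The snippet above is cut off after reading `row[1]`. As a separate, self-contained illustration of the same `csv.reader` settings in Python 3, the sketch below parses two made-up rows from an in-memory string. The row layout (`|sentiment|,|tweet|`) and the whitespace tokenization are assumptions, not taken from the original data file.

```python
import csv
import io

# Made-up rows in the assumed layout: |sentiment|,|tweet text|
sample_csv = io.StringIO(
    '|positive|,|AT_USER i heard about that contest! congrats girl!!|\n'
    '|negative|,|my head hurts, badly|\n'
)

tweets = []
for row in csv.reader(sample_csv, delimiter=',', quotechar='|'):
    sentiment, tweet = row[0], row[1]
    # Naive whitespace tokenization, standing in for the real preprocessing.
    tweets.append((tweet.split(), sentiment))

for words, sentiment in tweets:
    print(sentiment, words)
```

Because `|` is the quote character, the comma inside the second tweet stays part of the tweet text instead of splitting the row. When reading from a real file in Python 3, the `csv` documentation recommends opening it in text mode with `newline=''` rather than `'rb'`.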
# Get the feature list stored in a file (for reuse)
featureList = getFeatureList('data/sampleTweetFeatureList.txt')

# start extract_features
def extract_features(tweet):
    tweet_words = set(tweet)
    features = {}
    for word in featureList:
        features['contains(%s)' % word] = (word in tweet_words)
    return features
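
To show what `extract_features` produces, here is a small standalone variant that can be run without the feature file. The three-word `feature_list` is a hypothetical stand-in for the list loaded by `getFeatureList`, and passing it as a parameter (instead of reading a global) is a choice of this sketch.

```python
# Hypothetical stand-in for the list loaded from the feature file.
feature_list = ['congrats', 'heard', 'hurts']


def extract_features(tweet_words, feature_list):
    """Map every known feature word to True/False for one tokenized tweet."""
    present = set(tweet_words)
    return {'contains(%s)' % word: (word in present) for word in feature_list}


features = extract_features(['heard', 'contest', 'congrats', 'girl'], feature_list)
print(features)
# {'contains(congrats)': True, 'contains(heard)': True, 'contains(hurts)': False}
```

Every tweet gets an entry for every feature word, including the `False` ones, which is why the informative-features table lists entries such as `contains(hurts) = False`.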