@ravikiranj
Last active October 4, 2015 13:07
Bulk feature extraction
import csv
import nltk

# getStopWordList, processTweet, getFeatureVector and extract_features are
# helper functions defined elsewhere; they are not part of this snippet.

# Read the tweets one by one and process them
stopWords = getStopWordList('data/feature_list/stopwords.txt')
featureList = []  # every feature word seen across all tweets
tweets = []       # (featureVector, sentiment) pairs
# Python 3: open in text mode with newline='' (Python 2 used 'rb' here)
with open('data/sampleTweets.csv', 'r', newline='') as csvFile:
    inpTweets = csv.reader(csvFile, delimiter=',', quotechar='|')
    for row in inpTweets:
        sentiment = row[0]
        tweet = row[1]
        processedTweet = processTweet(tweet)
        featureVector = getFeatureVector(processedTweet, stopWords)
        featureList.extend(featureVector)
        tweets.append((featureVector, sentiment))

# Remove featureList duplicates
featureList = list(set(featureList))

# Extract feature vectors for all tweets in one shot
training_set = nltk.classify.util.apply_features(extract_features, tweets)
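
The snippet relies on an extract_features helper that is not shown here. As a rough sketch only, one plausible shape for it is a function that maps a tweet's words to boolean "contains(word)" features over featureList. The sample featureList and tweets values below are made up for illustration, and the list comprehension is an eager stand-in for what nltk.classify.util.apply_features does lazily on labeled (tokens, label) pairs.

```python
# Hypothetical stand-ins for the values built by the snippet above.
featureList = ['happy', 'sad', 'rain']
tweets = [(['happy', 'rain'], 'positive'), (['sad'], 'negative')]


def extract_features(tweet_words):
    """Assumed helper: one boolean feature per word in featureList."""
    words = set(tweet_words)
    return {'contains(%s)' % w: (w in words) for w in featureList}


# Eager equivalent of apply_features(extract_features, tweets) for
# labeled input: featurize the tokens, keep the label unchanged.
training_set = [(extract_features(words), label) for words, label in tweets]

for features, label in training_set:
    print(label, features)
```

With the real NLTK call the result is a lazy sequence, so feature dicts are computed on access rather than held in memory all at once, which matters when the tweet list is large.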