Balachandra Pai (balachandrapai)

balachandrapai / TextClassify.py
Created March 11, 2018 18:21
Extracting feature sets, Classification using NaiveBayes, Pickle basics
##Text classification here is binary: each document gets one of two labels,
##e.g. spam vs. not spam (for movie_reviews: pos vs. neg)
import nltk
import random
from nltk.corpus import movie_reviews
import pickle
## Create a list of tuples of features
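The gist preview stops before the classifier is built. The sketch below shows the three steps named in its description (feature-set extraction, Naive Bayes classification, pickling) on a tiny hand-written corpus that stands in for `movie_reviews`, so it runs without corpus downloads. The toy documents, the 3000-word vocabulary cutoff, the `find_features` helper, and the `naivebayes.pickle` filename are illustrative assumptions, not the author's code.

```python
import pickle
import random

import nltk

# Hypothetical toy corpus standing in for movie_reviews: (list of words, label) pairs.
documents = [
    ("a wonderful moving film with great acting".split(), "pos"),
    ("great story and wonderful music".split(), "pos"),
    ("boring plot and terrible acting".split(), "neg"),
    ("a terrible boring mess".split(), "neg"),
]
random.seed(0)
random.shuffle(documents)

# Vocabulary: the most frequent words in the corpus (3000 is an assumed cutoff).
all_words = nltk.FreqDist(w.lower() for words, _ in documents for w in words)
word_features = [w for w, _ in all_words.most_common(3000)]


def find_features(document):
    """Map a document (list of words) to a {word: present?} dict over the vocabulary."""
    words = set(document)
    return {w: (w in words) for w in word_features}


# List of (feature dict, label) tuples, the input format NaiveBayesClassifier.train expects.
featuresets = [(find_features(words), label) for words, label in documents]
classifier = nltk.NaiveBayesClassifier.train(featuresets)

# Pickle basics: save the trained classifier to disk, then load it back.
with open("naivebayes.pickle", "wb") as f:
    pickle.dump(classifier, f)
with open("naivebayes.pickle", "rb") as f:
    loaded = pickle.load(f)

print(loaded.classify(find_features("what a wonderful film".split())))
```

Pickling the trained classifier means a real run on the full corpus only has to be trained once; later scripts can `pickle.load` it and call `classify` directly.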
balachandrapai / WordNet.py
Created March 10, 2018 10:12
Using synsets, finding synonyms and antonyms, word similarity
from nltk.corpus import wordnet
syns = wordnet.synsets("program")
print(syns[0].name())
#plan.n.01
print(syns[0].lemmas()[0].name())
#plan
print(syns[0].definition())
#a series of steps to be carried out or goals to be accomplished
print(syns[0].examples())
balachandrapai / NLPNer.py
Created March 9, 2018 06:52
NLP Named Entity Recognition
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
balachandrapai / NLPBasics.py
Created March 9, 2018 06:21
POS tags, Chunking and Chinking
##POS tagging labels each word in a sentence as a noun, adjective, verb, etc.
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
##PunktSentenceTokenizer is a sentence tokenizer that learns its
##parameters through unsupervised machine learning,
##so it can be trained on any body of text you supply
##Creating training and testing data
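The description mentions chunking and chinking, but the preview stops before either. Below is a minimal sketch using `nltk.RegexpParser`. To avoid needing a tagger model, it works on a hand-tagged sentence with Penn Treebank tags. The sentence and both grammars are illustrative assumptions. Chunking groups matching tag sequences into a labeled subtree. Chinking first chunks everything, then carves out the tags listed between `}` and `{`.

```python
import nltk

# Hand-tagged sentence, standing in for the output of nltk.pos_tag.
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Chunking: an optional determiner, any number of adjectives, then a noun form an NP.
chunk_parser = nltk.RegexpParser(r"NP: {<DT>?<JJ>*<NN>}")
chunked = chunk_parser.parse(tagged)
print(chunked)

# Chinking: chunk every token, then remove verbs, prepositions and determiners.
chink_grammar = r"""
Chunk: {<.*>+}
       }<VB.?|IN|DT>+{
"""
chinked = nltk.RegexpParser(chink_grammar).parse(tagged)
print(chinked)
```

With these rules, the chunk pass yields two NPs ("the little yellow dog", "the cat"), while the chink pass leaves "little yellow dog" and "cat" as chunks.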
balachandrapai / NLPBasics.py
Last active September 26, 2020 13:58
Basics of NLP using NLTK (tokenizing words and sentences, stop words, stemming words, lemmatization)
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
##Tokenizing - Splitting sentences and words from the body of text.
##Part of Speech tagging
##Corpus - Body of text, singular. Corpora is the plural of this.
##Example: A collection of medical journals.
##Lexicon - Words and their meanings.
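This entry imports `stopwords` and `word_tokenize` for stop-word removal and stemming. The sketch below shows those two steps, but to run without corpus downloads it makes two substitutions. First, a small hand-written stop list stands in for `set(stopwords.words("english"))`. Second, `wordpunct_tokenize` stands in for `word_tokenize`. Both substitutions are assumptions, and the sample sentence is illustrative.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Hand-written stop list (assumption) standing in for set(stopwords.words("english")).
stop_words = {"the", "is", "a", "an", "and", "of", "to", "was", "were", "it", "this"}

text = "The cats were running and jumping over the fences."
tokens = wordpunct_tokenize(text)

# Drop stop words (case-insensitively) and punctuation-only tokens.
filtered = [w for w in tokens if w.lower() not in stop_words and w.isalpha()]

# Stemming chops suffixes by rule; the result need not be a dictionary word.
ps = PorterStemmer()
stems = [ps.stem(w) for w in filtered]
print(filtered)
print(stems)
```

Stemming is purely rule-based, so some outputs are not real words. Lemmatization (`WordNetLemmatizer`) maps words to dictionary forms instead, but it requires the WordNet data.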