##Text classification assigns a label to a document; a common case is
##binary classification, e.g. spam vs. not spam.
import nltk
import random
from nltk.corpus import movie_reviews
import pickle

## Create a list of (features, label) tuples
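A minimal sketch of building that list, assuming nltk is installed and the movie_reviews corpus has been fetched with nltk.download("movie_reviews"); the find_features helper is illustrative, not part of the original:

```python
import random
from nltk.corpus import movie_reviews

def find_features(document_words, word_features):
    # A feature dict maps each candidate word to whether it
    # appears in the document.
    words = set(document_words)
    return {w: (w in words) for w in word_features}

# Pure-Python demonstration of the feature shape:
print(find_features(["great", "movie"], ["great", "awful"]))
# {'great': True, 'awful': False}

# Building the (words, label) tuples requires the corpus data.
try:
    documents = [(list(movie_reviews.words(fid)), cat)
                 for cat in movie_reviews.categories()
                 for fid in movie_reviews.fileids(cat)]
    random.shuffle(documents)
    print(len(documents))  # 2000 reviews, labelled 'pos'/'neg'
except LookupError:
    print("movie_reviews corpus not downloaded; run nltk.download() first")
```

Each tuple pairs a review's word list with its label, ready for a classifier such as nltk.NaiveBayesClassifier.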
from nltk.corpus import wordnet

syns = wordnet.synsets("program")
print(syns[0].name())
#plan.n.01
print(syns[0].lemmas()[0].name())
#plan
print(syns[0].definition())
#a series of steps to be carried out or goals to be accomplished
print(syns[0].examples())
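Beyond inspecting a single synset, the lemmas can be walked to collect synonyms and antonyms. A sketch, assuming the wordnet corpus has been downloaded via nltk.download("wordnet"):

```python
from nltk.corpus import wordnet

# Collect synonyms and antonyms for "good" across all of its synsets.
synonyms, antonyms = set(), set()
try:
    for syn in wordnet.synsets("good"):
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())
            # Antonyms are attached to lemmas, not to synsets.
            for ant in lemma.antonyms():
                antonyms.add(ant.name())
    print(sorted(synonyms)[:5])
    print(sorted(antonyms))
except LookupError:
    print("wordnet corpus not downloaded; run nltk.download() first")
```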
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)
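The tokenized sentences are typically fed to nltk.pos_tag next. A sketch under the assumption that the punkt and averaged_perceptron_tagger resources have been downloaded; the tag_sentences helper is hypothetical, not part of the original:

```python
import nltk

def tag_sentences(sentences):
    # Tokenize each sentence into words, then tag each word with
    # a Penn Treebank part-of-speech tag.
    return [nltk.pos_tag(nltk.word_tokenize(s)) for s in sentences]

try:
    tagged = tag_sentences(["The quick brown fox jumps over the lazy dog."])
    print(tagged[0])  # list of (word, tag) pairs
except LookupError:
    print("tagger/tokenizer resources not downloaded; run nltk.download() first")
```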
##POS tagging labels each word in a sentence as a noun, adjective, verb, etc.
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
##PunktSentenceTokenizer is an unsupervised, trainable sentence tokenizer:
##it learns sentence boundaries from raw text, so you can train it on
##any body of text you like.
##Creating the training and testing data
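The unsupervised training step can be seen in isolation: Punkt learns abbreviation and boundary statistics from raw text, so a plain string is enough to train it. A small sketch (a corpus this tiny gives unreliable statistics and is for illustration only; no NLTK data downloads are needed here):

```python
from nltk.tokenize import PunktSentenceTokenizer

# Train Punkt on a tiny raw-text sample so it can learn, e.g.,
# that "Mr." and "Dr." do not end sentences.
train_text = ("Mr. Smith went to Washington. He spoke to Dr. Jones. "
              "They discussed the U.S. economy at length.")
tokenizer = PunktSentenceTokenizer(train_text)

# Apply the trained tokenizer to unseen text.
print(tokenizer.tokenize("Dr. Jones agreed. The talks went well."))
```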
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
##Tokenizing - splitting a body of text into sentences and words.
##Part-of-speech tagging
##Corpus - a body of text, singular; corpora is the plural.
##Example: a collection of medical journals.
##Lexicon - words and their meanings.
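These imports are typically combined to strip stopwords from tokenized text. A sketch, assuming the stopwords and punkt resources have been downloaded via nltk.download(); the filter_stopwords helper is illustrative, not part of the original:

```python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def filter_stopwords(words, stop_words):
    # Keep only tokens not in the stopword set (case-insensitive).
    return [w for w in words if w.lower() not in stop_words]

# Pure-Python demonstration with a hand-made stopword set:
print(filter_stopwords(["This", "is", "a", "sample"], {"this", "is", "a"}))
# ['sample']

# With NLTK's English stopword list (requires downloaded data):
try:
    stop_words = set(stopwords.words("english"))
    sentence = "This is a sample sentence, showing off stop word filtration."
    print(filter_stopwords(word_tokenize(sentence), stop_words))
except LookupError:
    print("NLTK data not downloaded; run nltk.download() first")
```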