Sobhan Hota gupul2k

gupul2k / pos_tagging.py

Created November 2, 2012 13:32

NER and POS Tagging with NLTK and Python

	#Script tags parts of speech (POS) and named entities (NER, Named Entity Recognition) in a supplied text file.
	#Date: Nov 2 2012
	#Author: Hota Sobhan

	import nltk

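	# Raw string so the Windows path's backslashes are never read as escape sequences;
	# "with" closes the file once its lines have been read
	with open(r'C:\Python27\Test_File.txt') as f:
	    data = f.readlines()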

	#Parse the text file for NER with POS Tagging

gupul2k / bigrams_vectorgen.py

Last active October 11, 2015 21:38

NLP: Bigram Vector Generation with Python

	#Author: Sobhan Hota
	#Date: Oct 20 2012
	#Generates a vector for the bigrams collected in the source file: for each bigram,
	#captures its count in the supplied input file (if present), then divides that count
	#by the input file's document length.


	import itertools
	from collections import Counter
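The preview stops at the imports, so the header comments are the only description of the vector. What follows is one possible reading of them, not the gist's own code. `bigrams` and `bigram_vector` are hypothetical names. The source bigrams are assumed to arrive as a list of token pairs. "Document length" is assumed to mean the number of tokens in the input file.

```python
from collections import Counter


# Hypothetical helper names; the original script's functions are not shown.
def bigrams(tokens):
    """Return adjacent token pairs: ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    return list(zip(tokens, tokens[1:]))


def bigram_vector(source_bigrams, input_tokens):
    """Relative frequency of each source bigram in the input document.

    Assumption: "document length" is the number of tokens in the input.
    """
    counts = Counter(bigrams(input_tokens))
    length = len(input_tokens)
    if length == 0:
        # Empty document: avoid dividing by zero
        return [0.0] * len(source_bigrams)
    # A Counter returns 0 for missing keys, so absent bigrams score 0.0
    return [counts[bg] / length for bg in source_bigrams]


source = [("the", "cat"), ("cat", "sat"), ("dog", "ran")]
doc = "the cat sat on the cat mat".split()
print(bigram_vector(source, doc))  # prints the values 2/7, 1/7 and 0.0
```

Dividing by length makes vectors from documents of different sizes comparable. The truncated preview does not show whether the original divides by tokens or by some other length measure.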

gupul2k / mf_500_Bag_of_Words.py

Created October 16, 2012 20:43

NLP: Count frequent words in a file

	#Author: Sobhan Hota
	#Finds the 500 most frequent words in a given file

	from string import punctuation
	from operator import itemgetter

	N = 500
	words = {}

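	# Lazily yield every whitespace-separated word, stripped of surrounding punctuation and lowercased
	words_gen = (word.strip(punctuation).lower() for line in open(r"C:\Python27\Corpus.txt")
	             for word in line.split())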
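The preview ends before any counting happens. The sketch below shows one way to finish the job, under stated assumptions. `top_words` is a hypothetical name. It swaps the `words = {}` plus `itemgetter` setup for `collections.Counter`, whose `most_common(n)` performs the sort-by-count-and-slice step. Dropping empty strings is an added assumption: tokens made only of punctuation would otherwise be counted as `""`.

```python
from collections import Counter
from string import punctuation


def top_words(lines, n=500):
    """Return the n most frequent words as (word, count) pairs.

    Words are split on whitespace, stripped of surrounding punctuation,
    and lowercased before counting.
    """
    counts = Counter(
        word.strip(punctuation).lower()
        for line in lines
        for word in line.split()
    )
    # Assumption: punctuation-only tokens strip to "" and should not be counted
    counts.pop("", None)
    return counts.most_common(n)


sample = ["The cat sat.", "The dog, the cat!", "--"]
print(top_words(sample, n=2))  # -> [('the', 3), ('cat', 2)]
```

Because `top_words` accepts any iterable of lines, an open file works directly, e.g. `with open(r"C:\Python27\Corpus.txt") as f: print(top_words(f, 500))`.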