@gupul2k
gupul2k / pos_tagging.py
Created November 2, 2012 13:32
NER and POS Tagging with NLTK and Python
#Tags parts of speech (POS) and named entities (NER, Named Entity Recognition) in a supplied text file.
#Date: Nov 2 2012
#Author: Hota Sobhan
import nltk
#Raw string keeps the backslashes in the Windows path literal; "with" closes the file
with open(r'C:\Python27\Test_File.txt') as f:
    data = f.readlines()
#Parse the text file for NER with POS Tagging
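The preview stops at the comment above, so the original parsing code is not shown. As a minimal sketch of what POS tagging followed by NER usually looks like in NLTK, the following uses `nltk.word_tokenize`, `nltk.pos_tag`, and `nltk.ne_chunk`. The function names `tag_lines` and `named_entities` are hypothetical, not taken from the gist, and the NLTK tokenizer, tagger, and chunker data must already be installed via `nltk.download` (the exact resource names depend on the NLTK version).

```python
def tag_lines(lines):
    """Return one NLTK chunk tree per non-empty line (POS tags plus NE chunks).

    Hypothetical helper; assumes the NLTK tokenizer, tagger and chunker
    data have already been downloaded.
    """
    # Imported here so named_entities() below stays usable without NLTK.
    import nltk

    trees = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tokens = nltk.word_tokenize(line)    # split the line into word tokens
        tagged = nltk.pos_tag(tokens)        # [(word, POS tag), ...]
        trees.append(nltk.ne_chunk(tagged))  # wrap named entities in labelled subtrees
    return trees


def named_entities(tree):
    """Collect (entity text, label) pairs from one chunk tree."""
    entities = []
    for node in tree:
        # Entity chunks are subtrees with a label; plain tokens are (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities
```

With `data` read as above, `for tree in tag_lines(data): print(named_entities(tree))` would print the entities found on each line.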
gupul2k / bigrams_vectorgen.py
Last active October 11, 2015 21:38
NLP: Bigram Vector Generation by Python
#Author: Sobhan Hota
#Date: Oct 20 2012
#Generates a vector for the bigrams collected in the source file: looks up each
#bigram's count in the supplied input file (if present), then divides it by the
#input file's document length.
import itertools
from collections import Counter
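The preview shows only the imports, so the body of the script is unknown. A minimal sketch of the computation the header comment describes might look like the following. It assumes that "document length" means the number of tokens in the input document, and that a bigram missing from the input gets a value of 0. The names `bigrams` and `bigram_vector` are hypothetical.

```python
from collections import Counter
from itertools import tee


def bigrams(tokens):
    """Yield consecutive token pairs: (t0, t1), (t1, t2), ..."""
    first, second = tee(tokens)
    next(second, None)  # advance the second iterator by one token
    return zip(first, second)


def bigram_vector(source_bigrams, tokens):
    """Length-normalised count of each source bigram in the token list.

    Assumption: document length is the number of tokens in the input.
    """
    counts = Counter(bigrams(tokens))
    length = len(tokens)
    if length == 0:
        return [0.0] * len(source_bigrams)
    # Counter returns 0 for bigrams that never occur in the input.
    return [counts[bg] / length for bg in source_bigrams]
```

For example, `bigram_vector([("the", "cat")], "the cat sat on the cat".split())` divides the two occurrences of `("the", "cat")` by the six tokens of the input.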
gupul2k / mf_500_Bag_of_Words.py
Created October 16, 2012 20:43
NLP: Count frequent words in a file
#Author: Sobhan Hota
#Finds the 500 most frequent words in a given file
from string import punctuation
from operator import itemgetter
N = 500
words = {}
words_gen = (word.strip(punctuation).lower() for line in open(r"C:\Python27\Corpus.txt")
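The preview cuts the generator expression off mid-line, so the rest of the script is not shown. A minimal sketch of the same idea is given below. It swaps in `collections.Counter.most_common` in place of the plain dict and `itemgetter` that the original imports suggest, and the function name `most_frequent_words` is hypothetical.

```python
from collections import Counter
from string import punctuation


def most_frequent_words(lines, n=500):
    """Return the n most frequent words as (word, count) pairs."""
    # Split each line on whitespace, strip surrounding punctuation, lowercase.
    words = (
        word.strip(punctuation).lower()
        for line in lines
        for word in line.split()
    )
    # Skip tokens that were punctuation only and are now empty.
    counts = Counter(word for word in words if word)
    return counts.most_common(n)
```

Because an open file iterates line by line, it can be passed in directly: `with open(path) as f: top = most_frequent_words(f)`, where `path` points at the corpus file.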