gupul2k’s gists

gupul2k / mf_500_Bag_of_Words.py

Created October 16, 2012 20:43

NLP: Count frequent words in a file

	#Author: Sobhan Hota
	#Finds the 500 most frequent words in a given file

	from string import punctuation
	from operator import itemgetter

	N = 500
	words = {}

	words_gen = (word.strip(punctuation).lower() for line in open(r"C:\Python27\Corpus.txt") for word in line.split())
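A minimal sketch of the counting step that this word generator feeds. It assumes Python 3, takes an in-memory list of lines in place of the Corpus.txt file, and uses a hypothetical helper name, `most_frequent`. It follows the `dict` plus `itemgetter` approach that the gist's imports suggest.

```python
from operator import itemgetter
from string import punctuation


def most_frequent(lines, n=500):
    """Return the n most frequent words as (word, count) pairs, highest first."""
    words = {}
    # Same normalisation as the gist: strip edge punctuation, lowercase.
    words_gen = (word.strip(punctuation).lower()
                 for line in lines
                 for word in line.split())
    for word in words_gen:
        if word:  # skip tokens that were only punctuation
            words[word] = words.get(word, 0) + 1
    # sorted() is stable, so words with equal counts keep first-seen order.
    return sorted(words.items(), key=itemgetter(1), reverse=True)[:n]


if __name__ == "__main__":
    sample = ["The cat sat.", "The dog sat, too!"]
    print(most_frequent(sample, n=2))  # [('the', 2), ('sat', 2)]
```

To use a real file, pass an open file object as `lines`, since iterating over it yields one line at a time.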

gupul2k / bigrams_vectorgen.py

Last active October 11, 2015 21:38

NLP: Bigram Vector Generation by Python

	#Author: Sobhan Hota
	#Date: Oct 20 2012
	#Script generates a vector for the bigrams collected in the source file:
	#it captures each bigram's count in the supplied input file (if present),
	#then divides that count by the input file's document length.
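A minimal sketch of that computation. It assumes "document length" means the number of whitespace-separated tokens in the input file (the preview does not define it) and that the source bigrams are given as word pairs. `bigram_vector` is a hypothetical name.

```python
from collections import Counter


def bigram_vector(source_bigrams, tokens):
    """For each source bigram, return its count in `tokens` divided by len(tokens).

    Assumption: document length = number of tokens in the input document.
    """
    doc_len = len(tokens)
    if doc_len == 0:
        return [0.0] * len(source_bigrams)
    # Adjacent token pairs are the document's bigrams.
    counts = Counter(zip(tokens, tokens[1:]))
    # Bigrams absent from the document contribute 0 (Counter returns 0 for missing keys).
    return [counts[bigram] / doc_len for bigram in source_bigrams]


if __name__ == "__main__":
    tokens = "a b a b c".split()
    print(bigram_vector([("a", "b"), ("b", "c"), ("c", "a")], tokens))
    # [0.4, 0.2, 0.0]
```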


	import itertools
	from collections import Counter

gupul2k / pos_tagging.py

Created November 2, 2012 13:32

NER and POS Tagging with NLTK and Python

	#Script performs POS tagging and NER (Named Entity Recognition) on a supplied text file.
	#Date: Nov 2 2012
	#Author: Hota Sobhan

	import nltk

	f = open(r'C:\Python27\Test_File.txt')
	data = f.readlines()

	#Parse the text file for NER with POS Tagging
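A common NLTK pipeline runs `nltk.word_tokenize`, then `nltk.pos_tag`, then `nltk.ne_chunk`. `nltk.chunk.tree2conlltags` then flattens the resulting tree into (word, POS tag, IOB tag) triples. The sketch below covers only the last step, grouping those triples into named-entity spans. It is written in plain Python so it runs without NLTK's downloadable models. The sample triples are hand-written for illustration, not real tagger output, and `iob_to_entities` is a hypothetical helper.

```python
def iob_to_entities(triples):
    """Group (word, pos, iob) triples into (entity_type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for word, _pos, iob in triples:
        # A "B-" tag, or an "I-" tag of a different type, starts a new entity.
        if iob.startswith("B-") or (iob.startswith("I-") and iob[2:] != current_type):
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = iob[2:], [word]
        elif iob.startswith("I-"):
            current_words.append(word)  # continues the current entity
        else:  # "O": outside any entity, so close the open one
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


if __name__ == "__main__":
    triples = [("Mark", "NNP", "B-PERSON"), ("Twain", "NNP", "I-PERSON"),
               ("lived", "VBD", "O"), ("in", "IN", "O"),
               ("Hartford", "NNP", "B-GPE"), (".", ".", "O")]
    print(iob_to_entities(triples))
    # [('PERSON', 'Mark Twain'), ('GPE', 'Hartford')]
```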

gupul2k / Most_freq_500_BoWs.py

Created December 10, 2012 22:59

Find Most Frequent 500 BoWs (Bag of Words)

	#!/usr/bin/python
	#Script to generate the 500 most frequent BoWs from a corpus (i.e., a lexicon).
	#Date: Nov 2 2012
	#Author: Hota Sobhan

	from string import punctuation
	from operator import itemgetter

	N = 500  # number of most frequent words to keep
	words = {}
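One way to produce the lexicon file this script describes. The sketch assumes Python 3 and one word per line in the output, and uses hypothetical names (`write_lexicon`, `lexicon.txt`). It replaces the dict-and-itemgetter counting that the gist's imports suggest with `collections.Counter.most_common`, which also returns words highest count first.

```python
import tempfile
from collections import Counter
from pathlib import Path
from string import punctuation


def write_lexicon(corpus_lines, out_path, n=500):
    """Write the n most frequent words to out_path, one per line, and return them."""
    counts = Counter()
    for line in corpus_lines:
        for token in line.split():
            word = token.strip(punctuation).lower()
            if word:  # ignore tokens that were only punctuation
                counts[word] += 1
    lexicon = [word for word, _count in counts.most_common(n)]
    Path(out_path).write_text("\n".join(lexicon) + "\n", encoding="utf-8")
    return lexicon


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "lexicon.txt"
        write_lexicon(["a b b", "c b a"], path, n=2)
        print(path.read_text(encoding="utf-8").split())  # ['b', 'a']
```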

gupul2k / vector_generation_BoWs.py

Created December 10, 2012 23:02

Feature Vector Generation for supplied BoWs

	#!/usr/bin/python
	#Script to generate a feature vector for a supplied BoWs file.
	#Date: Nov 2 2012
	#Author: Hota Sobhan

	from string import punctuation
	from operator import itemgetter

	words = {}
	total_words = 0
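A minimal sketch of the vector step. It assumes each feature is a lexicon word's count in the document divided by the document's total word count. The `words` and `total_words` variables suggest this, but the preview does not confirm it. `bow_feature_vector` is a hypothetical name.

```python
from collections import Counter
from string import punctuation


def bow_feature_vector(lexicon, text):
    """Relative frequency of each lexicon word in text (assumed normalisation)."""
    tokens = [t.strip(punctuation).lower() for t in text.split()]
    tokens = [t for t in tokens if t]  # drop punctuation-only tokens
    total_words = len(tokens)
    if total_words == 0:
        return [0.0] * len(lexicon)
    words = Counter(tokens)
    # One feature per lexicon entry, in lexicon order; unseen words give 0.0.
    return [words[w] / total_words for w in lexicon]


if __name__ == "__main__":
    print(bow_feature_vector(["the", "cat", "dog"], "The cat saw the bird."))
    # [0.4, 0.2, 0.0]
```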

gupul2k / gist:4971473

Last active December 13, 2015 20:39

Simple parser that reads multi-class SMO output (e.g., a 100k-line file) and splits it into nC2 = n!/((n-2)! * 2!) text files, one per pair of the n classes.
Input: SMO output in a text file.
Output: the binary-class weights, separated into the corresponding bucket files (the file names identify each bucket).

	/* @(#) SeparateSMOBinaryOpsToFiles.java 1.00 2/17/2013
	*
	* [Copyright Information]
	*/
	/*
	* Revision History:
	* Revision No. | Version No. | Project Code | Change Req. No. | Date | Author   | Description
	* 1            | 1           |              |                 |      | Sobhan H |
	*/
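The splitting logic does not depend on the language, so here is a Python sketch (the gist itself is Java). It assumes each binary model in the SMO output starts with a header line such as `Classifier for classes: A, B`. That prefix, the sample lines, and the helper names are illustrative assumptions, not taken from the gist. With n classes, one-vs-one training yields nC2 such sections, which the demo checks with `math.comb`.

```python
from itertools import combinations
from math import comb

# Assumed header format for each binary model (hypothetical; adjust to the real file).
HEADER_PREFIX = "Classifier for classes:"


def split_binary_sections(lines, header_prefix=HEADER_PREFIX):
    """Group SMO output lines into {"A, B": [body lines]} by header line."""
    sections = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(header_prefix):
            current = line[len(header_prefix):].strip()
            sections[current] = []
        elif current is not None:  # lines before the first header are ignored
            sections[current].append(line)
    return sections


def bucket_file_name(pair_label):
    """Hypothetical file-name scheme: 'A, B' -> 'A_vs_B.txt'."""
    a, b = (part.strip() for part in pair_label.split(","))
    return f"{a}_vs_{b}.txt"


if __name__ == "__main__":
    classes = ["A", "B", "C"]
    sample = []  # made-up output: one header plus one weight line per pair
    for a, b in combinations(classes, 2):
        sample += [f"{HEADER_PREFIX} {a}, {b}", f"  weight line for {a}/{b}"]
    sections = split_binary_sections(sample)
    assert len(sections) == comb(len(classes), 2)  # nC2 = 3 for n = 3
    for label, body in sections.items():
        print(bucket_file_name(label), body)
```

Writing each section to disk is then one `Path(bucket_file_name(label)).write_text("\n".join(body))` call per bucket.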

gupul2k / gist:5072976

Created March 2, 2013 19:51

Simple term-document matrix. Takes 3 arguments: the raw file, the Word_List of the corpus, and the FWs list.


	/* @(#) TermDocumentMatrix 1.00 2/25/2013
	*
	* [Copyright Information]
	*/
	/*
	* Revision History:
	* Revision No. | Version No. | Project Code | Change Req. No. | Date | Author      | Description
	* 1            | 1           |              |                 |      | Sobhan Hota |
	*/
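A minimal Python sketch of the same idea (the gist itself is Java). It assumes each line of the raw file is one document, that the FWs list holds words to exclude from the vocabulary, and that rows are terms and columns are documents. `term_document_matrix` is a hypothetical name.

```python
from collections import Counter
from string import punctuation


def term_document_matrix(documents, word_list, fws_list):
    """Return (terms, matrix) where matrix[i][j] = count of terms[i] in documents[j].

    Assumptions: one document per input string; words in fws_list are excluded.
    """
    excluded = {w.lower() for w in fws_list}
    terms = [w.lower() for w in word_list if w.lower() not in excluded]
    # Per-document word counts, normalised the same way as the terms.
    doc_counts = [
        Counter(t.strip(punctuation).lower() for t in doc.split())
        for doc in documents
    ]
    matrix = [[counts[term] for counts in doc_counts] for term in terms]
    return terms, matrix


if __name__ == "__main__":
    docs = ["The cat sat.", "The dog sat on the cat."]
    terms, matrix = term_document_matrix(docs, ["the", "cat", "dog", "sat"], ["the", "on"])
    for term, row in zip(terms, matrix):
        print(term, row)
```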

gupul2k / Cosine_Similarity_compare_sentences

Last active December 20, 2015 09:48

Implements the cosine similarity measure and uses it to compare two English sentences (Python, with the NLTK and NumPy libraries).

	import re
	import nltk
	from numpy import zeros,dot
	from numpy.linalg import norm

	# Get stop words
	stop_words = [w.strip() for w in open(r'C:\FWs.txt', 'r').readlines()]

	splitter = re.compile(r"[a-z\-']+", re.I)
	stemmer = nltk.PorterStemmer()
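The cosine measure is cos(a, b) = a·b / (|a| |b|), computed over word-count vectors. The sketch below uses only the standard library. It assumes the same tokenizing regex as above and takes the stop-word list as an argument. It omits the Porter stemming and the NumPy `dot`/`norm` calls that the gist uses, and `cosine_similarity` is a hypothetical name.

```python
import math
import re
from collections import Counter

# Same token pattern as the gist: runs of letters, hyphens and apostrophes.
SPLITTER = re.compile(r"[a-z\-']+", re.I)


def cosine_similarity(s1, s2, stop_words=()):
    """Cosine of the angle between two sentences' word-count vectors."""
    stops = {w.lower() for w in stop_words}

    def bag(sentence):
        return Counter(w.lower() for w in SPLITTER.findall(sentence)
                       if w.lower() not in stops)

    v1, v2 = bag(s1), bag(s2)
    # Only words present in both sentences contribute to the dot product.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty sentence has no direction
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    # Roughly 0.5: one shared word out of two in each sentence.
    print(cosine_similarity("The cat sat", "The cat ran", stop_words=["the"]))
```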

gupul2k / IMDB_Scraper

Created September 3, 2014 22:26

	#!/usr/bin/env python2.7

	#Importing the modules
	import sys
	import urllib2
	import re
	import json

	#Ask for movie title
	title = raw_input("Please enter a movie title: ")
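`urllib2` and `raw_input` exist only in Python 2. In Python 3, `input()` replaces `raw_input()` and `urllib.request` replaces `urllib2`. Below is a Python 3 sketch of the request-and-parse steps. It assumes a JSON web service that takes the title as a `t` query parameter. The endpoint URL, parameter name and response fields are placeholders, not the gist's actual values. `build_query_url` and `parse_movie` run offline, while `fetch_movie` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint and parameter name (hypothetical, not from the gist).
BASE_URL = "https://example.com/movies"


def build_query_url(title, base_url=BASE_URL):
    """Encode the title safely into the query string (spaces become '+')."""
    return base_url + "?" + urlencode({"t": title})


def parse_movie(json_text, fields=("Title", "Year")):
    """Pick the (assumed) fields out of a JSON response body; missing ones are None."""
    data = json.loads(json_text)
    return {field: data.get(field) for field in fields}


def fetch_movie(title, timeout=10):
    """Python 3 counterpart of urllib2.urlopen(...).read() plus JSON parsing."""
    with urlopen(build_query_url(title), timeout=timeout) as response:
        return parse_movie(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demo with a made-up response body; no network call is made here.
    print(build_query_url("The Matrix"))
    print(parse_movie('{"Title": "The Matrix", "Year": "1999"}'))
```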

gupul2k / exe_ORA_query_via_python

Created September 17, 2014 12:49

Find Oracle table locks - Python script

	import cx_Oracle
	con = cx_Oracle.connect('scott/tiger@IPADDRESS/SID')
	print con.version

	cur = con.cursor()
	cur.execute("""
	    select a.session_id, a.oracle_username, a.os_user_name,
	           b.owner "OBJECT OWNER", b.object_name, b.object_type, a.locked_mode
	      from (select object_id, SESSION_ID, ORACLE_USERNAME, OS_USER_NAME, LOCKED_MODE
	              from v$locked_object) a,
	           (select object_id, owner, object_name, object_type
	              from dba_objects) b
	     where a.object_id = b.object_id""")

	res = cur.fetchall()
	for r in res:
	    print r

	cur.close()
	con.close()
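cx_Oracle follows the Python DB-API 2.0, so `cursor.description` supplies the column names of the lock query. The sketch below turns fetched rows into dictionaries keyed by column name. To stay runnable without an Oracle server, it uses the standard-library `sqlite3` module as a stand-in DB-API driver, with a made-up table. `rows_as_dicts` is a hypothetical helper.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Map each fetched row to {column name: value} via DB-API cursor.description."""
    # description is a sequence of 7-item tuples; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # Stand-in database: in-memory sqlite3, NOT Oracle; the table is made up.
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("create table locks (session_id integer, object_name text)")
    cur.execute("insert into locks values (42, 'EMP')")
    cur.execute("select session_id, object_name from locks")
    for row in rows_as_dicts(cur):
        print(row)  # {'session_id': 42, 'object_name': 'EMP'}
    con.close()
```

With cx_Oracle, call `rows_as_dicts(cur)` right after `cur.execute(...)` on the lock query instead of `fetchall()`. Oracle typically reports unquoted column names in upper case (e.g. `SESSION_ID`).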

Sobhan Hota gupul2k