@gupul2k
gupul2k / mf_500_Bag_of_Words.py
Created October 16, 2012 20:43
NLP: Count frequent words in a file
#Author: Sobhan Hota
#Finds the 500 most frequent words in a given file
from string import punctuation
from operator import itemgetter
N = 500
words = {}
words_gen = (word.strip(punctuation).lower() for line in open("C:\Python27\Corpus.txt")
                                             for word in line.split())
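The preview above builds a word-frequency dictionary by hand. A minimal Python 3 sketch of the same idea with `collections.Counter` could look like the following; the function name `top_words` and the whitespace-split tokenization are assumptions, not necessarily what the full gist does.

```python
from collections import Counter
from string import punctuation


def top_words(lines, n=500):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    counts = Counter(
        word.strip(punctuation).lower()  # trim leading/trailing punctuation, fold case
        for line in lines
        for word in line.split()
    )
    counts.pop("", None)  # tokens made only of punctuation strip down to ""
    return counts.most_common(n)


# Usage with a file (the path is a placeholder):
# with open("Corpus.txt", encoding="utf-8") as f:
#     print(top_words(f, 500))
```

Passing the open file object works because iterating a file yields its lines, so the whole corpus never has to be read into memory at once.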
gupul2k / bigrams_vectorgen.py
Last active October 11, 2015 21:38
NLP: Bigram Vector Generation with Python
#Author: Sobhan Hota
#Date: Oct 20 2012
#Generates a feature vector for the bigrams collected in the source file:
#for each bigram, takes its count in the supplied input file (if present)
#and divides it by the input document's length.
import itertools
from collections import Counter
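The header comment describes the vector as "bigram count in the input file divided by document length". A minimal sketch of that computation, assuming the source bigrams are already loaded as a list of word pairs and that "document length" means the number of tokens (both assumptions; `bigrams` and `bigram_vector` are hypothetical names):

```python
from collections import Counter
from itertools import islice


def bigrams(tokens):
    """Yield consecutive token pairs: [a, b, c] -> (a, b), (b, c)."""
    return zip(tokens, islice(tokens, 1, None))


def bigram_vector(source_bigrams, document_tokens):
    """Relative frequency of each source bigram in the document.

    Entry i is count(source_bigrams[i] in document) / len(document_tokens);
    bigrams that never occur in the document get 0.0.
    """
    length = len(document_tokens)
    if length == 0:
        return [0.0] * len(source_bigrams)
    counts = Counter(bigrams(document_tokens))
    return [counts[bg] / length for bg in source_bigrams]
```

Keeping the source bigram list fixed means every input document is mapped onto the same dimensions, so the resulting vectors are directly comparable.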
gupul2k / pos_tagging.py
Created November 2, 2012 13:32
NER and POS Tagging with NLTK and Python
#Script performs POS tagging and NER (Named Entity Recognition) on a supplied text file.
#Date: Nov 2 2012
#Author: Hota Sobhan
import nltk
f = open('C:\Python27\Test_File.txt')
data = f.readlines()
#Parse the text file for NER with POS Tagging
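The usual NLTK pipeline for this is tokenize, then `nltk.pos_tag`, then `nltk.ne_chunk`. The sketch below is one possible Python 3 version, not the gist's own code; `tag_text` and `group_entities` are hypothetical helpers. `tag_text` needs NLTK and its tokenizer, tagger and NE-chunker data (fetched with `nltk.download`), while `group_entities` is plain Python that merges the IOB tags produced by `nltk.chunk.tree2conlltags` into whole entities.

```python
def tag_text(text):
    """POS-tag and NE-chunk raw text with NLTK.

    Returns (word, POS tag, IOB tag) triples; IOB tags look like
    "B-PERSON", "I-PERSON" or "O".
    """
    # Imported here so group_entities() below works even without NLTK installed.
    import nltk
    from nltk.chunk import tree2conlltags

    triples = []
    for sentence in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  # [(word, POS), ...]
        tree = nltk.ne_chunk(tagged)  # named entities become labelled subtrees
        triples.extend(tree2conlltags(tree))
    return triples


def group_entities(triples):
    """Merge IOB-tagged tokens into (entity text, entity label) pairs."""
    entities = []
    words, label = [], None
    for word, _pos, iob in triples:
        starts_new = iob.startswith("B-") or (iob.startswith("I-") and iob[2:] != label)
        if starts_new:
            if words:  # close the entity in progress
                entities.append((" ".join(words), label))
            words, label = [word], iob[2:]
        elif iob.startswith("I-"):  # continuation of the current entity
            words.append(word)
        else:  # "O": token is outside any entity
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities
```

Usage, assuming the NLTK data is installed: `with open("Test_File.txt") as f: print(group_entities(tag_text(f.read())))`.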
gupul2k / Most_freq_500_BoWs.py
Created December 10, 2012 22:59
Find Most Frequent 500 BoWs (Bag of Words)
#!/usr/bin/python
#Script to generate the 500 most frequent BoWs from a corpus (i.e. a lexicon).
#Date: Nov 2 2012
#Author: Hota Sobhan
from string import punctuation
from operator import itemgetter
N = 500  # number of most frequent words to keep
words = {}
gupul2k / vector_generation_BoWs.py
Created December 10, 2012 23:02
Feature Vector Generation for supplied BoWs
#!/usr/bin/python
#Script to generate a feature vector for a supplied BoWs file.
#Date: Nov 2 2012
#Author: Hota Sobhan
from string import punctuation
from operator import itemgetter
words = {}
total_words = 0
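Mapping a document onto a fixed BoW lexicon can be sketched as below. This is a hypothetical Python 3 version: it assumes the lexicon is a list of words (for example one word per line in the BoWs file) and, guided only by the `total_words` variable in the preview, that each count is divided by the document's total word count.

```python
from collections import Counter
from string import punctuation


def bow_vector(lexicon, text):
    """Entry i is the relative frequency of lexicon[i] in text.

    Relative frequency = occurrences of the word / total words in the text
    (the normalization is an assumption about the original script).
    """
    tokens = [w.strip(punctuation).lower() for w in text.split()]
    tokens = [w for w in tokens if w]  # drop punctuation-only tokens
    total_words = len(tokens)
    if total_words == 0:
        return [0.0] * len(lexicon)
    counts = Counter(tokens)
    return [counts[word] / total_words for word in lexicon]


# Usage (file names are placeholders):
# with open("BoWs.txt") as f:
#     lexicon = [line.strip() for line in f if line.strip()]
# with open("document.txt") as f:
#     print(bow_vector(lexicon, f.read()))
```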
gupul2k / gist:4971473
Last active December 13, 2015 20:39
Simple parser that reads multi-class SMO output (e.g. a file of 100k lines) and, for n classes, splits it into nC2 = n!/((n-2)! * 2!) text files, one per pair of classes. Input: SMO output in a text file. Output: the binary-class weights, separated into the corresponding bucket files (the file names identify each bucket).
/* @(#) SeparateSMOBinaryOpsToFiles.java 1.00 2/17/2013
*
* [Copyright Information]
*/
/*
* Revision History:
* Revision  Version  Project  Change    Date  Author    Description
* No.       No.      Code     Req. no.
* 1         1                                 Sobhan H
*/
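The gist itself is Java; for consistency with the other snippets, here is a rough Python sketch of the splitting step. It assumes the output comes from Weka's SMO, which per its documentation solves multi-class problems by pairwise (one-vs-one) classification, and that each binary model begins with a header line of the form `Classifier for classes: A, B`. That header format and the `<classA>_<classB>.txt` file naming are assumptions, not taken from the gist.

```python
import math
import re
from pathlib import Path

# Assumed header that opens each pairwise classifier in the SMO output.
HEADER = re.compile(r"^Classifier for classes:\s*(.+?),\s*(.+?)\s*$")


def expected_file_count(n_classes):
    """One-vs-one training yields C(n, 2) = n! / ((n-2)! * 2!) binary models."""
    return math.comb(n_classes, 2)


def split_smo_output(text):
    """Split SMO output into {(class_a, class_b): section text}."""
    sections = {}
    current_key = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current_key = (match.group(1), match.group(2))
            sections[current_key] = []
        elif current_key is not None:  # lines before the first header are skipped
            sections[current_key].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}


def write_buckets(sections, out_dir):
    """Write each pairwise section to '<class_a>_<class_b>.txt' (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for (a, b), body in sections.items():
        (out / f"{a}_{b}.txt").write_text(body, encoding="utf-8")
```

Streaming line by line keeps memory use modest even for a 100k-line output file; only the current section's lines are accumulated per class pair.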
gupul2k / gist:5072976
Created March 2, 2013 19:51
Simple term-document matrix. Takes 3 arguments: the raw file, the corpus word list, and the FWs list.
/* @(#) TermDocumentMatrix 1.00 2/25/2013
*
* [Copyright Information]
*/
/*
* Revision History:
* Revision  Version  Project  Change    Date  Author    Description
* No.       No.      Code     Req. no.
* 1         1                                 Sobhan Hota
*/
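A term-document matrix has one row per term and one column per document, each cell holding that term's count in that document. A small Python sketch of the idea (the original is Java), assuming each line of the raw file is one document and that words in the FWs list are dropped from the vocabulary; `term_document_matrix` is a hypothetical name.

```python
from collections import Counter


def term_document_matrix(documents, vocabulary, function_words):
    """Return (terms, matrix) with matrix[t][d] = count of terms[t] in documents[d].

    Terms come from `vocabulary` in order, minus anything in `function_words`.
    """
    stop = {w.lower() for w in function_words}
    terms = [w.lower() for w in vocabulary if w.lower() not in stop]
    doc_counts = [Counter(doc.lower().split()) for doc in documents]
    # One row per term, one column per document.
    matrix = [[counts[t] for counts in doc_counts] for t in terms]
    return terms, matrix
```

Transposing `matrix` gives the document-term layout that many classifiers expect as input.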
gupul2k / Cosine_Similarity_compare_sentences
Last active December 20, 2015 09:48
Implements the cosine similarity measure to compare two English sentences (Python, using NLTK and NumPy).
import re
import nltk
from numpy import zeros,dot
from numpy.linalg import norm
# Get stop words
stop_words = [w.strip() for w in open('C:\FWs.txt','r').readlines()]
splitter = re.compile(r"[a-z\-']+", re.I)
stemmer = nltk.PorterStemmer()
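Cosine similarity treats each sentence as a word-count vector and computes cos(a, b) = (a . b) / (|a| * |b|). The gist does this with NumPy's `dot` and `norm` plus Porter stemming; the sketch below is a dependency-free Python 3 version that reuses the gist's token pattern but skips stemming. The helper names are hypothetical.

```python
import math
import re
from collections import Counter

# Same token pattern as the gist: letters, hyphens and apostrophes, case-insensitive.
SPLITTER = re.compile(r"[a-z\-']+", re.I)


def bag_of_words(sentence, stop_words=frozenset()):
    """Lower-cased token counts, skipping stop words (no stemming here)."""
    tokens = (w.lower() for w in SPLITTER.findall(sentence))
    return Counter(w for w in tokens if w not in stop_words)


def cosine_similarity(s1, s2, stop_words=frozenset()):
    """cos(a, b) = (a . b) / (|a| * |b|) over the two bag-of-words vectors."""
    a, b = bag_of_words(s1, stop_words), bag_of_words(s2, stop_words)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())  # only shared words contribute
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)
```

For example, "a b" and "a c" share one of two words each, so the score is 1 / (sqrt(2) * sqrt(2)) = 0.5.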
#!/usr/bin/env python2.7
#Importing the modules
import sys
import urllib2
import re
import json
#Ask for movie title
title = raw_input("Please enter a movie title: ")
gupul2k / exe_ORA_query_via_python
Created September 17, 2014 12:49
Find Oracle table locks - Python script
import cx_Oracle
con = cx_Oracle.connect('scott/tiger@IPADDRESS/SID')
print con.version
cur = con.cursor()
cur.execute('''
    select a.session_id, a.oracle_username, a.os_user_name,
           b.owner "OBJECT OWNER", b.object_name, b.object_type, a.locked_mode
      from (select object_id, SESSION_ID, ORACLE_USERNAME, OS_USER_NAME, LOCKED_MODE
              from v$locked_object) a,
           (select object_id, owner, object_name, object_type
              from dba_objects) b
     where a.object_id = b.object_id''')
res = cur.fetchall()
for r in res:
    print r  # one row per locked object