@nkt1546789
Created May 25, 2015 11:45
Creating a co-occurrence matrix in Python using scipy.sparse.coo_matrix
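import io

from scipy import sparse


def create_cooccurrence_matrix(filename, tokenizer, window_size):
    """Count co-occurrences within +/- window_size tokens, one sentence per line."""
    vocabulary = {}  # token -> integer index
    data, row, col = [], [], []
    with io.open(filename, "r", encoding="utf-8") as f:
        for sentence in f:
            tokens = [token for token in tokenizer(sentence.strip()) if token != u""]
            for pos, token in enumerate(tokens):
                i = vocabulary.setdefault(token, len(vocabulary))
                start = max(0, pos - window_size)
                end = min(len(tokens), pos + window_size + 1)
                for pos2 in range(start, end):
                    if pos2 == pos:
                        continue
                    j = vocabulary.setdefault(tokens[pos2], len(vocabulary))
                    data.append(1.0)
                    row.append(i)
                    col.append(j)
    # Pass the shape explicitly so the matrix is square and covers every token,
    # even ones that never co-occur. Duplicate (i, j) entries are summed when
    # the COO matrix is converted to CSR/CSC or a dense array.
    n = len(vocabulary)
    cooccurrence_matrix = sparse.coo_matrix((data, (row, col)), shape=(n, n))
    return vocabulary, cooccurrence_matrix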
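The function appends a 1.0 for every co-occurrence, so the same (row, col) pair can appear many times in the triplet lists. Below is a minimal sketch of how those duplicates turn into counts and how to read one back through the vocabulary. The three-word vocabulary and the hand-written triplets are made up for illustration; they correspond to the token lists ["the", "cat", "sat"] and ["the", "cat"] with window_size=1.

```python
from scipy import sparse

# Hypothetical toy vocabulary, in the same token -> index form the function returns.
vocabulary = {"the": 0, "cat": 1, "sat": 2}

# Triplets as they would be collected for ["the", "cat", "sat"] and ["the", "cat"]
# with window_size=1: one entry per (token, neighbour) pair, duplicates included.
row = [0, 1, 1, 2, 0, 1]
col = [1, 0, 2, 1, 1, 0]
data = [1.0] * len(row)

n = len(vocabulary)
coo = sparse.coo_matrix((data, (row, col)), shape=(n, n))

# COO keeps duplicate entries as-is; converting to CSR sums them into counts.
csr = coo.tocsr()
counts = csr.toarray()

# Look up how often "the" and "cat" co-occur.
the_cat = csr[vocabulary["the"], vocabulary["cat"]]
print(counts)
print(the_cat)
```

Converting to CSR before indexing matters here because coo_matrix does not support element access such as `matrix[i, j]`.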
@AidanaKaripbayeva
Hi!
Could you please show how you use this function in code?
Thank you!