cigrainger’s gists

cigrainger / gist:11373497

Created April 28, 2014 14:17

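	# Print the first 10 lines of the abstracts file (the apparent intent of the loop).
	# A raw string keeps the backslashes in the Windows path literal.
	with open(r"C:\Users\graingec\spillovers\abstracts\abstracts.txt", 'rb') as f:
	    for i, line in enumerate(f):
	        if i >= 10:
	            break
	        print(line)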

cigrainger / gist:11374165

Created April 28, 2014 14:41

	import re, string, sys
	from nltk.stem.wordnet import WordNetLemmatizer

	lmtzr = WordNetLemmatizer()
	# Matches every character that is not an ASCII letter or a space
	pattern = re.compile(r'[^a-zA-Z ]')

	def clean(x):
	    x = x.replace('<image>','')
	    x = pattern.sub('',x.lower())
	    x = x.replace('\r','')

cigrainger / gist:11385170

Created April 28, 2014 21:54

	Determining the novelty of patents using topic models
	===

	The novelty measure builds on work by Kaplan & Vakili (2013), who use topic models to find 'breakthrough technologies'.

	The novelty measure ($\lambda$) for each patent $p$ in time period $y$ is the sum of the novelty scores ($\gamma$) in that period over every topic $t$ whose proportion in the patent is at or above the cutoff score $c$. It is computed by a simple algorithm:

	1. For each topic-period, count the patents whose topic proportion is at or above the threshold $c$ (where $\beta_{it}$ is the proportion of topic $t$ in the topic distribution of patent $i$, and $N_{y}$ is the number of patents in period $y$): $$\theta_{ty}=\sum^{N_{y}}_{i=1}x_{i} \text{ where } x_{i} = \begin{cases} 1 & \text{if } \beta_{it} \ge c \\ 0 & \text{if } \beta_{it} < c \end{cases}$$

	3. To find the novelty score for each topic-period ($\gamma_{ty}$), find the first period where $\theta_{ty}\ge 1$ ($y_{init}$) and set $\gamma_{ty}$ to 1, then find the period of full diffusion ($y_{\text{max}[\theta_{t}]}$) and set $\gamma_{ty
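The counting in step 1 of the algorithm above can be sketched in Python. This is only an illustration under stated assumptions: the function name `topic_period_counts`, the patent-by-topic matrix `beta`, the per-patent `periods` array and the toy data are all hypothetical and do not come from the original gist.

```python
import numpy as np

def topic_period_counts(beta, periods, c):
    """Count, per topic and period, the patents whose topic proportion is >= c.

    beta    : (n_patents, n_topics) array of topic proportions (hypothetical input)
    periods : (n_patents,) array of period labels, e.g. years (hypothetical input)
    c       : cutoff on the topic proportion
    Returns (sorted unique periods, theta), where theta[t, j] is the count
    for topic t in the j-th period.
    """
    beta = np.asarray(beta, dtype=float)
    periods = np.asarray(periods)
    unique_periods = np.unique(periods)
    over = beta >= c  # indicator x_i for every patent-topic pair
    theta = np.stack(
        [over[periods == y].sum(axis=0) for y in unique_periods], axis=1
    )
    return unique_periods, theta

# Toy data: four patents, two topics, two periods.
beta = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.9, 0.1]])
periods = np.array([2000, 2000, 2001, 2001])
years, theta = topic_period_counts(beta, periods, c=0.5)
```

In this toy example topic 0 clears the cutoff in two patents in 2000 and one in 2001, while topic 1 first clears it in 2001, so `theta` has one row per topic and one column per period.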

cigrainger / gist:812919e2df197e134ee3

Created May 16, 2014 14:08

	import re, string, sys, nltk
	from nltk.corpus import wordnet
	from nltk.stem.wordnet import WordNetLemmatizer

	lmtzr = WordNetLemmatizer()
	pattern=re.compile(r'[^a-zA-Z ]')

	# Map a Penn Treebank POS tag to the matching WordNet POS constant
	def get_wordnet_pos(treebank_tag):
	    if treebank_tag.startswith('J'):
	        return wordnet.ADJ
	    elif treebank_tag.startswith('V'):

cigrainger / gist:c59e73430b192c3a6415

Last active August 29, 2015 14:01

	<body style='text-transform:none;'>
	The Colombian Coffee Company:<br>
	We are a social enterprise ethically committed to supporting coffee-growing communities in Colombia through direct trade and single-origin coffee sales, art, and photography projects.<br>
	<br>
	The Job:<br>
	Upon understanding our ethical initiative you will:<br>
	- Prepare premium coffee drinks<br>
	- Make sure that everything runs smoothly at the bar<br>
	- Provide excellent customer service to all of the lovely festival-goers!
	<br>

cigrainger / gist:8eb4f4cb4fb7288a2ff4

Created May 16, 2014 17:07

	# Imports and housekeeping
	import logging
	logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
	level=logging.INFO)
	from gensim import corpora, models, similarities
	import numpy as np
	import matplotlib.pyplot as plt

	# Define KL functions
	def kl(p,q):
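The body of `kl` is cut off in this preview. As a rough sketch of what a discrete Kullback-Leibler divergence usually looks like, assuming `p` and `q` are non-negative vectors of equal length, one could write the following. The names `kl_divergence` and `symmetric_kl` are illustrative and are not taken from the gist.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) = sum_i p_i * log(p_i / q_i).

    Both inputs are normalised to sum to 1 first. Terms with p_i == 0
    contribute nothing; if q_i == 0 where p_i > 0 the result is infinite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def symmetric_kl(p, q):
    """Symmetrised divergence: D(p || q) + D(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)
```

The symmetrised form is included because KL divergence on its own depends on argument order; whether the original gist uses it is not visible here.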

cigrainger / gist:639210ddae262bc6a402

Created May 16, 2014 17:49

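	import os
	from subprocess import Popen

	# Set the Pyro serializer variables for the child processes; a separate
	# `export` call would not persist, and call() does not take a list of commands.
	env = dict(os.environ, PYRO_SERIALIZERS_ACCEPTED='pickle', PYRO_SERIALIZER='pickle')
	# Start the Pyro4 name server, an LDA worker and the LDA dispatcher in the background
	for args in (['Pyro4.naming', '-n', '0.0.0.0'], ['gensim.models.lda_worker'],
	             ['gensim.models.lda_dispatcher']):
	    Popen(['python', '-m'] + args, env=env)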

cigrainger / gist:dac4cb8ff3821951f5e6

Last active August 29, 2015 14:01

	# Imports and housekeeping
	import logging
	logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
	level=logging.INFO)
	from gensim import corpora, models, similarities
	import numpy as np
	import matplotlib.pyplot as plt
	from subprocess import call

	# Initialise distributed workers

cigrainger / gist:7b03745e72241c69033a

Created May 23, 2014 15:05

	import re, string, sys, nltk
	from nltk.corpus import wordnet
	from nltk.stem.wordnet import WordNetLemmatizer

	lmtzr = WordNetLemmatizer()
	pattern=re.compile(r'[^a-zA-Z ]')

	# Map a Penn Treebank POS tag to the matching WordNet POS constant
	def get_wordnet_pos(treebank_tag):
	    if treebank_tag.startswith('J'):
	        return wordnet.ADJ
	    elif treebank_tag.startswith('V'):

cigrainger / gist:5b05afedea540bcec8e9

Created May 27, 2014 12:14

	kl = []
	# Candidate topic counts; start at 10 because num_topics must be positive
	num = range(10,25000,10)
	for i in num:
	    lda = models.ldamodel.LdaModel(corpus=my_corpus,
	        id2word=dictionary,num_topics=i,distributed=True)
	    #Topic-word matrix
	    m1 = lda.expElogbeta
	    # Singular values of the topic-word matrix
	    U,s,V = np.linalg.svd(m1)
	    cm1 = s
	    #Document-topic matrix

Christopher Grainger cigrainger