patrickdrouin’s gists

patrickdrouin / crawl_news.py

Created November 14, 2022 17:08

Code for crawling news websites

	import newspaper
	from newspaper import Config
	from newspaper import Article

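	# Identify requests as desktop Firefox rather than as a generic Python client.
	USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

	# Shared newspaper3k download settings.
	config = Config()
	config.browser_user_agent = USER_AGENT
	config.request_timeout = 10  # seconds before a download attempt gives up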
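
In newspaper3k, a `Config` like this is passed to each download, for example `Article(url, config=config)`. The two settings are ordinary HTTP options. The standard-library sketch below is our own illustration, not part of the gist: it builds a request carrying the same Firefox User-Agent header and applies the same 10-second timeout. The helper names `build_request` and `fetch_html` are hypothetical.

```python
import urllib.request

# Same browser string as in the gist.
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) '
              'Gecko/20100101 Firefox/78.0')
REQUEST_TIMEOUT = 10  # seconds, mirroring config.request_timeout


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: a GET request that identifies itself as desktop Firefox."""
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


def fetch_html(url: str, timeout: float = REQUEST_TIMEOUT) -> str:
    """Hypothetical helper: download a page, giving up after `timeout` seconds."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')
```

With newspaper3k itself, the equivalent steps are `article = Article(url, config=config)`, then `article.download()` and `article.parse()`. After that, the extracted text is available in `article.text`.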

patrickdrouin / gensim_word2vec_procrustes_align.py

Created March 19, 2019 20:41 — forked from tangert/gensim_word2vec_procrustes_align.py

Code for aligning two or more word2vec models using Procrustes matrix alignment. Code originally ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
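
Aligning an embedding matrix $A$ onto a base matrix $B$ is an instance of the orthogonal Procrustes problem. Both matrices must be restricted to the shared vocabulary, with their rows in the same word order. The problem has a closed-form solution via a singular value decomposition. The symbols below are ours, not the gist's:

```latex
R = \arg\min_{Q^\top Q = I} \lVert A Q - B \rVert_F,
\qquad A^\top B = U \Sigma V^\top \;\Longrightarrow\; R = U V^\top
```

Because $R$ is orthogonal, the aligned matrix $AR$ keeps every cosine similarity that held within $A$.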

	def align_gensim_models(models, words=None):
	    """
	    Returns the aligned/intersected models from a list of gensim word2vec models.
	    Generalized from original two-way intersection as seen above.

	    Also updated to work with the most recent version of gensim.
	    Requires reduce from functools.

	    Before passing the models in for alignment, make sure you run 'model.init_sims()' on each one.
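
Here is a NumPy-only sketch of the same pipeline. It assumes each model has been exported to a plain `{word: vector}` dict. That format is hypothetical: the gist itself works on gensim model objects. The sketch intersects the vocabularies and unit-normalizes the rows, which is the normalization `init_sims()` prepares. It then rotates every model onto the first with the orthogonal Procrustes solution. All function names are ours.

```python
import numpy as np


def intersect_vocab(embeddings):
    """Words present in every {word: vector} dict, sorted so all matrices share one row order."""
    common = set(embeddings[0])
    for emb in embeddings[1:]:
        common &= set(emb)
    return sorted(common)


def normalize_rows(mat):
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.where(norms == 0, 1.0, norms)


def procrustes_align(base, other):
    """Rotate `other` onto `base` using the orthogonal R that minimises ||other @ R - base||_F.

    Both arguments are (n_words, dim) arrays whose rows refer to the same words.
    """
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)


def align_embeddings(embeddings):
    """Align a list of {word: vector} dicts to the first one; returns (words, matrices)."""
    words = intersect_vocab(embeddings)
    mats = [normalize_rows(np.array([emb[w] for w in words], dtype=float))
            for emb in embeddings]
    base = mats[0]
    return words, [base] + [procrustes_align(base, m) for m in mats[1:]]
```

Every model is aligned to the first one rather than chained pairwise. As a result, all matrices end up in a single coordinate system, and the errors of successive rotations do not accumulate.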

patrickdrouin / gensim_word2vec_make_semantic_network.py

Created March 19, 2019 20:40 — forked from quadrismegistus/gensim_word2vec_make_semantic_network.py

Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations) between a set of words in a gensim word2vec model.

	"""
	Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations)
	between a set of words in a gensim word2vec model.

	To use:
	Set the filenames for the word2vec model.
	Set `my_words` to be a list of your own choosing.
	Set `num_top_dists` to be a number or a factor of the length of `my_words`.
	Choose between the two methods below to produce distances, and comment out the other one.
	"""

patrickdrouin / gensim_word2vec_measure_semantic_shift_by_local_neighborhood.py

Created March 19, 2019 20:40 — forked from quadrismegistus/gensim_word2vec_measure_semantic_shift_by_local_neighborhood.py

This function measures the amount of semantic shift of a given word between two gensim word2vec models. It is a basic implementation of William Hamilton (@williamleif) et al.'s measure of semantic change proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821), which they call the "local neighborhood measure."
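
In the paper's formulation, $N_k(w^{(t)})$ denotes the $k$ nearest neighbors of word $w$ in the period-$t$ model. A second-order vector of similarities to the union of both neighborhoods is built for each period, with $s^{(t+1)}$ defined analogously using the period-$t+1$ vectors. The two second-order vectors are then compared:

```latex
\begin{aligned}
s^{(t)}(j) &= \operatorname{cos\text{-}sim}\bigl(w^{(t)}, w_j^{(t)}\bigr)
  \quad \text{for all } w_j \in N_k\bigl(w^{(t)}\bigr) \cup N_k\bigl(w^{(t+1)}\bigr), \\
d^{L}\bigl(w^{(t)}, w^{(t+1)}\bigr) &= \operatorname{cos\text{-}dist}\bigl(s^{(t)}, s^{(t+1)}\bigr)
\end{aligned}
```

Each entry of $s^{(t)}$ compares two vectors from the same model, so this measure does not require the models to be Procrustes-aligned first.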

	def measure_semantic_shift_by_neighborhood(model1, model2, word, k=25, verbose=False):
	    """
	    Basic implementation of William Hamilton (@williamleif) et al.'s measure of semantic change
	    proposed in their paper "Cultural Shift or Linguistic Drift?" (https://arxiv.org/abs/1606.02821),
	    which they call the "local neighborhood measure." They find this measure better suited to understanding
	    the semantic change of nouns owing to "cultural shift," or changes in meaning "local" to that word,
	    rather than global changes in language use ("linguistic drift"), which are better suited to a
	    Procrustes-alignment method (also described in the same paper).

	    Arguments are:
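
Below is a NumPy-only sketch of the local neighborhood measure. It is not the gist's own body, which is not shown here. It assumes both models have been exported to `{word: vector}` dicts over a shared vocabulary, a hypothetical format. The gist takes gensim models as `model1` and `model2` instead. The default `k=25` mirrors the signature above, and the function names are ours.

```python
import numpy as np


def _unit(v):
    """Scale a vector (or each row of a matrix) to unit L2 length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def nearest_neighbors(emb, word, k):
    """The k words most cosine-similar to `word` in a {word: vector} dict, excluding `word`."""
    others = [w for w in emb if w != word]
    sims = _unit(np.array([emb[w] for w in others], dtype=float)) @ _unit(np.asarray(emb[word], dtype=float))
    return [others[i] for i in np.argsort(-sims)[:k]]


def second_order_vector(emb, word, neighbors):
    """Cosine similarity between `word` and each neighbor, all taken from the same model."""
    mat = _unit(np.array([emb[n] for n in neighbors], dtype=float))
    return mat @ _unit(np.asarray(emb[word], dtype=float))


def local_neighborhood_shift(emb1, emb2, word, k=25):
    """Cosine distance between the second-order vectors of `word` in two models.

    Assumes every neighbor found in one model also exists in the other
    (for example, both dicts were restricted to a shared vocabulary beforehand).
    """
    neighbors = sorted(set(nearest_neighbors(emb1, word, k))
                       | set(nearest_neighbors(emb2, word, k)))
    s1 = second_order_vector(emb1, word, neighbors)
    s2 = second_order_vector(emb2, word, neighbors)
    return 1.0 - float(_unit(s1) @ _unit(s2))
```

Applying the same rotation to every vector in one model leaves this score unchanged, because the score only depends on within-model cosine similarities. A score near 0 means the word kept its neighborhood; larger values mean its neighbors changed.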