@patrickdrouin
patrickdrouin / crawl_news.py
Created November 14, 2022 17:08
Code for scraping news sites
import newspaper
from newspaper import Config
from newspaper import Article
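# Browser-like user agent string sent with every request
USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
# Shared newspaper configuration, reused for every download
config = Config()
config.browser_user_agent = USER_AGENT
config.request_timeout = 10  # seconds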
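The preview stops after the configuration, so the following is only a minimal sketch of how such a `config` object might be passed to `Article`. The helper names `fetch_article` and `article_to_record`, the example URL, and the stand-in article object are all hypothetical and are not taken from the gist.

```python
from types import SimpleNamespace


def fetch_article(url, config=None):
    """Download and parse one article (needs network access and newspaper)."""
    # Imported lazily so the rest of this sketch runs without newspaper installed.
    from newspaper import Article

    article = Article(url, config=config) if config is not None else Article(url)
    article.download()  # fetch the HTML using the configured user agent and timeout
    article.parse()     # extract title, body text, and other fields
    return article


def article_to_record(article):
    """Keep only the fields of interest from a parsed article."""
    return {"url": article.url, "title": article.title, "text": article.text}


# Stand-in object (hypothetical) so the helper can be tried offline.
fake_article = SimpleNamespace(
    url="https://example.com/story",
    title="Example headline",
    text="Example body text.",
)
record = article_to_record(fake_article)
print(record["title"])
```

With network access, `article_to_record(fetch_article(some_url, config))` would follow the same path for a real page.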
patrickdrouin / gensim_word2vec_procrustes_align.py
Created March 19, 2019 20:41 — forked from tangert/gensim_word2vec_procrustes_align.py
Code for aligning two or more word2vec models using Procrustes matrix alignment. Code originally ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
def align_gensim_models(models, words=None):
    """
    Returns the aligned/intersected models from a list of gensim word2vec models.
    Generalized from the original two-way intersection as seen above.
    Also updated to work with the most recent version of gensim.
    Requires reduce from functools.
    In order to run this, make sure you run 'model.init_sims()' for each model before you input them for alignment.
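The preview is cut off before the function body, so the following is not the gist's code. It is a minimal NumPy sketch of the two ideas the docstring names: intersecting several vocabularies with `reduce`, and an orthogonal Procrustes rotation between two embedding matrices whose rows correspond to the same words. The function names `common_vocab` and `procrustes_rotation` and the toy data are assumptions made for illustration.

```python
from functools import reduce

import numpy as np


def common_vocab(vocabs):
    """Intersect any number of vocabularies; returned sorted for a stable order."""
    return sorted(reduce(lambda a, b: a & b, (set(v) for v in vocabs)))


def procrustes_rotation(base, other):
    """Return the orthogonal matrix R minimizing ||other @ R - base||_F.

    Both matrices must list the same words in the same row order.
    """
    u, _, vt = np.linalg.svd(other.T @ base)
    return u @ vt


# Toy check: rotate a random matrix, then recover the rotation.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
other = base @ q.T                             # same points in a rotated space
rotation = procrustes_rotation(base, other)
aligned = other @ rotation                     # mapped back onto the base space

shared = common_vocab([["cat", "dog", "fish"], ["dog", "cat", "bird"], ["cat", "dog"]])
print(shared, np.allclose(aligned, base))
```

In a real setting the rows would be the unit-normalized vectors of the shared vocabulary from each model, which is presumably why the docstring asks for `model.init_sims()` first.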
patrickdrouin / gensim_word2vec_make_semantic_network.py
Created March 19, 2019 20:40 — forked from quadrismegistus/gensim_word2vec_make_semantic_network.py
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations) between a set of words in a gensim word2vec model.
"""
Code to make a network out of the shortest N cosine-distances (or, equivalently, the strongest N associations)
between a set of words in a gensim word2vec model.
To use:
Set the filenames for the word2vec model.
Set `my_words` to be a list of your own choosing.
Set `num_top_dists` to be a number or a factor of the length of `my_words`.
Choose between the two methods below to produce distances, and comment-out the other one.
"""
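The body of the script is not shown in the preview. As a rough illustration of the idea in the docstring, the sketch below keeps the `num_top_dists` shortest pairwise cosine distances among `my_words` as network edges. It uses plain Python lists in place of a gensim model, and the `toy_vectors` dictionary and the helper names `cosine_distance` and `top_edges` are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def top_edges(vectors, my_words, num_top_dists):
    """Return the num_top_dists closest word pairs as (word_a, word_b, distance)."""
    pairs = [
        (cosine_distance(vectors[a], vectors[b]), a, b)
        for a, b in combinations(my_words, 2)
    ]
    pairs.sort()  # shortest distance (strongest association) first
    return [(a, b, d) for d, a, b in pairs[:num_top_dists]]


# Toy vectors standing in for word2vec embeddings (made up for illustration).
toy_vectors = {
    "king": [1.0, 0.9, 0.1],
    "queen": [0.9, 1.0, 0.1],
    "apple": [0.1, 0.0, 1.0],
    "pear": [0.0, 0.1, 1.0],
}
my_words = ["king", "queen", "apple", "pear"]
edges = top_edges(toy_vectors, my_words, num_top_dists=2)
for word_a, word_b, dist in edges:
    print(word_a, word_b, round(dist, 4))
```

The resulting edge list could then be loaded into a graph library of your choice, with the distance (or one minus it) as the edge weight.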