Thomas Buhrmann buhrmann

@graphext
Madrid, Spain
http://es.linkedin.com/in/thomasbuhrmann
https://orcid.org/0000-0002-8617-9608
@tom_gxt

buhrmann / embed_texts.hs

Last active April 28, 2023 09:19

Basic dimensionally-reduced text embedding in Graphext

	fetch_openreview(ds.conference) -> (ds)

	embed_text_with_model(ds.abstract, {
		"collection": "SBERT",
		"name": "paraphrase-multilingual-MiniLM-L12-v2"
	}) -> (ds.embedding)

	embed_dataset(ds[["embedding"]], {"n_components": 10}) -> (ds.embedding_10d)
	cluster_embeddings(ds.embedding_10d) -> (ds.cluster)
	layout_dataset(ds[["embedding"]]) -> (ds.x, ds.y)

buhrmann / venn.hs

Created May 3, 2023 11:34

Venn diagram layout

	# Assuming a (long) dataset of keywords with associated author information,
	# with at least one row per (keyword x author), this creates one cluster of
	# keywords for each combination of authors. E.g. if there are 3 different
	# authors (A, B, C), there will be up to 7 clusters: A, B, C, A ∧ B, A ∧ C,
	# B ∧ C, A ∧ B ∧ C. These are essentially all the subgroups in a Venn diagram
	# of A, B and C.
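
The grouping described above can be illustrated outside Graphext with a small Python sketch. This is not the recipe's own implementation: the function name `venn_clusters`, the `(keyword, author)` tuple layout and the sample rows are all hypothetical, chosen only to show how each keyword ends up in the cluster named after the exact set of authors who used it.

```python
from collections import defaultdict


def venn_clusters(rows):
    """Group keywords by the exact set of authors that used them.

    `rows` is a long-format iterable of (keyword, author) pairs
    (a hypothetical layout, assumed for this sketch).
    """
    # Step 1: collect the set of authors for every keyword.
    authors_by_keyword = defaultdict(set)
    for keyword, author in rows:
        authors_by_keyword[keyword].add(author)

    # Step 2: label each keyword with its author combination, e.g. "A ∧ B".
    clusters = defaultdict(list)
    for keyword, authors in authors_by_keyword.items():
        label = " ∧ ".join(sorted(authors))
        clusters[label].append(keyword)

    return {label: sorted(keywords) for label, keywords in clusters.items()}


# Made-up sample data: one row per (keyword x author).
rows = [
    ("graphs", "A"), ("graphs", "B"),
    ("nlp", "A"),
    ("clustering", "A"), ("clustering", "B"), ("clustering", "C"),
    ("layout", "C"),
    ("nlp", "A"),  # duplicate rows do not change the result
]
clusters = venn_clusters(rows)
```

Only author combinations that actually occur in the data produce a cluster, which is why the number of clusters is an upper bound rather than a fixed count.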

	# If the dataset is already aggregated (one row per keyword, with all authors
	# who used that keyword in a corresponding list, i.e. a multivalued category),
	# this step isn't necessary.