@buhrmann
buhrmann / embed_texts.hs
Last active April 28, 2023 09:19
Basic dimensionally-reduced text embedding in Graphext
fetch_openreview(ds.conference) -> (ds)
embed_text_with_model(ds.abstract, {
  "collection": "SBERT",
  "name": "paraphrase-multilingual-MiniLM-L12-v2"
}) -> (ds.embedding)
embed_dataset(ds[["embedding"]], {"n_components": 10}) -> (ds.embedding_10d)
cluster_embeddings(ds.embedding_10d) -> (ds.cluster)
layout_dataset(ds[["embedding"]]) -> (ds.x, ds.y)
@buhrmann
buhrmann / venn.hs
Created May 3, 2023 11:34
Venn diagram layout
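The grouping this gist describes can be illustrated outside Graphext with a minimal Python sketch. It is not the recipe itself: the function name `venn_clusters` and the sample rows are hypothetical, and it assumes the long format described in the comments that follow, i.e. one `(keyword, author)` pair per row.

```python
from collections import defaultdict


def venn_clusters(rows):
    """Group keywords by the exact set of authors who used them.

    rows: iterable of (keyword, author) pairs, one per keyword x author.
    Returns a dict mapping a frozenset of authors to a sorted keyword list.
    """
    # Step 1: aggregate the long format to one author set per keyword.
    authors_by_keyword = defaultdict(set)
    for keyword, author in rows:
        authors_by_keyword[keyword].add(author)

    # Step 2: keywords sharing the same author set form one Venn region.
    clusters = defaultdict(list)
    for keyword, authors in authors_by_keyword.items():
        clusters[frozenset(authors)].append(keyword)

    return {combo: sorted(keywords) for combo, keywords in clusters.items()}


# Hypothetical sample data: keywords used by authors A, B and C.
rows = [
    ("nlp", "A"), ("nlp", "B"),
    ("vision", "A"),
    ("graphs", "A"), ("graphs", "B"), ("graphs", "C"),
    ("rl", "C"),
]
clusters = venn_clusters(rows)
```

Using a `frozenset` as the cluster key makes the combination order-independent, so a keyword seen as (A, B) and another seen as (B, A) end up in the same region. If the data is already aggregated to one author list per keyword, only the second step is needed.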
# Assuming a (long) dataset of keywords with associated author information,
# with at least one row per (keyword x author), this creates one cluster of
# keywords for each combination of authors. E.g. if there are 3 different
# authors (A, B, C), there will be up to 7 clusters: A, B, C, A ∧ B, A ∧ C,
# B ∧ C, A ∧ B ∧ C. These are essentially all the subgroups in a Venn diagram
# of A, B and C.
# If the dataset is already aggregated (one row per keyword, with all authors
# who used the keyword in a corresponding list, i.e. a multivalued category),
# this step isn't necessary.