@buhrmann
Last active April 28, 2023 09:19
Basic dimensionally-reduced text embedding in Graphext
fetch_openreview(ds.conference) => (ds)
embed_text_with_model(ds.abstract, {
  "collection": "SBERT",
  "name": "paraphrase-multilingual-MiniLM-L12-v2"
}) => (ds.embedding)
embed_dataset(ds[["embedding"]], {"n_components": 10}) -> (ds.embedding_10d)
cluster_embeddings(ds.embedding_10d) -> (ds.cluster)
layout_dataset(ds[["embedding"]]) -> (ds.x, ds.y)
make_constant(ds.abstract, {"value": "en", "out_type": "category"}) => (ds.lang)
extract_ngrams(ds.abstract, ds.lang) => (ds.ngrams)
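
The last two steps of the recipe tag every abstract as English and then extract n-grams from it. The gist does not show what extract_ngrams does internally, so the following is only a rough Python sketch of the general idea, not Graphext's implementation. The helper name simple_ngrams, the regex tokenizer, the n_max parameter, and the sample abstracts are all assumptions made for illustration. A real step would likely use the language column for tokenization and stop-word handling, which this sketch ignores.

```python
import re
from collections import Counter


def simple_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max found in `text`.

    Hypothetical helper: lowercases the text and splits it into
    alphanumeric tokens, with no language-specific processing.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    ngrams = []
    for n in range(1, n_max + 1):
        # Slide a window of n tokens over the token list.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams


# Made-up sample abstracts standing in for ds.abstract.
abstracts = [
    "Graph neural networks for text embedding",
    "Text embedding with sentence transformers",
]

# Count how often each n-gram occurs across all abstracts.
counts = Counter(g for a in abstracts for g in simple_ngrams(a))
```

Counting n-grams across rows, as in `counts`, is one simple way to find terms that several documents share, for example when describing what the documents in a cluster have in common.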