- act2vec, trace2vec, log2vec, model2vec https://link.springer.com/chapter/10.1007/978-3-319-98648-7_18
- apk2vec https://arxiv.org/abs/1809.05693
- app2vec http://paul.rutgers.edu/~qma/research/ma_app2vec.pdf
- ast2vec https://arxiv.org/abs/2103.11614
- attribute2vec https://arxiv.org/abs/2004.01375
- author2vec http://dl.acm.org/citation.cfm?id=2889382
- baller2vec https://arxiv.org/abs/2102.03291
- bb2vec https://arxiv.org/abs/1809.09621
Here are the areas I've been researching, some things I've read, and some open-source packages...
Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model
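The simplest vector space model is a bag-of-words document-term matrix. A minimal sketch, assuming scikit-learn is available (the toy documents are made up):

```python
# Bag-of-words vector space model: each document becomes a vector of
# term counts, one column per vocabulary word.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse document-term count matrix

print(vectorizer.get_feature_names_out())  # one column per vocabulary term
print(X.toarray())  # each row is one document's term-count vector
```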
These pipelines often apply transforms such as TF-IDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
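A sketch of the same idea with TF-IDF weighting, again via scikit-learn; the max_df threshold below is illustrative, not a recommendation:

```python
# TF-IDF weighting: max_df drops terms appearing in more than 90% of
# documents ("the" below); the complementary min_df knob drops terms
# that are too rare to be informative.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and dogs are pets",
]
vectorizer = TfidfVectorizer(max_df=0.9)
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # "the" has been filtered out
print(X.toarray().round(2))  # rows are TF-IDF-weighted document vectors
```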
Collocation detection finds two or more words that occur together more often than chance would predict (e.g. "wishy-washy" in English). I use it to group words into n-gram tokens, because many NLP techniques treat each word as independent of all the others in a document, ignoring order: http://matpalm.com/blog/2011/10/22/collocations_1/
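As a rough illustration with gensim's Phrases (my choice of tool here, not the linked post's), pairs that co-occur unusually often get merged into single underscore-joined tokens; the corpus and thresholds are invented:

```python
# Collocation detection: learn which word pairs co-occur more often
# than their individual frequencies would suggest, then rewrite them
# as single n-gram tokens.
from gensim.models.phrases import Phrases

sentences = [
    "new york is a big city".split(),
    "i visited new york last year".split(),
    "new york has many museums".split(),
    "the city is busy".split(),
]
bigrams = Phrases(sentences, min_count=2, threshold=1.0)

# "new york" co-occurs often enough to score above the threshold,
# so it becomes one token; other pairs are left alone.
print(bigrams["i love new york".split()])
# expected: ['i', 'love', 'new_york']
```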