@bmschmidt
Created April 14, 2016 21:21

An idea that I proved unable to express in the number of characters on Twitter:

Train two word2vec models on the same corpus with 100 dimensions apiece; one with window size 5, and one with window size 15 (say).

Now you have two 100-dimensional vector spaces with the same words in each.

That's the same as a single 200-dimensional vector space: you just append each word's two vectors to each other.

That vector space has all the information from each of the original models in it: you can use simple linear algebra to flatten it back out along either of the original 100-dimensional subspaces.
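The append-and-flatten step is elementary; a numpy sketch, using random arrays as hypothetical stand-ins for the two trained embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "mat"]
# Hypothetical stand-ins for the two trained 100-dimensional embeddings.
v_narrow = {w: rng.standard_normal(100) for w in vocab}
v_wide = {w: rng.standard_normal(100) for w in vocab}

# Append each word's pair of vectors: one 200-dimensional space.
v_joint = {w: np.concatenate([v_narrow[w], v_wide[w]]) for w in vocab}

# Flattening back out along either original subspace is just a
# projection, i.e. slicing off the relevant 100 coordinates.
assert np.allclose(v_joint["cat"][:100], v_narrow["cat"])
assert np.allclose(v_joint["cat"][100:], v_wide["cat"])
```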

But now you also have the ability to define lines between the two spaces. Some words will be close in the smaller window ("syntactically", roughly) and some close in the larger one ("semantically"). The size of that difference--which should be extractable, somehow--might make it possible to estimate what the relative distances or positions of vectors in a third window size would be; even a window size of 25 (larger than anything we modeled) or 2 (smaller than anything we modeled).
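One crude way to extract that difference: compare the cosine similarity of a word pair in each subspace and look at the gap. A sketch, again with random stand-ins for the trained vectors:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
# Hypothetical stand-ins for one word pair's vectors in each model.
cat_narrow, dog_narrow = rng.standard_normal((2, 100))
cat_wide, dog_wide = rng.standard_normal((2, 100))

# Pairs close in the window-5 space but not the window-15 space (or
# vice versa) get a large gap; that gap is the quantity to extract.
gap = cosine(cat_narrow, dog_narrow) - cosine(cat_wide, dog_wide)
print(round(gap, 3))
```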

Maybe it could even be trained in such a way that this estimation had a linear regularity to it: so that by starting from a larger dimensionality, you could "zoom" window sizes within the same model by projecting into smaller spaces.
