Question: can parallel pre- and postprocessing speed up Gensim Doc2Vec?
- Spark: 349s
- Vanilla: 373s
(only one run of each, so not a very scientific comparison)
Run on a single machine with 16 GB RAM and an Intel i7-8550U CPU @ 1.80 GHz.
This gist contains the first three lines of the input file, which for this example has 200k lines and is 457 MB in size.
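
Since each variant was timed only once, one way to make the comparison more robust would be to repeat each pipeline several times and report the mean and spread. The sketch below is not part of the original benchmark: `time_runs`, `summarize`, and `relative_speedup` are hypothetical helper names, and the callable passed to `time_runs` is assumed to wrap one full Spark or vanilla run.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the durations."""
    mean = statistics.mean(durations)
    # The sample standard deviation needs at least two measurements.
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev


def relative_speedup(baseline_s, candidate_s):
    """Fraction of the baseline time saved by the candidate."""
    return (baseline_s - candidate_s) / baseline_s


# Applied to the single-run figures above (vanilla 373 s, Spark 349 s).
print(f"{relative_speedup(373, 349):.1%}")  # prints 6.4%
```

With only one measurement per variant, a difference of this size could plausibly be run-to-run noise, which is why the spread from `summarize` would matter more than the single ratio.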