@piskvorky
Created January 18, 2017 16:24
(st)[kofola3@kofola3:~/workspace/scaletext] (scaletext2)$ python -m scaletext.scripts.load_tab_separated_data --es-index wiki1k ./enwiki-1k-articles.txt
2017-01-19 01:20:41,713 : MainProcess : INFO : running /Volumes/work/workspace/scaletext/scaletext/scripts/load_tab_separated_data.py --es-index wiki1k ./enwiki-1k-articles.txt
2017-01-19 01:20:43,165 : MainProcess : INFO : 100 documents loaded; last: Art
2017-01-19 01:20:44,371 : MainProcess : INFO : 200 documents loaded; last: Albert Camus
2017-01-19 01:20:45,525 : MainProcess : INFO : 300 documents loaded; last: Atomic
2017-01-19 01:20:46,575 : MainProcess : INFO : 400 documents loaded; last: Dasyproctidae
2017-01-19 01:20:47,443 : MainProcess : INFO : 500 documents loaded; last: Afonso de Albuquerque
2017-01-19 01:20:48,176 : MainProcess : INFO : 600 documents loaded; last: Anacharsis
2017-01-19 01:20:49,008 : MainProcess : INFO : 700 documents loaded; last: Annealing
2017-01-19 01:20:50,112 : MainProcess : INFO : 800 documents loaded; last: Abijah
2017-01-19 01:20:51,214 : MainProcess : INFO : 900 documents loaded; last: Apple II
2017-01-19 01:20:52,175 : MainProcess : INFO : 1000 documents loaded; last: Abae
2017-01-19 01:20:52,178 : MainProcess : INFO : reindexing
2017-01-19 01:20:52,697 : MainProcess : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2017-01-19 01:20:55,049 : MainProcess : INFO : adding document #10000 to Dictionary(216021 unique tokens: [u'Vall,', u'biennials', u'considered.', u'Rietveld.', u'considered,']...)
2017-01-19 01:20:56,718 : MainProcess : INFO : built Dictionary(320071 unique tokens: [u'Vall,', u'biennials', u"''Romam'',", u'considered.', u"Al-Arabi''"]...) from 17421 documents (total 3357280 corpus positions)
2017-01-19 01:20:57,318 : MainProcess : INFO : discarding 308490 tokens: [(u'(AMPAS)', 2), (u'Direction', 8), (u'to', 5343), (u'nominees.', 3), (u'85th', 1), (u'decorator(s).', 1), (u"'''Best", 1), (u"category's", 1), (u'of', 7135), (u"Director's", 2)]...
2017-01-19 01:20:57,318 : MainProcess : INFO : keeping 11581 tokens which were in no less than 20 and no more than 5226 (=30.0%) documents
2017-01-19 01:20:57,442 : MainProcess : INFO : resulting dictionary: Dictionary(11581 unique tokens: [u'writings', u'yellow', u'four', u'prefix', u'Andreas']...)
2017-01-19 01:20:57,859 : MainProcess : INFO : using serial LSI version on this node
2017-01-19 01:20:57,859 : MainProcess : INFO : updating model with new documents
2017-01-19 01:21:02,301 : MainProcess : INFO : preparing a new chunk of documents
2017-01-19 01:21:02,583 : MainProcess : INFO : using 100 extra samples and 2 power iterations
2017-01-19 01:21:02,586 : MainProcess : INFO : 1st phase: constructing (11581, 200) action matrix
2017-01-19 01:21:02,938 : MainProcess : INFO : orthonormalizing (11581, 200) action matrix
2017-01-19 01:21:04,379 : MainProcess : INFO : 2nd phase: running dense svd on (200, 17421) matrix
2017-01-19 01:21:04,929 : MainProcess : INFO : computing the final decomposition
2017-01-19 01:21:04,929 : MainProcess : INFO : keeping 100 factors (discarding 10.624% of energy spectrum)
2017-01-19 01:21:04,956 : MainProcess : INFO : processed documents up to #17421
2017-01-19 01:21:04,962 : MainProcess : INFO : topic #0(31.688): -1.000*"Introduction" + -0.003*"*" + -0.000*"ISBN" + -0.000*"''The" + -0.000*"by" + -0.000*"–" + -0.000*"at" + -0.000*"on" + -0.000*"**" + -0.000*"The"
2017-01-19 01:21:04,962 : MainProcess : INFO : topic #1(28.476): -0.998*"References" + -0.063*"*" + -0.009*"External" + -0.009*"links" + -0.003*"–" + -0.002*"notes" + -0.002*"List" + -0.002*"reading" + -0.002*"sources" + -0.002*"ISBN"
2017-01-19 01:21:04,964 : MainProcess : INFO : topic #2(27.415): -0.644*"External" + -0.628*"links" + -0.430*"*" + 0.039*"References" + -0.021*"–" + -0.015*"List" + -0.013*"See" + -0.013*"The" + -0.013*"ISBN" + -0.013*"by"
2017-01-19 01:21:04,965 : MainProcess : INFO : topic #3(26.828): -0.885*"*" + 0.314*"External" + 0.304*"links" + 0.052*"References" + -0.045*"–" + -0.041*"See" + -0.032*"List" + -0.029*"The" + -0.029*"by" + -0.029*"ISBN"
2017-01-19 01:21:04,966 : MainProcess : INFO : topic #4(25.454): 0.888*"See" + 0.455*"also" + -0.055*"*" + 0.009*"is" + 0.008*"was" + 0.007*"as" + 0.006*"that" + 0.006*"his" + 0.005*"The" + 0.005*"with"
(st)[kofola3@kofola3:~/workspace/scaletext] (scaletext2)$
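After loading, the script reindexes the collection. It builds a token dictionary (320071 unique tokens over 17421 documents) and prunes it to 11581 tokens. These messages appear to come from gensim's `Dictionary` and `Dictionary.filter_extremes`. The sketch below is a minimal pure-Python illustration of that document-frequency rule, not gensim's implementation. It makes three assumptions: tokenization is plain whitespace splitting (dictionary tokens such as `considered.` and `Rietveld.` suggest punctuation was not stripped), `no_below=20` and `no_above=0.3` are read off the "keeping 11581 tokens" line, and the helper names `doc_frequencies` and `filter_by_doc_freq` are hypothetical.

```python
from collections import Counter


def doc_frequencies(docs):
    """Count, for each token, how many documents contain it at least once.

    Also returns the total token count ("corpus positions" in the log).
    Assumption: plain whitespace tokenization, no lowercasing.
    """
    dfs = Counter()
    positions = 0
    for doc in docs:
        tokens = doc.split()
        positions += len(tokens)
        dfs.update(set(tokens))  # a set, so each document counts once per token
    return dfs, positions


def filter_by_doc_freq(dfs, num_docs, no_below=20, no_above=0.3):
    """Keep tokens found in at least `no_below` documents and at most
    `no_above * num_docs` documents (hypothetical helper)."""
    max_df = int(no_above * num_docs)  # absolute upper cutoff, truncated to int
    kept = {tok for tok, df in dfs.items() if no_below <= df <= max_df}
    discarded = {tok: df for tok, df in dfs.items() if tok not in kept}
    return kept, discarded, max_df
```

With 17421 documents the upper cutoff is `int(0.3 * 17421) = 5226`, which matches the log. That is why very common words such as `to` (5343 documents) and `of` (7135) end up in the discarded list, while rare tokens such as `Direction` (8 documents) fall below the lower cutoff of 20.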
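The LSI phase keeps 100 factors and logs "100 extra samples and 2 power iterations". Its action matrix therefore has 100 + 100 = 200 columns, giving the logged shape (11581, 200) with one row per dictionary token, and the final dense SVD runs on the much smaller (200, 17421) projection. This appears to follow the randomized range-finder scheme of Halko, Martinsson and Tropp. The NumPy sketch below illustrates that scheme on a small dense matrix and is not gensim's sparse, chunked code. Several details are assumptions of this sketch: the function names, the Gaussian test matrix, the QR re-orthonormalization between power iterations, and the reading of "energy" as the share of squared singular values among the ones actually computed.

```python
import numpy as np


def randomized_svd_sketch(A, k, extra_samples=100, power_iters=2, seed=0):
    """Approximate the top-k SVD of A (terms x docs) with a randomized
    range finder. Returns the top-k left vectors, all computed singular
    values, and the top-k right vectors. Dense illustration only."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + extra_samples, m, n)  # number of columns in the action matrix
    # 1st phase: sample the range of A with a Gaussian test matrix -> (m, l)
    Y = A @ rng.standard_normal((n, l))
    Q, _ = np.linalg.qr(Y)  # orthonormalize the action matrix
    for _ in range(power_iters):
        # power iterations sharpen the spectrum; re-orthonormalize each step
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # 2nd phase: dense SVD of the small (l, n) projection B = Q^T A
    B = Q.T @ A
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b  # lift left singular vectors back to term space
    return U[:, :k], s, Vt[:k]


def discarded_energy_pct(s, k):
    """Percent of squared-singular-value mass outside the top k factors,
    measured against the singular values that were computed."""
    energy = s ** 2
    return 100.0 * (1.0 - energy[:k].sum() / energy.sum())


def format_topic(weights, vocab, topn=10):
    """Render one factor like the log lines: top terms by absolute weight."""
    order = np.argsort(-np.abs(weights))[:topn]
    return " + ".join('%.3f*"%s"' % (weights[i], vocab[i]) for i in order)
```

On the real data, `A` would be the 11581 x 17421 term-document matrix with `k=100`, `extra_samples=100` and `power_iters=2`, which reproduces the shapes in the log. In the topic lines, the number after each topic index (31.688 for topic #0) appears to be that factor's singular value. The sign of each factor is arbitrary in an SVD, which is why several topics are dominated by negative weights.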