Created January 18, 2017 16:24
(st)[kofola3@kofola3:~/workspace/scaletext] (scaletext2)$ python -m scaletext.scripts.load_tab_separated_data --es-index wiki1k ./enwiki-1k-articles.txt
2017-01-19 01:20:41,713 : MainProcess : INFO : running /Volumes/work/workspace/scaletext/scaletext/scripts/load_tab_separated_data.py --es-index wiki1k ./enwiki-1k-articles.txt
2017-01-19 01:20:43,165 : MainProcess : INFO : 100 documents loaded; last: Art
2017-01-19 01:20:44,371 : MainProcess : INFO : 200 documents loaded; last: Albert Camus
2017-01-19 01:20:45,525 : MainProcess : INFO : 300 documents loaded; last: Atomic
2017-01-19 01:20:46,575 : MainProcess : INFO : 400 documents loaded; last: Dasyproctidae
2017-01-19 01:20:47,443 : MainProcess : INFO : 500 documents loaded; last: Afonso de Albuquerque
2017-01-19 01:20:48,176 : MainProcess : INFO : 600 documents loaded; last: Anacharsis
2017-01-19 01:20:49,008 : MainProcess : INFO : 700 documents loaded; last: Annealing
2017-01-19 01:20:50,112 : MainProcess : INFO : 800 documents loaded; last: Abijah
2017-01-19 01:20:51,214 : MainProcess : INFO : 900 documents loaded; last: Apple II
2017-01-19 01:20:52,175 : MainProcess : INFO : 1000 documents loaded; last: Abae
2017-01-19 01:20:52,178 : MainProcess : INFO : reindexing
2017-01-19 01:20:52,697 : MainProcess : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2017-01-19 01:20:55,049 : MainProcess : INFO : adding document #10000 to Dictionary(216021 unique tokens: [u'Vall,', u'biennials', u'considered.', u'Rietveld.', u'considered,']...)
2017-01-19 01:20:56,718 : MainProcess : INFO : built Dictionary(320071 unique tokens: [u'Vall,', u'biennials', u"''Romam'',", u'considered.', u"Al-Arabi''"]...) from 17421 documents (total 3357280 corpus positions)
2017-01-19 01:20:57,318 : MainProcess : INFO : discarding 308490 tokens: [(u'(AMPAS)', 2), (u'Direction', 8), (u'to', 5343), (u'nominees.', 3), (u'85th', 1), (u'decorator(s).', 1), (u"'''Best", 1), (u"category's", 1), (u'of', 7135), (u"Director's", 2)]...
2017-01-19 01:20:57,318 : MainProcess : INFO : keeping 11581 tokens which were in no less than 20 and no more than 5226 (=30.0%) documents
2017-01-19 01:20:57,442 : MainProcess : INFO : resulting dictionary: Dictionary(11581 unique tokens: [u'writings', u'yellow', u'four', u'prefix', u'Andreas']...)
2017-01-19 01:20:57,859 : MainProcess : INFO : using serial LSI version on this node
2017-01-19 01:20:57,859 : MainProcess : INFO : updating model with new documents
2017-01-19 01:21:02,301 : MainProcess : INFO : preparing a new chunk of documents
2017-01-19 01:21:02,583 : MainProcess : INFO : using 100 extra samples and 2 power iterations
2017-01-19 01:21:02,586 : MainProcess : INFO : 1st phase: constructing (11581, 200) action matrix
2017-01-19 01:21:02,938 : MainProcess : INFO : orthonormalizing (11581, 200) action matrix
2017-01-19 01:21:04,379 : MainProcess : INFO : 2nd phase: running dense svd on (200, 17421) matrix
2017-01-19 01:21:04,929 : MainProcess : INFO : computing the final decomposition
2017-01-19 01:21:04,929 : MainProcess : INFO : keeping 100 factors (discarding 10.624% of energy spectrum)
2017-01-19 01:21:04,956 : MainProcess : INFO : processed documents up to #17421
2017-01-19 01:21:04,962 : MainProcess : INFO : topic #0(31.688): -1.000*"Introduction" + -0.003*"*" + -0.000*"ISBN" + -0.000*"''The" + -0.000*"by" + -0.000*"–" + -0.000*"at" + -0.000*"on" + -0.000*"**" + -0.000*"The"
2017-01-19 01:21:04,962 : MainProcess : INFO : topic #1(28.476): -0.998*"References" + -0.063*"*" + -0.009*"External" + -0.009*"links" + -0.003*"–" + -0.002*"notes" + -0.002*"List" + -0.002*"reading" + -0.002*"sources" + -0.002*"ISBN"
2017-01-19 01:21:04,964 : MainProcess : INFO : topic #2(27.415): -0.644*"External" + -0.628*"links" + -0.430*"*" + 0.039*"References" + -0.021*"–" + -0.015*"List" + -0.013*"See" + -0.013*"The" + -0.013*"ISBN" + -0.013*"by"
2017-01-19 01:21:04,965 : MainProcess : INFO : topic #3(26.828): -0.885*"*" + 0.314*"External" + 0.304*"links" + 0.052*"References" + -0.045*"–" + -0.041*"See" + -0.032*"List" + -0.029*"The" + -0.029*"by" + -0.029*"ISBN"
2017-01-19 01:21:04,966 : MainProcess : INFO : topic #4(25.454): 0.888*"See" + 0.455*"also" + -0.055*"*" + 0.009*"is" + 0.008*"was" + 0.007*"as" + 0.006*"that" + 0.006*"his" + 0.005*"The" + 0.005*"with"
(st)[kofola3@kofola3:~/workspace/scaletext] (scaletext2)$
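
The "discarding 308490 tokens" and "keeping 11581 tokens" lines in the log above report a document-frequency pruning step: tokens seen in fewer than 20 documents or in more than 30% of documents are dropped. The sketch below re-creates that rule in plain Python to make the numbers easier to follow. It is not the code that produced this log; `prune_vocabulary`, its parameter names, and the toy corpus are hypothetical, and the whitespace tokenization is an assumption suggested by tokens such as `u'considered.'` that keep their punctuation.

```python
from collections import Counter


def prune_vocabulary(docs, no_below=20, no_above=0.3):
    """Return the tokens whose document frequency lies within the bounds.

    Hypothetical helper for illustration only; not the function used by
    the script in the log.
    """
    # Document frequency: in how many documents each token occurs at least once.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    # Turn the fractional upper bound into an absolute document count,
    # truncating toward zero (30% of 17421 documents gives 5226, as logged).
    max_docs = int(no_above * len(docs))
    return {tok for tok, df in doc_freq.items() if no_below <= df <= max_docs}


# Toy corpus, tokenized on whitespace (assumption, see lead-in).
docs = [text.split() for text in [
    "the cat sat",
    "the dog sat",
    "the bird flew",
    "the cat flew",
]]
kept = prune_vocabulary(docs, no_below=2, no_above=0.75)
print(sorted(kept))  # ['cat', 'flew', 'sat']

# The figures in the log are consistent with this rule:
print(int(0.3 * 17421))   # 5226 documents is the 30% cutoff
print(320071 - 308490)    # 11581 tokens remain after pruning
```

In the toy run, "the" is dropped for appearing in all four documents, while "dog" and "bird" are dropped for appearing in only one.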