- Culture
  - Biography
    - Biography*
    - Women
  - Food and drink
  - Internet culture
  - Linguistics
  - Literature
  - Media
    - Biography
    - Books
$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.dependencies import solve
>>> from revscoring.languages import english
>>> from revscoring.datasources import revision_oriented as ro
>>> solve(english.idioms.revision.datasources.matches, cache={ro.revision.text: "This is some text. I don't want to beat around the bush."})
['beat around the bush']
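For readers without revscoring installed, the idea behind the session above can be approximated with a plain regular expression. This is only a rough sketch: the `IDIOMS` list and the `find_idioms` helper are hypothetical names introduced here, and nothing below claims to be how revscoring implements `english.idioms`.

```python
import re

# Hypothetical idiom list for illustration only; revscoring ships its own.
IDIOMS = ["beat around the bush", "piece of cake", "under the weather"]

# One case-insensitive alternation, anchored on word boundaries so that
# idioms are not matched inside longer words.
IDIOM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(idiom) for idiom in IDIOMS) + r")\b",
    re.IGNORECASE,
)


def find_idioms(text):
    """Return every idiom from IDIOMS found in text, in order of appearance."""
    return [match.group(0).lower() for match in IDIOM_RE.finditer(text)]


print(find_idioms("This is some text. I don't want to beat around the bush."))
```

Run on the same sentence as the session above, this prints `['beat around the bush']`.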
$ python get_thresholds.py arwiki
----------------------------  --------  ---------  ---------  ------
label                         pop rate  threshold  precision  recall
Culture.Biography.Biography*     0.123      0.338        0.7   0.975
Culture.Biography.Women          0.015      0.617        0.5   0.661
Culture.Food and drink           0.002      0.792        0.7    0.61
Culture.Internet culture         0.004      0.818        0.7   0.702
Culture.Linguistics              0.007      0.251        0.7   0.739
Culture.Literature               0.016      0.707        0.7   0.636
Culture.Media.Books              0.004      0.583        0.7   0.727
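Each row above pairs a score threshold with the precision and recall observed at that threshold. One way such a threshold could be chosen is sketched below: take the lowest threshold whose precision still meets a target, which also gives the highest recall among qualifying thresholds. The `threshold_for_precision` function and the sample scores are assumptions made for illustration; `get_thresholds.py` itself is not shown here and may work differently.

```python
def threshold_for_precision(scored, target_precision):
    """Return (threshold, precision, recall) for the lowest threshold whose
    precision meets the target, or None if no threshold qualifies.

    scored is a list of (probability, is_positive) pairs. An item is
    predicted positive when its probability is >= the threshold.
    """
    total_pos = sum(1 for _, is_pos in scored if is_pos)
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = None
    true_pos = 0
    for i, (prob, is_pos) in enumerate(ranked):
        true_pos += bool(is_pos)
        # Evaluate only once every item sharing this score has been counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == prob:
            continue
        predicted_pos = i + 1
        precision = true_pos / predicted_pos
        recall = true_pos / total_pos if total_pos else 0.0
        if precision >= target_precision:
            # Later (lower) thresholds overwrite earlier ones.
            best = (prob, precision, recall)
    return best


# Made-up scores for a single label, purely for demonstration.
sample = [(0.9, True), (0.8, True), (0.7, False), (0.6, True),
          (0.4, False), (0.3, True), (0.2, False)]
print(threshold_for_precision(sample, 0.7))
```

For the made-up sample this prints `(0.6, 0.75, 0.75)`: at a threshold of 0.6, three of the four predicted positives are correct and three of the four true positives are recovered.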
$ python
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mwxml import Dump
>>> import mwtypes.files
>>> d = Dump.from_file(mwtypes.files.reader("/mnt/data/xmldatadumps/public/eswiki/latest/eswiki-latest-pages-logging.xml.gz"))
>>> for l in d.log_items:
...     print(l.type, l.action)
...
w2v = aggregators.mean(
    revision_text_vectors,
    vector=True,
    name="revision.text.google_news_vector_mean"
)
# Define pronoun features
# ... preamble to defining features
female_pronouns_count = aggregators.len(female_pronouns)
$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> # This revision references https://en.wikipedia.org/wiki/Ann_Bishop_(biologist)
>>> rev_id = 931384270
>>> from revscoring.extractors import api
>>> from revscoring.features import wikitext
>>> import mwapi
>>> extractor = api.Extractor(mwapi.Session("https://en.wikipedia.org"))
class FakeVectors(dict):
    # A plain dict cannot take new attributes; this subclass can,
    # which lets the test set vector_size below.
    pass

test_vectors = FakeVectors({
    'a': [1] * 300,
    'b': [1] * 300,
    'c': [1] * 300})
test_vectors.vector_size = 300
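To show what a stand-in like this is useful for, here is a small sketch that averages word vectors looked up in the fake mapping. The `mean_vector` helper is hypothetical, and its handling of unknown words (skipping them, and falling back to a zero vector when none are known) is an assumption, not a description of what revscoring or drafttopic do.

```python
class FakeVectors(dict):
    """Dict of word -> vector that can also carry a vector_size attribute."""
    pass


test_vectors = FakeVectors({
    'a': [1] * 300,
    'b': [1] * 300,
    'c': [1] * 300})
test_vectors.vector_size = 300


def mean_vector(words, vectors):
    """Element-wise mean of the vectors of known words (hypothetical helper).

    Unknown words are skipped; if no word is known, a zero vector of
    length vectors.vector_size is returned.
    """
    known = [vectors[word] for word in words if word in vectors]
    if not known:
        return [0.0] * vectors.vector_size
    return [sum(column) / len(known) for column in zip(*known)]


result = mean_vector(['a', 'b', 'z'], test_vectors)
print(len(result), result[0])
```

Because every fake vector is all ones, the mean over any known words is also all ones, which makes the expected value easy to assert in a unit test.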
$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from drafttopic.feature_lists.wordvectors import w2v
>>> from revscoring.dependencies import solve
>>> help(solve)
>>> from revscoring.languages import english
>>> english.stopwords.revision.datasources.non_stopwords