$ pip install mwcites
<SNIP>
Cleaning up...
$ python
Python 3.4.1 (default, May 26 2014, 01:12:52)
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mwcites.extractors import doi
>>> list(doi.extract("Foobar 10.1000/282lasnd<foo>[bar].24 hats pants 10.0023/banana"))
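The session above is cut off before any output is shown. As a rough illustration of what pulling DOIs out of free text involves, here is a minimal regex-based sketch. It is not the mwcites implementation: `DOI_RE` and `extract_dois` are hypothetical names, and the pattern is a simplified assumption (the `10.` prefix, 4 to 9 digits, a slash, then a run of non-whitespace characters). The real extractor may treat markup such as `<foo>` differently.

```python
import re

# Assumption: a simplified DOI shape, not the pattern used by mwcites.
DOI_RE = re.compile(r"\b10\.\d{4,9}/\S+")

def extract_dois(text):
    """Yield DOI-like substrings found in `text`."""
    for match in DOI_RE.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        yield match.group(0).rstrip(".,;:)")

sample = "Foobar 10.1000/282lasnd<foo>[bar].24 hats pants 10.0023/banana"
print(list(extract_dois(sample)))
```

Because the suffix is matched as "anything up to the next whitespace", this sketch keeps bracketed or tag-like text inside a DOI. A production extractor would need a real policy for those cases.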
> select LEFT(rev_timestamp, 8), COUNT(*) FROM revision WHERE rev_timestamp > "20150201" GROUP BY 1;
+------------------------+----------+
| LEFT(rev_timestamp, 8) | COUNT(*) |
+------------------------+----------+
| 20150201               |   161300 |
| 20150202               |   144059 |
| 20150203               |   143067 |
| 20150204               |   146833 |
| 20150205               |   139978 |
| 20150206               |   140813 |
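The query buckets revisions by the first eight characters of `rev_timestamp`, which is the `YYYYMMDD` day when timestamps are stored as 14-character strings. The same grouping can be sketched in Python. `revisions_per_day` is a hypothetical helper and the sample timestamps are invented for illustration.

```python
from collections import Counter

def revisions_per_day(timestamps, since=""):
    """Count timestamps per YYYYMMDD prefix, keeping those that sort after `since`."""
    # Plain string comparison mirrors the WHERE clause: "20150201000102" > "20150201".
    return Counter(ts[:8] for ts in timestamps if ts > since)

# Invented sample data, not taken from the revision table.
sample = ["20150131235959", "20150201000102", "20150201235959", "20150202120000"]
counts = revisions_per_day(sample, since="20150201")
for day in sorted(counts):
    print(day, counts[day])
```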
(3.4)[halfak@ores-test: ~/projects/ores]
$ python
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.scorers import MLScorerModel
>>> MLScorerModel.load(open("models/enwiki.reverted.linear_svc.model", 'rb'))
<revscoring.scorers.svc.LinearSVCModel object at 0x7fea7257c4a8>
>>> MLScorerModel.load(open("models/ptwiki.reverted.linear_svc.model", 'rb'))
<revscoring.scorers.svc.LinearSVCModel object at 0x7fea5f942f98>
select rev_id, COUNT(*) FROM mwcites_enwiki_20150112 GROUP BY 1 ORDER BY COUNT(*) DESC LIMIT 10;
+-----------+----------+
| rev_id    | COUNT(*) |
+-----------+----------+
| 208356562 |      375 |
| 597498735 |      352 |
| 557642221 |      242 |
| 209522143 |      231 |
| 303096827 |      230 |
| 303100944 |      225 |
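The query ranks revisions by how many rows they contribute to the table. To try the same query shape without a database server, here is a small sketch against an in-memory SQLite database. The table name `cites`, its columns, and the rows are all assumptions made up for the example; they do not reflect the schema of `mwcites_enwiki_20150112`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per identifier found in a revision.
conn.execute("CREATE TABLE cites (rev_id INTEGER, id TEXT)")
rows = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (3, "f")]
conn.executemany("INSERT INTO cites VALUES (?, ?)", rows)

def top_revisions(conn, limit=10):
    """Return (rev_id, row count) pairs, largest counts first."""
    query = ("SELECT rev_id, COUNT(*) FROM cites "
             "GROUP BY rev_id ORDER BY COUNT(*) DESC LIMIT ?")
    return conn.execute(query, (limit,)).fetchall()

print(top_revisions(conn, limit=2))
```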
[halfak@graphite: ~/projects/wikimetrics]
$ grep -r httplib2 .
./requirements.txt:httplib2==0.9
- AI is important to quality control
- Story of huggle/cluebot in enwiki
- AI is hard
- Discussion of skills needed
- AI as service --> Ecosystem
- Discuss all the tools that do / might use quality scores
import cProfile as profile
import pprint
import re
import time
from hashlib import sha1
from mw import api
from more_itertools import peekable
jQuery111208480086917140102_1428331100186({
  "task": {
    "campaign_id": 1,
    "data": {
      "rev_id": 101
    },
    "id": 1,
    "labels": [
      {
        "data": {
$ python
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.features import parent_revision
>>> parent_revision.markup_chars.returns
<class 'int'>
>>> from revscoring.dependent import draw
>>> draw(parent_revision.markup_chars)
- <parent_revision.markup_chars>
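The `draw` output above is cut off after its first line. To illustrate the general idea of printing a dependency tree, here is a generic sketch. It is not revscoring code: the `Node` class, the `draw_tree` function, the node names, and the two-space indentation are all assumptions chosen for the example.

```python
class Node:
    """Hypothetical stand-in for something that declares dependencies."""

    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)

def draw_tree(node, depth=0):
    """Return one line per node, indented by its depth in the dependency tree."""
    lines = ["  " * depth + "- <" + node.name + ">"]
    for dep in node.dependencies:
        lines.extend(draw_tree(dep, depth + 1))
    return lines

# Invented names; they do not correspond to real revscoring features.
root = Node("feature", [Node("datasource_a", [Node("datasource_c")]), Node("datasource_b")])
print("\n".join(draw_tree(root)))
```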
$ python
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring.datasources import diff
>>> from revscoring.extractors import APIExtractor
>>> from mw.api import Session
>>> extractor = APIExtractor(Session("https://en.wikipedia.org/w/api.php"))
Sending requests with default User-Agent. Set 'user_agent' on api.Session to quiet this message.
>>> list(extractor.extract(4567890, [diff.added_words]))[0]