Aaron Halfaker halfak

@halfak
halfak / example_revscoring_profile.md
Created November 15, 2019 15:52
Example profile from running extract on our article quality model dataset for Enwiki

Extracting 36 values:

  • feature.wikitext.revision.chars
  • feature.wikitext.revision.content_chars
  • feature.wikitext.revision.ref_tags
  • feature.(wikitext.revision.ref_tags / max(wikitext.revision.content_chars, 1))
  • feature.wikitext.revision.wikilinks
  • feature.(wikitext.revision.wikilinks / max(wikitext.revision.content_chars, 1))
  • feature.wikitext.revision.external_links
  • feature.(wikitext.revision.external_links / max(wikitext.revision.content_chars, 1))
  • feature.wikitext.revision.headings_by_level(2)
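Several of the features listed above are ratios whose denominator is wrapped in `max(..., 1)`, so that a revision with no content characters does not cause a division by zero. A minimal sketch of that guard in plain Python follows. The function name `per_content_char` is hypothetical, and the sketch does not use revscoring's own feature objects.

```python
def per_content_char(count, content_chars):
    """Scale a count by content length, treating lengths below 1 as 1."""
    return count / max(content_chars, 1)


# A revision with 12 ref tags across 3000 content characters.
print(per_content_char(12, 3000))

# An empty revision: the max(..., 1) guard avoids ZeroDivisionError.
print(per_content_char(0, 0))
```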
revids_param = "|".join(str(v) for v in range(100, 200))
make_request(
    ores_url,
    "/v3/scores/testwiki/?revids={0}".format(revids_param),
    is_json=True,
    http_code=400,
    equal_to={"error": {
        "code": "bad request",
        "message": "Too many values for 'revids' parameter. Max of 50."
    }})
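The test above expects a 400 response once more than 50 revision IDs are requested at once. One way a client might stay under that limit is to split the IDs into batches before building requests. This is a sketch only: `chunked` and `build_paths` are hypothetical helpers, and the path format is copied from the test rather than from an API reference.

```python
from itertools import islice


def chunked(values, size=50):
    """Yield successive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_paths(revids, context="testwiki", size=50):
    """Build one request path per batch (path format taken from the test above)."""
    return [
        "/v3/scores/{0}/?revids={1}".format(
            context, "|".join(str(v) for v in batch))
        for batch in chunked(revids, size)
    ]


# The 100 revision IDs from the test, split so no request exceeds 50.
paths = build_paths(range(100, 200))
print(len(paths))
```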
>>> from ores import api
>>>
>>> ores_session = api.Session("https://ores.wikimedia.org", "Class project <[email protected]>")
>>>
>>> results = ores_session.score("enwiki", ["articlequality"], [1234, 5678, 91011])
>>>
>>> for score in results:
...     print(score)
...
{'articlequality': {'score': {'prediction': 'B', 'probability': {'GA': 0.005565225912988614, 'Stub': 0.285072978841463, 'C': 0.1237249061020009, 'B': 0.2910788689339172, 'Start': 0.2859984921969326, 'FA': 0.008559528012697881}}}}
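In the score printed above, the `prediction` matches the class with the largest entry in `probability`. A small sketch that checks this on the printed values; it reads the dictionary directly rather than calling any ORES helper, and the variable names are illustrative.

```python
# The score document exactly as printed in the session above.
score = {'articlequality': {'score': {'prediction': 'B', 'probability': {
    'GA': 0.005565225912988614, 'Stub': 0.285072978841463,
    'C': 0.1237249061020009, 'B': 0.2910788689339172,
    'Start': 0.2859984921969326, 'FA': 0.008559528012697881}}}}

doc = score['articlequality']['score']

# Pick the class whose probability is highest.
most_likely = max(doc['probability'], key=doc['probability'].get)
print(most_likely, most_likely == doc['prediction'])
```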
>>> from itertools import islice
>>>
>>> import mwapi
>>>
>>> my_agent = 'gap finder script'
>>> session = mwapi.Session('https://wikidata.org',
...                         formatversion=2,
...                         user_agent=my_agent)
>>>
>>>
def query_revisions_by_titles(titles, batch=50, **params):
    titles_iter = iter(titles)
    while True:
        batch_titles = list(islice(titles_iter, 0, batch))
        if len(batch_titles) == 0:
            break
        else:
            doc = session.get(action='query', prop='revisions',
                              titles=batch_titles, **params)
$ python
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> [i for i, c in enumerate("hello") if c == "e"]
[1]
>>> [i for i, c in enumerate("hello") if c == "l"]
[2, 3]
$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from revscoring import Model
>>> m = Model.load(open("models/enwiki.damaging.gradient_boosting.model"))
>>> m.estimator.feature_importances_
array([8.39962459e-03, 7.93533540e-03, 3.04496444e-08, 2.25381150e-02,
       2.08058610e-02, 2.26880141e-02, 1.87132900e-02, 1.60180859e-02,
       2.23545834e-02, 2.14488512e-02, 1.90494208e-02, 2.54534679e-02,
/**
 * Construct a thread pool using nWorkers.
 *
 * @constructor
 * @param {int} [nWorkers] The number of worker threads
 */
var ThreadPool = function(nWorkers) {
    this.activeWorkers = 0;
    this.nWorkers = nWorkers
"schemas": {
    "contentquality": {
        "type": "object",
        "properties": {
            "contentquality": {
                "type": "string"
            }
        },
        "required": ["contentquality"]
    },
halfak / hard_to_parse.wikitext
Created October 11, 2018 14:59
Demonstrate signal timeout

Philip P. Barbour
Presiding officer
The Virginia Constitutional Convention of 1829–1830 was a constitutional convention for the state of Virginia, held in Richmond from October 5, 1829 to January 15, 1830.

== Background and composition ==

Almost immediately, the Constitution of 1776 was recognized as flawed both for its restriction of the suffrage by property requirements, and for its malapportionment favoring the smaller eastern counties. Between 1801 and 1813, petitioners called on the Assembly to initiate a constitutional convention ten times. The House of Delegates passed a bill twice, but the conservative eastern planter majority in the Virginia Senate killed both measures. Continuing growth in the western parts of the state led to another fifteen years of agitation. Several counties in the Eastern Shore, northern Piedmont and western counties began opening polls for direct expression fr
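
The gist description mentions a signal timeout. A minimal sketch of how a signal-based timeout could wrap a slow call is given here. It illustrates the general technique under stated assumptions, not the code this gist was written against: `TimeoutReached` and `call_with_timeout` are hypothetical names, and the approach works only on Unix and only in the main thread of the process.

```python
import signal
import time


class TimeoutReached(Exception):
    """Raised when the wrapped call runs longer than allowed (hypothetical name)."""


def _raise_timeout(signum, frame):
    raise TimeoutReached()


def call_with_timeout(func, seconds, *args):
    """Run func(*args), raising TimeoutReached if it exceeds `seconds`."""
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(*args)
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


# time.sleep stands in for a slow parse of hard-to-process text.
try:
    call_with_timeout(time.sleep, 0.1, 2)
except TimeoutReached:
    print("timed out")
```

Restoring the previous handler in the `finally` clause keeps the timeout from leaking into later code that may rely on `SIGALRM` for its own purposes.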