Amir Sarabadani Ladsgroup

@Ladsgroup
Ladsgroup / recent_batch.patch
Created February 26, 2018 20:14
wikidata and enwiktionary batch
- ################################### Wikidata ##################################
? ------ --
+ ############################# Wikidata ################################
-
- # wikidatawiki.balanced_revisions.20k_2015.json is check into the repo
-
- datasets/wikidatawiki.autolabeled_revisions.20k_2015.json: \
- 		datasets/wikidatawiki.balanced_revisions.20k_2015.json
- 	cat $< | \
- 		./utility autolabel --host=https://wikidata.org \
@Ladsgroup
Ladsgroup / huwiki.patch
Created February 26, 2018 19:26
huwiki templating
- ############################### Hungarian Wikipedia ###########################
? --
+ ############################# Hungarian Wikipedia ################################
? +++++
datasets/huwiki.sampled_revisions.40k_2016.json:
	wget -qO- http://quarry.wmflabs.org/run/79645/output/0/json-lines?download=true > $@
datasets/huwiki.autolabeled_revisions.40k_2016.json: \
		datasets/huwiki.sampled_revisions.40k_2016.json
@Ladsgroup
Ladsgroup / two_wikis_batch.patch
Created February 21, 2018 22:58
Edge_cases_part_I
- ############################# Norwegian Wikipedia #############################
+ ############################# Norwegian Wikipedia ################################
? +++
datasets/nowiki.sampled_revisions.100k_2015.json:
	wget -qO- https://quarry.wmflabs.org/run/67250/output/0/json-lines?download=true > $@
datasets/nowiki.autolabeled_revisions.100k_2015.json: \
		datasets/nowiki.sampled_revisions.100k_2015.json
	cat $< | \
@Ladsgroup
Ladsgroup / fawiki.patch
Created February 7, 2018 14:38
Template, edge cases
############################# Persian Wikipedia ################################
+
+ datasets/fawiki.sampled_revisions.2.20k_2015.json:
+ 	wget -qO- http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true > $@
+
+ datasets/fawiki.autolabeled_revisions.2.20k_2015.json: \
+ 		datasets/fawiki.sampled_revisions.2.20k_2015.json
+ 	cat $< | \
+ 		./utility autolabel --host=https://fa.wikipedia.org \
+ 			--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
@Ladsgroup
Ladsgroup / parser.py
Created January 27, 2018 00:29
Clickstream_parser
# License: MIT
import gzip
def search(name, i):
    result = []
    with gzip.open('clickstream-enwiki-2017-12.tsv.gz', 'rb') as f:
        for line in f:
            line = line.decode('utf-8').replace('\n', '')
            if line.split('\t')[i] == name:
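The preview above is cut off inside the `if` branch, so what the original does with a matching line is not shown. As a minimal sketch only, assuming the goal is to collect every row whose `i`-th tab-separated field equals `name`, the same scan can be written as follows. The `path` parameter, the bounds check, and the choice to return whole rows are assumptions, not part of the original gist.

```python
import gzip


def search(name, i, path='clickstream-enwiki-2017-12.tsv.gz'):
    """Return rows (as lists of fields) whose i-th field equals `name`.

    Assumption: returning whole rows is a guess; the original preview is
    truncated before it shows what happens to a matching line.
    """
    result = []
    # Text mode ('rt') decodes UTF-8 while reading, so no manual .decode().
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            fields = line.rstrip('\n').split('\t')
            # Skip rows too short to have column i (hypothetical safeguard).
            if len(fields) > i and fields[i] == name:
                result.append(fields)
    return result
```

Making the file path a parameter keeps the default behaviour of the preview while allowing the function to be pointed at another month's dump or at a small test file.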
@Ladsgroup
Ladsgroup / New batch.patch
Last active January 31, 2018 17:09
Diff for templating
amsa@C235:~/editquality$ python differ.py "Japanese Wikipedia"
- ########################### Japanese Wikipedia ################################
+ ############################# Japanese Wikipedia ################################
? ++
-
# From https://quarry.wmflabs.org/query/9927
datasets/jawiki.sampled_revisions.40k_2016.json:
	wget -qO- https://quarry.wmflabs.org/run/89016/output/0/json-lines?download=true > $@
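The `- `, `+ ` and `? ` line prefixes in this output match the format produced by `difflib.ndiff` in the Python standard library. The source of `differ.py` is not shown here, so the following is only a sketch of how such a comparison could be produced; the function name `diff_sections` and its arguments are hypothetical.

```python
import difflib


def diff_sections(old_text, new_text):
    """Compare two blocks of text line by line in ndiff format.

    Hypothetical helper: the real differ.py may work differently.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # ndiff prefixes removed lines with '- ', added lines with '+ ',
    # unchanged lines with '  ', and intraline hints with '? '.
    return list(difflib.ndiff(old_lines, new_lines))
```

Calling `print('\n'.join(diff_sections(old, new)))` on the old and new versions of a Makefile section would give output of the same general shape as the listing above.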
@Ladsgroup
Ladsgroup / texter.py
Created December 1, 2017 11:38
database cleaner
import pymysql.cursors
import json
connection = pymysql.connect(host='localhost',
                             user='wikiuser',
                             password='secret service',
                             db='wikidb',
                             cursorclass=pymysql.cursors.DictCursor)
# range(20881, 1, -1)
for i in [142]:
@Ladsgroup
Ladsgroup / coc2.py
Last active November 26, 2017 15:08
Cochrane bot
# License: MIT
import pywikibot
import re
import urllib2
from pywikibot import pagegenerators
site = pywikibot.Site('en')
generator = pagegenerators.SearchPageGenerator('insource:/\| *journal *= *.+Cochrane/', site=site, namespaces=[0])
gen = pagegenerators.PreloadingGenerator(generator)
@Ladsgroup
Ladsgroup / Most visited articles of Armenian Wikipedia
Created September 4, 2017 14:12
Most visited articles from Armenia
https://hy.wikipedia.org/wiki/Գլխավոր_էջ 711702
https://hy.wikipedia.org/wiki/Սպասարկող:Որոնել 476598
https://hy.wikipedia.org/wiki/- 223200
https://hy.wikipedia.org/wiki/Սպասարկող:Մասնակցիմուտք 131273
https://hy.wikipedia.org/wiki/Սպասարկող:Վերջինփոփոխությունները 109813
https://hy.wikipedia.org/wiki/Հայաստան 96958
https://hy.wikipedia.org/wiki/Սպասարկող:CreateAccount 80968
https://hy.wikipedia.org/wiki/Սպասարկող:Book 69286
https://hy.wikipedia.org/wiki/Հովհաննես_Թումանյան 66342
https://hy.wikipedia.org/wiki/Երևան 52996
@Ladsgroup
Ladsgroup / nick_fixes.py
Created September 3, 2017 20:15
Nick fixes
# License: MIT
import pywikibot
import sys
with open('nick_fixes.txt', 'r') as f:
    cases = f.read().split('\n')
sites = {'wikidata': pywikibot.Site('wikidata', 'wikidata')}
ok = True
fixes = [
    ['== Share your experience and feedback as a Wikimedian in this global survey ==', ['<ref>']],