This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Amir's query to get all pages in WikiProject_Medicine, executed on analytics-store | |
select page_title | |
from (select tl_from, rd_from | |
from templatelinks | |
left join | |
redirect on rd_from = tl_from | |
and rd_title = 'WikiProject_Medicine' | |
and (rd_interwiki is null or rd_interwiki = '') | |
and rd_namespace = 10 | |
inner join |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ADD JAR /srv/deployment/analytics/refinery/artifacts/org/wikimedia/analytics/refinery/refinery-hive-0.0.35.jar; | |
CREATE TEMPORARY FUNCTION ua_parser as 'org.wikimedia.analytics.refinery.hive.UAParserUDF'; | |
select ua_parser('Wikimedia Bot'); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- ran this on the hadoop cluster (not available publicly) | |
select std(edits_per_page), | |
avg(edits_per_page), | |
std(days_from_first_to_last_edit), | |
avg(days_from_first_to_last_edit) | |
from (select rev_page pid, | |
(unix_timestamp(max(rev_timestamp),'yyyyMMddHHmmss') - unix_timestamp(min(rev_timestamp),'yyyyMMddHHmmss')) / 86400 days_from_first_to_last_edit, | |
count(*) edits_per_page | |
from mediawiki_revision | |
inner join |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- join oldTitle, oldNamespace, timestamp to title, namespace, end and vice versa, should match perfectly | |
select sum(if(log_title is null, 1, 0)) as history_without_log_line, | |
sum(if(page_title is null, 1, 0)) as log_line_without_history | |
from mforns.simplewiki_page_history | |
FULL OUTER JOIN | |
milimetric.simplewiki_logging ON cast(log_title as string) = page_title | |
AND log_namespace = page_namespace | |
AND cast(log_timestamp as string) = `end` | |
AND `end` is not null |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import requests | |
import json | |
import csv | |
headers = { | |
'User-Agent': 'Bot Dan Andreescu parsing Special:PrefixIndex pages', | |
'From': '[email protected]' | |
} | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- this is a collection of odd findings while trying to reconstruct mediawiki history | |
-- 1. revisions that should stand for a page creation without a corresponding record in the page table | |
-- REASON: move_redir orphans records in the revision table instead of archiving them like the move_and_delete | |
-- MovePage::moveToInternal $moveOverRedirect, $newpage->doDeleteUpdates 503 | |
-- as opposed to the correct: SpeicalMovepage.php $page->doDeleteArticleReal line 559 | |
select count(*) | |
from revision | |
where rev_parent_id = 0 | |
and rev_page not in (select page_id from page); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
val knownStates = sc.parallelize(Array.empty[PageState]) | |
val a2b = new PageEvent( oldTitle = Some("A"), newTitle = Some("B"), timestamp = Some("2011"), | |
eventType = Some("move"), causedByUserId = Some(1), causedByUserName = Some("Dan") | |
) | |
val b2c = new PageEvent( oldTitle = Some("B"), newTitle = Some("C"), timestamp = Some("2015"), | |
eventType = Some("move"), causedByUserId = Some(1), causedByUserName = Some("Dan") | |
) | |
val d2e = new PageEvent( oldTitle = Some("D"), newTitle = Some("E"), timestamp = Some("2014"), | |
eventType = Some("move"), causedByUserId = Some(1), causedByUserName = Some("Dan") |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- From the Visibility Data | |
select event_country, | |
sum(if(event_issues = 0, 1, 0)) as 'No Issues', | |
sum(if(event_issues = 1, 1, 0)) as 'Issues with ş', | |
sum(if(event_issues = 2, 1, 0)) as 'Issues with ţ', | |
sum(if(event_issues = 3, 1, 0)) as 'Issues with both', | |
count(*) as 'Total' | |
from DiacriticsVisibility_15606222 | |
group by event_country | |
; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import sqlalchemy | |
from sqlalchemy import create_engine, MetaData, Column, Integer | |
engine = create_engine('mysql://test:test@localhost/test') | |
meta = MetaData() | |
table = sqlalchemy.Table( | |
'test_table', | |
meta, | |
Column('test_col', Integer), |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
aa.wikibooks | |
aa.wikipedia | |
aa.wiktionary | |
ab.wikipedia | |
ace.wikipedia | |
af.wikibooks | |
af.wikipedia | |
af.wikiquote | |
af.wiktionary | |
ak.wikipedia |