Skip to content

Instantly share code, notes, and snippets.

@milimetric
Last active August 10, 2016 13:21
Show Gist options
  • Save milimetric/8bc5780634315961d9ee718bbc6cc776 to your computer and use it in GitHub Desktop.
Save milimetric/8bc5780634315961d9ee718bbc6cc776 to your computer and use it in GitHub Desktop.
-- this is a collection of odd findings while trying to reconstruct mediawiki history
-- 1. revisions that should stand for a page creation without a corresponding record in the page table
-- REASON: move_redir orphans records in the revision table instead of archiving them like the move_and_delete
-- MovePage::moveToInternal $moveOverRedirect, $newpage->doDeleteUpdates 503
-- as opposed to the correct: SpeicalMovepage.php $page->doDeleteArticleReal line 559
select count(*)
from revision
where rev_parent_id = 0
and rev_page not in (select page_id from page);
-- User talk pages without User pages:
-- https://simple.wikipedia.org/wiki/User_talk:80.202.71.242
-- Delete events without any page creations:
-- example: simplewiki: World_War_I is deleted but it exists before and after the deletion.
-- After we run the algorithm we end up with some pages that have different start and creation timestamps in the create event:
/*
page_id page_id_artificial page_creation_timestamp page_title page_namespace start end caused_by_event_type caused_by_user_id caused_by_user_name
398766 NULL 20130517044842 Soccer_champ 0 20130517044843 NULL create 209999 Osiris
398410 NULL 20130515114429 Lse2 10 20130515114430 NULL create 209999 Osiris
397929 NULL 20130514022718 Country_data_Kurdistan 10 20130514022719 NULL create 209999 Osiris
392129 NULL 20120313234059 Petermann_Ranges 0 20130403031823 NULL create 209999 Osiris
386116 NULL 20130223122629 Michail_Sholokhov 0 20130223122630 NULL create 68157 Macdonald-ross
381484 NULL 20130125083613 Charles_Louis_Alphonse_Laveran 0 20130125083614 NULL create 68157 Macdonald-ross
377880 NULL 20130101014644 First_Lieutenant_(United_States) 1 20130101014645 NULL create 159972 Kumioko
377343 NULL 20121227151447 Armigerous 0 20121227151448 NULL create 22027 Auntof6
377171 NULL 20121225115003 The_Twelve_Days_of_Christmas 0 20121225114843 NULL create 17988 Peterdownunder
376299 NULL 20121218153843 Someone_Like_You_(SafetySuit_song) 0 20121218153844 NULL create 212856 Snow Blizzard
371052 NULL 20121115005453 Sub-plot 1 20121115005454 NULL create 100584 Michaeldsuarez
371023 NULL 20121114225900 Where_Sleeplessness_Is_Rest_From_Nightmares 0 20121114225901 NULL create 100584 Michaeldsuarez
369132 NULL 20110104055353 2011_Green_Bay_Packers_season 0 20121104033118 NULL create 158854 DPH1110
362054 NULL 20120915184702 Hakka_(language) 0 20120915184703 NULL create 135008 Hazard-SJ
357761 NULL 20070319162249 List_of_SpongeBob_SquarePants_episodes 0 20120813034223 NULL create 5738 Creol
357198 NULL 20120808120818 Bully:_Scholarship_Edition 0 20120808120819 NULL create 238030 MAXXX-309
357086 NULL 20120807105032 Israeli 10 20120807105033 NULL create 209999 Osiris
356246 NULL 20120801043005 Avro_Valiant 0 20120801042804 NULL create 209999 Osiris
354359 NULL 20120720071750 Taxonomy 10 20120720070752 NULL create 18469 Bob the Wikipedian
530883 NULL 20160204213030 Administrators/Archive11 4 20120626174051 NULL create 18539 Djsasso
*/
-- pages with multiple rev_parent_id = 0 revisions with timestamps all over their history:
select rev_page, count(*)
from revisions
where rev_parent_id = 0
group by rev_page
;
-- some user tables don't have the same schema (eg: user_options column)
select schema_name
from information_schema.schemata
where schema_name not in
(select distinct table_schema
from columns
where table_name = 'user'
and column_name = 'user_options'
)
order by schema_name
;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment