@cs150bf
Forked from MattiSG/README.md
Created August 15, 2017 05:56
Convert a wiki from MediaWiki to Gollum and Markdown, importing all metadata.

This will convert a wiki from MediaWiki to Gollum and Markdown (or any other format supported by Pandoc).

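  1. Install dependencies:

    brew install pandoc icu4c
    gem install --no-document hpricot gollum gollum-lib i18n pandoc-ruby

  2. Perform a Special:Export on the source wiki (include the full history if every revision should be imported) and save the resulting XML file.

  3. Edit the constants at the top of mw-to-gollum.rb (SOURCE, TARGET and DESTINATION_FORMAT) for your particular needs.

  4. Run the script:

    ruby mw-to-gollum.rb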
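Since the script replays every revision found under each page, it can help to check what the export actually contains before running it. The following is a minimal sketch, not part of the original gist: it assumes REXML is available (it ships with standard Ruby installs) and that the export follows the mediawiki/page/revision layout the script relies on. The name revision_counts is hypothetical.

```ruby
require 'rexml/document'

# Returns the child elements of node that have the given tag name.
# Walking children directly avoids any XPath namespace handling.
def child_elements(node, name)
  node.children.select { |child| child.is_a?(REXML::Element) && child.name == name }
end

# Maps each page title in a MediaWiki XML export to its number of revisions.
# Hypothetical helper: assumes <mediawiki> is the root and contains <page> elements.
def revision_counts(xml)
  root = REXML::Document.new(xml).root
  child_elements(root, 'page').to_h do |page|
    title = child_elements(page, 'title').first
    [title ? title.text : '(untitled)', child_elements(page, 'revision').size]
  end
end

# Example: revision_counts(File.read(SOURCE)) with SOURCE set as in the script.
```

If every page reports a single revision, the export was probably made with only the current revision, and the imported wiki will carry no history.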

Internal links to pages whose titles contain diacritics or spaces will be broken after the import. Pandoc marks each converted internal link with the string wikilink, so you can list the links to review by searching all pages for that string.
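The search for wikilink can be scripted. This is a rough sketch rather than part of the original gist: it assumes the pages were converted to Markdown and that Pandoc wrote internal links as [label](target "wikilink"). The names WIKILINK, wikilinks, suspicious_wikilinks and report_wikilinks are hypothetical.

```ruby
# Assumed shape of an internal link after conversion: [label](target "wikilink")
WIKILINK = /\[([^\]]*)\]\(([^)\s]+)\s+"wikilink"\)/

# Returns [label, target] pairs for every converted internal link in the text.
def wikilinks(markdown)
  markdown.scan(WIKILINK)
end

# Keeps the links most likely to be broken: targets with non-ASCII characters,
# underscores (standing in for spaces) or percent-escapes.
def suspicious_wikilinks(markdown)
  wikilinks(markdown).select do |_label, target|
    !target.ascii_only? || target.match?(/[_%]/)
  end
end

# Prints every suspicious link found in the Markdown files under dir.
def report_wikilinks(dir)
  Dir.glob(File.join(dir, '**', '*.md')).sort.each do |path|
    suspicious_wikilinks(File.read(path, encoding: 'UTF-8')).each do |label, target|
      puts "#{path}: [#{label}] -> #{target}"
    end
  end
end
```

Calling report_wikilinks('.') from the wiki's directory prints one line per link to review.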

#!/usr/bin/env ruby
require 'rubygems'
require 'hpricot'
require 'gollum'
require 'gollum-lib'
require 'i18n'
require 'pandoc-ruby'

SOURCE = 'wiki.openfisca.fr-20160414143040.xml' # MediaWiki XML export to import
TARGET = '.' # path to the Git repository backing the Gollum wiki
DESTINATION_FORMAT = :markdown # Pandoc output format for the converted pages

I18n.config.available_locales = :en # just to make it shut up, we want to use a static method anyway

wiki = Gollum::Wiki.new(TARGET)
file = File.open(SOURCE, 'r')
doc = Hpricot(file)

doc.search('/mediawiki/page').each do |page|
  # Strip diacritics from the title before using it as a page name.
  title = I18n.transliterate(page.at('title').inner_text)

  # Replay every revision as its own commit, keeping the original metadata.
  page.search('revision').each do |revision|
    revision_id = revision.at('id').inner_text
    # Anonymous edits carry an <ip> element instead of a <username>.
    contributor = revision.at('contributor/username') || revision.at('contributor/ip')
    author = contributor ? contributor.inner_text : 'Anonymous'
    original_time = revision.at('timestamp').inner_text
    comment = revision.at('comment') && revision.at('comment').inner_text

    begin
      content = PandocRuby.convert(revision.at('text').inner_text, :from => :mediawiki, :to => DESTINATION_FORMAT)
    rescue
      print '!' # conversion failed: skip this revision
      next
    end

    commit = {
      message: "#{comment}\nImport from #{SOURCE} (rev #{revision_id})\nOriginal modification made at #{original_time}",
      name: author,
      email: "#{author.tr(' ', '_')}@#{SOURCE}"
    }

    # Named existing_page so it does not overwrite the XML page of the outer loop.
    existing_page = wiki.page(title)
    if existing_page
      wiki.update_page(existing_page, existing_page.name, DESTINATION_FORMAT, content, commit)
      print '.'
    else
      wiki.write_page(title, DESTINATION_FORMAT, content, commit)
      puts
      puts title
    end
  end
end

puts
puts 'Done!'
file.close