The pages on the old wiki had critical pieces of hard-won info embedded in often-obsolete articles. Just copying the whole unstructured archive seems crazy, but so does throwing it all away and keeping only new content. AllenJB's suggestion of a template to flag these pages sounds ideal. That way, when we hit a page, we know whether the content is all shiny and new, or captured from the old wiki and of "variable quality"...
- Capture existing (possibly very messy) HTML to Markdown, either with Pandoc or the Make.text bookmarklet.
- Edit the Markdown (if you prefer it to editing MediaWiki syntax).
- Convert to MediaWiki syntax with Pandoc.
This gets the bulk of the document converted - for example:
- http://www.gentoo-wiki.info/HOWTO_ALSA - captured snapshot of old wiki page content
- http://gist.github.com/raw/24209/eabb1528865698c282c9dea007fbce1ef52e1d88 - as converted to Markdown by Make.text
- http://scratch.gentoo-wiki.com/wiki/ALSA_sound_system - converted to MediaWiki syntax by Pandoc
layman -a haskell
emerge -av pandoc # needs >=pandoc-0.47
Install the Make.text bookmarklet if you want it.
$ wget http://scratch.gentoo-wiki.com/wiki/ALSA_sound_system
$ pandoc -r html -w mediawiki -o alsa-sound.wiki ALSA_sound_system
... or capture the page to alsa-sound.md, edit it, then:
$ pandoc -r markdown -w mediawiki -o alsa-sound.wiki alsa-sound.md
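If there is a whole directory of captured Markdown files to convert, the last command can be wrapped in a loop. This is only a minimal sketch, assuming the captures all sit in one directory with a `.md` suffix. The function names `wiki_name` and `convert_all` are made up for illustration, and it uses only the `-r`, `-w` and `-o` options already shown above.

```shell
#!/bin/sh
# Hypothetical helper: map a Markdown filename to its MediaWiki output name,
# e.g. alsa-sound.md -> alsa-sound.wiki
wiki_name() {
    printf '%s\n' "${1%.md}.wiki"
}

# Hypothetical helper: convert every *.md file in a directory (default: .)
convert_all() {
    dir="${1:-.}"
    for src in "$dir"/*.md; do
        [ -e "$src" ] || continue    # no .md files: the glob stayed unexpanded
        out=$(wiki_name "$src")
        # Same invocation as the single-file command above
        pandoc -r markdown -w mediawiki -o "$out" "$src" || return 1
        echo "converted $src -> $out"
    done
}
```

Running `convert_all .` in the directory holding alsa-sound.md should leave alsa-sound.wiki next to it. The function stops at the first file Pandoc rejects, so a messy capture can be fixed by hand before carrying on.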
Try Wikipedia - Showdown is quite fun :-)
Markdown source: http://gist.github.com/24243