@hideojoho
Last active December 26, 2022 23:36
How to set up a VM that has a local copy of Wikipedia

Requirements

  • At least 6-8GB of RAM (4GB of which is allocated to the virtual machine)
  • At least 3GB of disk space (depends on the size of your dump data)
  • One to many hours (depends on your network speed and the size of your dump data)

Environments

  • MacOSX 10.13.6
  • Homebrew 2.1.6
  • Vagrant 2.2.5
  • Virtualbox 6.0.10
  • CentOS 7.6

Install Homebrew

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Install VirtualBox and Vagrant

$ brew cask install virtualbox
$ brew cask install vagrant

Install Vagrant plugins

$ vagrant plugin install vagrant-vbguest
$ vagrant plugin install vagrant-scp

Install a new box (CentOS7)

The following commands download a Vagrant box for CentOS 7 and initialise a project directory. The download takes a while.

$ vagrant box add centos/7
(When prompted for a provider, select 3: VirtualBox)
$ mkdir CentOS7; cd CentOS7
$ vagrant init centos/7
$ mv Vagrantfile Vagrantfile.orig

Download the files attached to this gist (Vagrantfile and Vagrant_provision.sh) and save them to the CentOS7 directory
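
Before starting the VM, it may help to confirm that both files are in place. The following is a minimal sketch, assuming the files are named Vagrantfile and Vagrant_provision.sh (the name the Vagrantfile's provision line refers to). check_vagrant_files is a hypothetical helper name, not part of Vagrant.

```shell
# Hypothetical helper: list which expected files are missing from a directory
# (defaults to the current directory). Returns non-zero if any is missing.
check_vagrant_files() {
  dir="${1:-.}"
  missing=0
  for f in Vagrantfile Vagrant_provision.sh; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

From inside the CentOS7 directory, `check_vagrant_files . && vagrant up` would then start the VM only when both files are present.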

Start a VM

The following command boots a CentOS 7 virtual machine and provisions it with the required packages. This takes a while.

$ vagrant up

SSH to VM

$ vagrant ssh
[vagrant@localhost ~]$ 

Source of the following steps: https://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Red_Hat_Linux

Initialise DB

[vagrant@localhost ~]$ sudo systemctl start mariadb
[vagrant@localhost ~]$ sudo mysql_secure_installation
...
Enter current password for root (enter for none):
...
Set root password? [Y/n] n
...
Remove anonymous users? [Y/n] Y
...
Disallow root login remotely? [Y/n] Y
...
Remove test database and access to it? [Y/n] Y
...
Reload privilege tables now? [Y/n] Y
...
[vagrant@localhost ~]$ mysql -u root -p
Enter password: 
...
MariaDB [(none)]> CREATE USER 'wiki'@'localhost' IDENTIFIED BY 'wikipedia0';
MariaDB [(none)]> CREATE DATABASE jawiki;
MariaDB [(none)]> GRANT ALL PRIVILEGES ON jawiki.* TO 'wiki'@'localhost';
MariaDB [(none)]> FLUSH PRIVILEGES;
MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| jawiki             |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)

MariaDB [(none)]> SHOW GRANTS FOR 'wiki'@'localhost';
+-------------------------------------------------------------------------------------------------------------+
| Grants for wiki@localhost                                                                                   |
+-------------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'wiki'@'localhost' IDENTIFIED BY PASSWORD '*D47E25F7211229630BE824987D144C13C91A9464' |
| GRANT ALL PRIVILEGES ON `jawiki`.* TO 'wiki'@'localhost'                                                    |
+-------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

MariaDB [(none)]> exit;
[vagrant@localhost ~]$ 
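
The interactive SQL session above can also be scripted. The following is a sketch under the same assumptions as above (database jawiki, user wiki, password wikipedia0). wiki_db_sql is a hypothetical helper that only prints the statements, so they can be reviewed before being piped to mysql.

```shell
# Hypothetical helper: print the SQL that creates the wiki user and database.
# Arguments: database name, user name, password.
# Values are inserted verbatim, so they must not contain quote characters.
wiki_db_sql() {
  db="$1"; user="$2"; pass="$3"
  cat <<SQL
CREATE USER '$user'@'localhost' IDENTIFIED BY '$pass';
CREATE DATABASE $db;
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost';
FLUSH PRIVILEGES;
SQL
}
```

With no root password set (as in the steps above), `wiki_db_sql jawiki wiki wikipedia0 | mysql -u root` should apply the statements in one go.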

Install MediaWiki 1/3

[vagrant@localhost ~]$ cd
[vagrant@localhost ~]$ wget https://releases.wikimedia.org/mediawiki/1.33/mediawiki-1.33.0.tar.gz
[vagrant@localhost ~]$ wget https://releases.wikimedia.org/mediawiki/1.33/mediawiki-1.33.0.tar.gz.sig
[vagrant@localhost ~]$ gpg --verify mediawiki-1.33.0.tar.gz.sig mediawiki-1.33.0.tar.gz
[vagrant@localhost ~]$ cd /var/www
[vagrant@localhost www]$ sudo tar -zxf /home/vagrant/mediawiki-1.33.0.tar.gz
[vagrant@localhost www]$ sudo ln -s mediawiki-1.33.0/ mediawiki
[vagrant@localhost www]$ sudo chown -R apache:apache /var/www/mediawiki
[vagrant@localhost www]$ sudo chown -R apache:apache /var/www/mediawiki-1.33.0 

Edit Apache conf and start the server

[vagrant@localhost www]$ sudo vi /etc/httpd/conf/httpd.conf
...
(Line 119) DocumentRoot "/var/www/mediawiki" # <- DocumentRoot "/var/www/html"
...
(Line 131) <Directory "/var/www/mediawiki"> # <-  <Directory "/var/www/html">
...
(Line 157) DirectoryIndex index.html index.html.var index.php # <- Add this line
...
[vagrant@localhost www]$ sudo systemctl start httpd
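
If you prefer not to edit the file by hand, the same three changes can be sketched with sed. This assumes the stock httpd.conf still contains the default "/var/www/html" lines. patch_httpd_conf is a hypothetical helper, and it appends the DirectoryIndex line at the end of the file rather than at line 157.

```shell
# Hypothetical helper: apply the three httpd.conf edits described above.
# Works on a temporary copy so the original is only overwritten on success.
patch_httpd_conf() {
  conf="$1"
  tmp="$(mktemp)" || return 1
  # Point DocumentRoot and its <Directory> block at the MediaWiki directory.
  sed -e 's|^DocumentRoot "/var/www/html"|DocumentRoot "/var/www/mediawiki"|' \
      -e 's|^<Directory "/var/www/html">|<Directory "/var/www/mediawiki">|' \
      "$conf" > "$tmp" || return 1
  # Append the DirectoryIndex line unless a top-level one already lists index.php.
  grep -q '^DirectoryIndex .*index\.php' "$tmp" ||
    echo 'DirectoryIndex index.html index.html.var index.php' >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Because /etc/httpd/conf/httpd.conf is owned by root, the function would need to run from a root shell, for example `patch_httpd_conf /etc/httpd/conf/httpd.conf`. Running it a second time leaves the file unchanged.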

Install MediaWiki 2/3

  • Go to http://192.168.33.10/
  • Click "Please set up the wiki first."
  • Select your language
  • Make sure the environment check passes (shown as a green message)
  • DB Settings
    • Host: localhost
    • DB Name: jawiki
    • DB User name: wiki
    • DB User password: wikipedia0
  • Next, next
  • MediaWiki Settings
    • Wikiname: Wikipedia
    • User name: root
    • User password: wikipedia0
  • Next, next, next
  • LocalSettings.php will be downloaded automatically

Copy LocalSettings.php to VM

Run the following command on your host machine, not in the VM.

$ vagrant scp ~/Downloads/LocalSettings.php /home/vagrant/

Install MediaWiki 3/3

Run the following commands in the VM, not on your host machine.

[vagrant@localhost www]$ sudo cp /home/vagrant/LocalSettings.php mediawiki
[vagrant@localhost www]$ sudo chown -R apache:apache mediawiki/LocalSettings.php

Go to http://192.168.33.10/index.php

Install Scribunto Extension

[vagrant@localhost ~]$ wget https://extdist.wmflabs.org/dist/extensions/Scribunto-REL1_33-8328acb.tar.gz
[vagrant@localhost ~]$ sudo tar -xzf Scribunto-REL1_33-8328acb.tar.gz -C /var/www/mediawiki/extensions
[vagrant@localhost ~]$ sudo chown -R apache:apache /var/www/mediawiki-1.33.0
[vagrant@localhost ~]$ sudo chmod a+x /var/www/mediawiki/extensions/Scribunto/includes/engines/LuaStandalone/binaries/lua5_1_5_linux_64_generic/lua
[vagrant@localhost ~]$ sudo vi /var/www/mediawiki/LocalSettings.php
wfLoadExtension( 'Scribunto' ); # <- Add this line at the end
$wgScribuntoDefaultEngine = 'luastandalone'; # <- Add this line at the end

Install MassMessage Extension

[vagrant@localhost ~]$ wget https://extdist.wmflabs.org/dist/extensions/MassMessage-REL1_33-22e7d07.tar.gz
[vagrant@localhost ~]$ sudo tar -xzf MassMessage-REL1_33-22e7d07.tar.gz -C /var/www/mediawiki/extensions
[vagrant@localhost ~]$ sudo chown -R apache:apache /var/www/mediawiki-1.33.0
[vagrant@localhost ~]$ sudo vi /var/www/mediawiki/LocalSettings.php
wfLoadExtension( 'MassMessage' ); # <- Add this line at the end

Install TemplateStyles Extension

[vagrant@localhost ~]$ wget https://extdist.wmflabs.org/dist/extensions/TemplateStyles-REL1_33-c76fd84.tar.gz
[vagrant@localhost ~]$ sudo tar -xzf TemplateStyles-REL1_33-c76fd84.tar.gz -C /var/www/mediawiki/extensions
[vagrant@localhost ~]$ sudo chown -R apache:apache /var/www/mediawiki-1.33.0
[vagrant@localhost ~]$ sudo vi /var/www/mediawiki/LocalSettings.php
wfLoadExtension( 'TemplateStyles' ); # <- Add this line at the end
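
The three extension sections each end by appending lines to LocalSettings.php. If you repeat these steps, it is easy to add the same line twice. The following is a small sketch of an idempotent append. append_once is a hypothetical helper, not a MediaWiki tool.

```shell
# Hypothetical helper: append a line to a file unless an identical line
# is already present, so repeated runs do not create duplicates.
append_once() {
  file="$1"; line="$2"
  # -x matches whole lines only, -F treats the text as a fixed string.
  grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example calls (run as root, since LocalSettings.php is owned by apache):
#   append_once /var/www/mediawiki/LocalSettings.php "wfLoadExtension( 'Scribunto' );"
#   append_once /var/www/mediawiki/LocalSettings.php "\$wgScribuntoDefaultEngine = 'luastandalone';"
#   append_once /var/www/mediawiki/LocalSettings.php "wfLoadExtension( 'MassMessage' );"
#   append_once /var/www/mediawiki/LocalSettings.php "wfLoadExtension( 'TemplateStyles' );"
```

The backslash before `$wgScribuntoDefaultEngine` keeps the shell from expanding it as a shell variable inside double quotes.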

Download Wikipedia Dump file (2.8GB)

[vagrant@localhost www]$ cd
[vagrant@localhost ~]$ wget https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2

Import the dump data into MediaWiki

  • Dry run
[vagrant@localhost ~]$ php /var/www/mediawiki/maintenance/importDump.php --dry-run --conf /var/www/mediawiki/LocalSettings.php --server="http://192.168.33.10/" jawiki-latest-pages-articles.xml.bz2
100 (246.16 pages/sec 246.16 revs/sec)
200 (328.81 pages/sec 328.81 revs/sec)
300 (345.88 pages/sec 345.88 revs/sec)
400 (389.64 pages/sec 389.64 revs/sec)
500 (433.00 pages/sec 433.00 revs/sec)
600 (448.71 pages/sec 448.71 revs/sec)
700 (475.27 pages/sec 475.27 revs/sec)
800 (475.92 pages/sec 475.92 revs/sec)
900 (481.52 pages/sec 481.52 revs/sec)
1000 (489.51 pages/sec 489.51 revs/sec)
...
2403900 (1056.31 pages/sec 1056.31 revs/sec)
2404000 (1056.32 pages/sec 1056.32 revs/sec)
Done!
You might want to run rebuildrecentchanges.php to regenerate RecentChanges,
and initSiteStats.php to update page and revision counts
[vagrant@localhost ~]$ 
  • Actual run
[vagrant@localhost ~]$ php /var/www/mediawiki/maintenance/importDump.php --conf /var/www/mediawiki/LocalSettings.php  --server="http://192.168.33.10/" jawiki-latest-pages-articles.xml.bz2 

This can take a very long time: a large dump may need a few days to complete. Alternatively, you can try other dump data from Wikimedia Downloads.

Monitor the import progress
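
One possible way to watch the import, sketched here as an assumption rather than the author's method, is to count the rows in MediaWiki's page table and compare that with the total reported by the dry run (about 2,404,000 pages in the output above). Both function names are hypothetical, and the query assumes the wiki was installed without a table prefix.

```shell
# Hypothetical helper: integer percentage of pages imported so far.
# Arguments: pages imported, total pages expected.
percent_done() {
  imported="$1"; total="$2"
  [ "$total" -gt 0 ] || return 1
  echo $(( imported * 100 / total ))
}

# Hypothetical helper: count rows in the page table of the jawiki database.
# Prompts for the password of the 'wiki' DB user created earlier.
# -N drops the column header, -B prints plain tab-separated output.
count_imported_pages() {
  mysql -u wiki -p -N -B -e 'SELECT COUNT(*) FROM page;' jawiki
}
```

From a second SSH session, `percent_done "$(count_imported_pages)" 2404000` would print a rough percentage. The figure is approximate, because the page table also holds pages created during installation.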

Update recent changes and site stats

Run the following commands once the import has completed.

[vagrant@localhost ~]$ sudo php /var/www/mediawiki/maintenance/rebuildrecentchanges.php
[vagrant@localhost ~]$ sudo php /var/www/mediawiki/maintenance/initSiteStats.php --update

Stop VM

[vagrant@localhost ~]$ exit
$ vagrant halt

If you run out of the VM's disk space

Vagrant_provision.sh

# Update package
echo "Updating default packages ..."
sudo yum -y update
# Install dev tools
echo "Installing Development tools ..."
sudo yum -y groupinstall base "Development tools" --setopt=group_package_types=mandatory,default,optional
# Disable SELinux and firewall
echo "Disabling SELinux and firewall ..."
sudo setenforce 0
sudo systemctl stop firewalld
sudo systemctl disable firewalld
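# Edit the real config file: /etc/sysconfig/selinux is a symlink to it,
# and sed -i would replace the symlink with a regular file.
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config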
# Install PHP, Apache, MariaDB
echo "Installing Apache, PHP and MariaDB ..."
sudo yum -y install epel-release
sudo rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-7.rpm
sudo yum -y install --enablerepo=remi,remi-php71 php php-devel php-mbstring php-mysql php-pdo php-gd php-xml php-mcrypt
sudo yum -y install httpd mariadb mariadb-server
sudo systemctl enable mariadb
sudo systemctl enable httpd

Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://vagrantcloud.com/search.
  config.vm.box = "centos/7"

  # Disable automatic box update checking. If you disable this, then
  # boxes will only be checked for updates when the user runs
  # `vagrant box outdated`. This is not recommended.
  # config.vm.box_check_update = false

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine. In the example below,
  # accessing "localhost:8080" will access port 80 on the guest machine.
  # NOTE: This will enable public access to the opened port
  # config.vm.network "forwarded_port", guest: 80, host: 8080

  # Create a forwarded port mapping which allows access to a specific port
  # within the machine from a port on the host machine and only allow access
  # via 127.0.0.1 to disable public access
  # config.vm.network "forwarded_port", guest: 80, host: 8080, host_ip: "127.0.0.1"

  # Create a private network, which allows host-only access to the machine
  # using a specific IP.
  config.vm.network "private_network", ip: "192.168.33.10"

  # Create a public network, which generally matched to bridged network.
  # Bridged networks make the machine appear as another physical device on
  # your network.
  # config.vm.network "public_network"

  # Share an additional folder to the guest VM. The first argument is
  # the path on the host to the actual folder. The second argument is
  # the path on the guest to mount the folder. And the optional third
  # argument is a set of non-required options.
  # config.vm.synced_folder "../data", "/vagrant_data"

  # Provider-specific configuration so you can fine-tune various
  # backing providers for Vagrant. These expose provider-specific options.
  # Example for VirtualBox:
  #
  config.vm.provider "virtualbox" do |vb|
    # # Display the VirtualBox GUI when booting the machine
    # vb.gui = true
    #
    # # Customize the amount of memory on the VM:
    vb.memory = "4096"
  end
  #
  # View the documentation for the provider you are using for more
  # information on available options.

  # Enable provisioning with a shell script. Additional provisioners such as
  # Puppet, Chef, Ansible, Salt, and Docker are also available. Please see the
  # documentation for more information about their specific syntax and use.
  # config.vm.provision "shell", inline: <<-SHELL
  #   apt-get update
  #   apt-get install -y apache2
  # SHELL
  config.vm.provision "shell", privileged: false, path: "Vagrant_provision.sh"
end