First and foremost, many thanks to Brian Flad and Core Systems’ UNIX Team for configuring MediaWiki, providing tested mass import mechanisms, and setting up MySQL replication. It’s one thing to set up a service for public use -- it’s another to make it public, accessible, and usable. Skimming the recent changes page makes it easy to see that the platform is gaining traction: several groups have already started using it.
To show our gratitude, Jamie Ly of Learning Lab and I decided to build a web application that would visualize the recent changes page. To accomplish this, several things had to happen:
- Populate recent changes data into a datastore independent of MediaWiki’s
- Make that data accessible via a web API
- Add JSON-P support for client-side API consumption
- Add an attractive visualization
We agreed that splitting up the work would help us achieve our goal in the most efficient manner. Also, dividing the work allowed us to select pieces that cater to our strengths and interests. Jamie has a clear interest in graphics within the browser, so the visualization aspect was his. I’m a fan of APIs, so the backend was mine.
Because faculty research often calls for scraping data off of websites without APIs, I was fairly comfortable with this task. With help from the Hpricot HTML parsing library, I was able to access DOM elements using CSS selectors and regular expressions. As the data is pulled in, it gets pushed to a MySQL database with MD5 hash primary keys to prevent duplicates. An hourly cron job takes care of keeping the database relatively up-to-date.
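The duplicate-prevention idea can be sketched with nothing but Ruby's standard library. This is an illustration under stated assumptions, not the scraper itself: the `ScrapedChange` fields, the choice of which fields feed the hash, and the in-memory Hash standing in for the MySQL table are all hypothetical.

```ruby
require 'digest/md5'

# Hypothetical shape of one scraped row; the post does not list the real columns.
ScrapedChange = Struct.new(:page, :user, :changed_at, :line_changes)

# Build a deterministic key from the fields that identify a change.
# Scraping the same row twice yields the same key.
def change_key(change)
  Digest::MD5.hexdigest([change.page, change.user, change.changed_at.to_i].join('|'))
end

# Stand-in for the MySQL table: a Hash keyed by the MD5 digest.
# A real table would reject the duplicate through its primary-key constraint.
def store_changes(table, changes)
  changes.each do |change|
    key = change_key(change)
    table[key] ||= change # keep the first copy, ignore repeats
  end
  table
end

table = {}
first_run  = [ScrapedChange.new('Main Page', 'alice', Time.at(1267423200), 12)]
second_run = first_run + [ScrapedChange.new('Sandbox', 'bob', Time.at(1267426800), -3)]

store_changes(table, first_run)
store_changes(table, second_run) # overlaps with the first run
puts table.size # prints 2
```

Because the key is derived from the row's content, an hourly job can re-scrape overlapping pages without tracking where the previous run stopped.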
The web API was assembled with Sinatra, a lightweight Ruby web framework. Instead of following the model-view-controller pattern, Sinatra lets you map HTTP methods and URL patterns directly to the code that handles them. Below is the implementation of the /count route:
get '/count/?' do
  # Both Unix timestamps are required.
  if params[:start].nil? or params[:end].nil?
    throw :halt, [400, {:message => "Bad request"}.to_json]
  else
    start_time = Time.at(params[:start].to_i)
    end_time = Time.at(params[:end].to_i)
    difference = (end_time - start_time).to_i
    # Reject empty, reversed, or overly long ranges.
    unless difference > 0 and difference <= TEN_DAYS
      throw :halt, [413, {:message => "Request Entity Too Large"}.to_json]
    else
      json = {:changes => Change.sum(:line_changes, :changed_at => (start_time..end_time))}.to_json
      # Wrap the JSON in the callback when one is supplied (JSON-P).
      params[:callback].nil? ? json : "#{params[:callback]}(#{json})"
    end
  end
end
The minimalistic Sinatra framework was well suited for API design. It also integrated nicely with RSpec, a behavior-driven development framework for Ruby.
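One way to make the route's validation rules easy to cover with specs is to pull them into a plain method. The sketch below is hypothetical: `count_status` is not part of the original application, and the value of `TEN_DAYS` is assumed to be ten days in seconds, since the post uses the constant without showing its definition.

```ruby
# Assumed value: the post uses TEN_DAYS without showing its definition.
TEN_DAYS = 10 * 24 * 60 * 60

# Returns the HTTP status the /count route would answer with for these params.
def count_status(params)
  return 400 if params[:start].nil? || params[:end].nil?

  difference = params[:end].to_i - params[:start].to_i
  return 413 unless difference > 0 && difference <= TEN_DAYS

  200
end

puts count_status({})                                             # prints 400
puts count_status({ start: '1267423200', end: '1267596000' })     # prints 200
puts count_status({ start: '1267423200', end: '1267423200' })     # prints 413
```

A helper of this shape has no dependency on the web framework, so each branch can be checked with a one-line expectation.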
JSON-P is a complement to the base JSON data format. Browsers’ same-origin policy won't allow JavaScript to access external web APIs directly via AJAX. There is no such restriction on <script> tags, but pointing one at an API only injects the raw data into the DOM. With JSON-P, the client provides a JavaScript prefix (a callback function name) to the API, and the API wraps its response with this prefix, generating a valid script as opposed to just data. Below is an example of the client-side implementation using jQuery, followed by the server response.
// Client-side jQuery
$(document).ready(function() {
  $.getJSON("http://moon.wharton.upenn.edu/count?start=1267423200&end=1267596000&callback=?", function(o) {
    console.log(o.changes);
  });
});

// Response -- jQuery replaces the callback argument ? provided in the request with jsonp###...
jsonp1267988911052({"changes":93})
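On the server side, the wrapping step amounts to string concatenation around the JSON body. The sketch below isolates it in a hypothetical helper, `jsonp_body`, and adds one thing the route in this post does not do: it rejects callback names that are not plain JavaScript identifiers. That check and its pattern are assumptions made for illustration.

```ruby
require 'json'

# Assumption: only plain JavaScript identifiers (optionally dotted) are accepted
# as callback names. The route in the post does not perform this check.
CALLBACK_PATTERN = /\A[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*\z/

# Returns bare JSON when no callback is given, otherwise callback(json).
def jsonp_body(payload, callback = nil)
  json = payload.to_json
  return json if callback.nil?
  raise ArgumentError, 'invalid callback name' unless callback =~ CALLBACK_PATTERN

  "#{callback}(#{json})"
end

puts jsonp_body({ changes: 93 })                        # prints {"changes":93}
puts jsonp_body({ changes: 93 }, 'jsonp1267988911052')  # prints jsonp1267988911052({"changes":93})
```

Restricting the callback to an identifier keeps a caller from smuggling arbitrary script into the response through the `callback` parameter.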
If you're interested in the MediaWiki Usage API documentation, click here.
The visualization is rendered with Google's Visualization API, which makes it easy to create graphs and charts dynamically.
Once the required libraries have loaded via Google's Asynchronous Script Loader, a callback uses jQuery to run a document load function. This load function retrieves the initial data and draws the chart using the default page settings.
The visualization is redrawn when its element is resized (using jQuery Resizable), when the Run button is clicked, or when a new visualization type is selected. Data is retrieved only on page load and when the Run button is clicked (implying a change in the data filter parameters).