@jamiely
Created March 9, 2010 19:22

MediaWiki Usage Visualization and Web Service API

First and foremost, many thanks to Brian Flad and Core Systems’ UNIX Team for configuring MediaWiki, providing tested mass import mechanisms, and setting up MySQL replication. It’s one thing to set up a service for public use -- it’s another to make it public, accessible, and usable. A skim of the recent changes page shows that the platform is gaining traction: several groups have already started using it.

To show our gratitude, Jamie Ly of Learning Lab and I decided to build a web application that would visualize the recent changes page. To accomplish this, several things had to happen:

  1. Populate recent changes data into a datastore independent of MediaWiki’s
  2. Make that data accessible via a web API
  3. Add JSON-P support for client-side API consumption
  4. Add an attractive visualization

We agreed that splitting up the work would help us achieve our goal in the most efficient manner. Also, dividing the work allowed us to select pieces that cater to our strengths and interests. Jamie has a clear interest in graphics within the browser, so the visualization aspect was his. I’m a fan of APIs, so the backend was mine.

Data Scraping

Because faculty research often calls for scraping data off of websites without APIs, I was fairly comfortable with this task. With help from the Hpricot HTML parsing library, I was able to access DOM elements using CSS selectors and regular expressions. As the data is pulled in, it gets pushed to a MySQL database with MD5 hash primary keys to prevent duplicates. An hourly cron job takes care of keeping the database relatively up-to-date.
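The duplicate check described above can be sketched roughly as follows. This is a minimal illustration rather than the scraper's actual code: the field names (page, user, changed_at), the change_key and store_changes helpers, and the in-memory Hash standing in for the MySQL table are all assumptions.

```ruby
require 'digest/md5'

# Hypothetical helper: derive a stable key from the fields that identify a change.
def change_key(change)
  Digest::MD5.hexdigest([change[:page], change[:user], change[:changed_at]].join('|'))
end

# Stand-in for the MySQL table: a Hash keyed by the MD5 digest.
def store_changes(table, changes)
  changes.each do |change|
    key = change_key(change)
    table[key] ||= change  # a row scraped again maps to the same key and is skipped
  end
  table
end

table = {}
scrape = [
  {:page => 'Main Page', :user => 'alice', :changed_at => 1267423200},
  {:page => 'Help',      :user => 'bob',   :changed_at => 1267426800}
]
store_changes(table, scrape)
store_changes(table, scrape)  # a later cron run sees the same rows again
puts table.size               # prints 2
```

Because the key is computed from the row's content, overlapping hourly scrapes can be inserted blindly: a row seen twice always lands on the same primary key.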

Web API

The web API was assembled with a light Ruby web framework, Sinatra. Instead of following the model-view-controller pattern, Sinatra allows you to map HTTP URLs and methods directly to their actions. Below is the implementation of the /count route:

get '/count/?' do
  # Both start and end (Unix timestamps) are required.
  if params[:start].nil? || params[:end].nil?
    halt 400, {:message => "Bad request"}.to_json
  end

  start_time = Time.at(params[:start].to_i)
  end_time = Time.at(params[:end].to_i)
  difference = (end_time - start_time).to_i

  # Reject empty, reversed, or overly long ranges.
  unless difference > 0 && difference <= TEN_DAYS
    halt 413, {:message => "Request Entity Too Large"}.to_json
  end

  json = {:changes => Change.sum(:line_changes, :changed_at => (start_time..end_time))}.to_json
  # Wrap the JSON in the callback for JSON-P clients; otherwise return it as-is.
  params[:callback].nil? ? json : "#{params[:callback]}(#{json})"
end

The minimalistic Sinatra framework was well suited for API design. It also integrated nicely with RSpec, a Behavior Driven Development framework for Ruby.

JSON-P

JSON-P is a complement to the base JSON data format. By default, browsers’ same-origin policy won't allow JavaScript to call external web APIs directly via AJAX. <script> tags are exempt from that policy, but a bare JSON response loaded through one is evaluated and discarded, leaving the page no way to read it. With JSON-P, the client passes the name of a callback function to the API, and the API wraps its response in a call to that function, producing an executable script rather than bare data. Below is an example of the client-side implementation using jQuery, followed by the server response.

// Client-side jQuery
$(document).ready(function() {
  $.getJSON("http://moon.wharton.upenn.edu/count?start=1267423200&end=1267596000&callback=?", function(o) {
    console.log(o.changes);
  });
});

// Response -- jQuery replaces the callback argument ? provided in the request with jsonp###...
jsonp1267988911052({"changes":93})
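On the server side, the wrapping step can be sketched in isolation. The /count route shown earlier interpolates the callback parameter as-is; the version below additionally restricts it to identifier-like names, which is an assumed hardening step and not part of the original API. The jsonp helper and CALLBACK_PATTERN are hypothetical names.

```ruby
require 'json'

# Assumed rule: accept only identifier-like callback names,
# optionally dotted (e.g. "app.handleCount").
CALLBACK_PATTERN = /\A[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*\z/

# Hypothetical helper: return plain JSON, or the JSON wrapped in a callback call.
def jsonp(payload, callback = nil)
  json = payload.to_json
  return json if callback.nil?
  raise ArgumentError, "invalid callback name" unless callback =~ CALLBACK_PATTERN
  "#{callback}(#{json})"
end

puts jsonp({:changes => 93})                        # {"changes":93}
puts jsonp({:changes => 93}, "jsonp1267988911052")  # jsonp1267988911052({"changes":93})
```

Rejecting anything that is not a plain function name keeps a caller from smuggling arbitrary script into the response through the callback parameter.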

If you're interested in the MediaWiki Usage API documentation, click here.

Visualization

The visualization is rendered via Google's Visualization API, which makes it easy to create graphs and charts dynamically.

Once the required libraries have loaded via Google's Asynchronous Script Loader, a callback uses jQuery to register a document load function. This load function performs the initial data retrieval and draws the chart using the default page settings.

The visualization is redrawn when the chart element is resized (using jQuery Resizable), when the Run button is clicked, or when a new visualization type is selected. Data is retrieved only on page load and when the Run button is clicked (implying a change in data filter parameters).

Conclusion
