@jpmckinney
Created March 24, 2012 19:08
Wikipedia API
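require 'cgi'
require 'open-uri'
require 'nokogiri'
require 'yajl'

class Wiki
  # Returns the cleaned-up HTML of the lead section (section=0) of a Wikipedia
  # page, or '' if the page cannot be fetched or the API returns an error.
  def self.parse(lang, page)
    begin
      url = "https://#{lang}.wikipedia.org/w/api.php?maxlag=5&redirects=1&format=json&disablepp=1&prop=text%7Cdisplaytitle%7Crevid&action=parse&section=0&page=#{CGI.escape(page)}"
      # URI.open: on Ruby 3.0+, Kernel#open no longer fetches URLs.
      json = Yajl::Parser.parse(URI.open(url).read)
      # An API error (e.g. a missing page) comes back under "error" instead of "parse".
      return '' unless json['parse']
      doc = Nokogiri::HTML(json['parse']['text']['*'])
      # article message boxes (.bandeau, .bandeau-portail and .homonymie are French Wikipedia classes)
      doc.css('.ambox').remove
      doc.css('.bandeau').remove
      doc.css('.bandeau-portail').remove
      doc.css('.homonymie').remove
      # error messages
      doc.css('.error').remove
      # infoboxes
      doc.css('.infobox').remove
      doc.css('.infobox_v2').remove
      # right-floated thumbnail images
      doc.css('.thumb.tright').remove
      # serialize the body and drop the first empty paragraph
      doc.at_css('body').inner_html.sub(%r{<p><br></p>}, '')
    rescue SocketError, OpenURI::HTTPError
      ''
    end
  end
end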
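The long query string in `Wiki.parse` is easier to read and change when it is built from a hash. The sketch below assumes only the standard library (`uri`, `json`) and uses two hypothetical helpers, `wiki_parse_url` and `lead_html`, that are not part of the original gist. It builds the same `action=parse` request and pulls the HTML out of a hand-written sample response shaped like the API's `format=json` output, where the HTML sits under `parse.text["*"]`. The sample strings are illustrative, not real API output.

```ruby
require 'uri'
require 'json'

# Hypothetical helper: builds the same action=parse request as Wiki.parse.
def wiki_parse_url(lang, page)
  params = {
    action: 'parse',
    page: page,
    section: 0,                        # lead section only
    prop: 'text|displaytitle|revid',   # "|" is encoded as %7C
    redirects: 1,                      # resolve redirects to the target page
    disablepp: 1,
    maxlag: 5,                         # refuse the request if replica lag exceeds 5 s
    format: 'json'
  }
  "https://#{lang}.wikipedia.org/w/api.php?#{URI.encode_www_form(params)}"
end

# Hypothetical helper: extracts the HTML fragment from an action=parse
# response body, returning '' when the API reports an error instead.
def lead_html(response_body)
  parse = JSON.parse(response_body)['parse']
  parse ? parse['text']['*'] : ''
end

puts wiki_parse_url('en', 'Ruby (programming language)')

# Illustrative responses, not real API output.
puts lead_html('{"parse":{"title":"Example","text":{"*":"<p>Lead paragraph.</p>"}}}')
puts lead_html('{"error":{"code":"missingtitle"}}').empty?
```

`URI.encode_www_form` escapes each key and value the same way `CGI.escape` does (spaces become `+`), so adding or dropping a parameter no longer means editing a hand-encoded string.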