gaurish/7563496, last active December 28, 2015 21:19
Web scraper to get election data
source 'https://rubygems.org'

gem 'mechanize'
gem 'pry'
gem 'awesome_print'
gem 'logger'
gem 'activesupport'
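If any gem in this Gemfile is not installed, the `require` calls at the top of the script raise `LoadError`, and the script re-raises it with a "run bundle install" hint via `Exception#exception`, keeping the original backtrace. Below is a minimal, standalone sketch of that pattern. The `require_all` helper and the `no_such_gem_for_demo` library name are hypothetical, made up for illustration. Unlike the script, this sketch keeps the original message so you can still see which library failed to load.

```ruby
# Hypothetical helper: require each library in turn. On failure, re-raise the
# LoadError with a friendlier message while preserving the original backtrace.
def require_all(libs)
  libs.each { |lib| require lib }
rescue LoadError => e
  # Exception#exception(msg) returns a copy of `e` (same class) with a new message
  friendly = e.exception("#{e.message} (required gems missing; run `bundle install` to fix the issue)")
  friendly.set_backtrace(e.backtrace) # point at the real failing require
  raise friendly
end

begin
  # 'json' ships with Ruby; the second name deliberately does not exist
  require_all(%w[json no_such_gem_for_demo])
rescue LoadError => e
  puts "#{e.class}: #{e.message}"
end
```

Because `exception` returns an instance of the same class, callers that rescue `LoadError` specifically still catch the friendlier error.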
begin
  require 'mechanize'
  require 'pry'
  require 'awesome_print'
  require 'logger'
  require 'active_support/all'
rescue LoadError => e
  # Re-raise with a hint, keeping the original backtrace for debugging
  friendly_ex = e.exception("Required gems missing. Run `bundle install` to fix the issue.")
  friendly_ex.set_backtrace(e.backtrace)
  raise friendly_ex
end

# Fetch the district list for state code U05
## Discard the first option (SELECT) & select a district
### Fetch the AC (assembly constituency) list for that district
#### Discard the placeholder option & select an AC
##### Fetch the PS (polling station) list for that AC
# Repeat until we have the list of all polling stations for state code U05
class Scrapper
  DATA_URL = 'http://www.eci-polldaymonitoring.nic.in/psleci/Default.aspx'

  attr_accessor :agent

  def initialize
    @agent = Mechanize.new do |a|
      a.user_agent_alias = 'Windows Chrome'
      a.log = Logger.new('activity.log')
      a.get DATA_URL # load the landing page with the state dropdown
    end
    @data = [] # one hash per assembly constituency
  end

  def select_state
    choose('ddlState').options[25].click # NCT of Delhi
    post
  end
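
  # Visit every district, walk its ACs, then save everything collected
  def districts
    choose('ddlDistrict').options.each_with_index do |option, index|
      next if option.text.match?(/\A\W*select\b/i) # skip only the placeholder
      # Re-fetch from the current page: `option` belongs to the form from before the last post
      choose('ddlDistrict').options[index].click
      post
      acs
    end
    write_to_disk
  end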
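
  # Visit every AC of the selected district and record its polling stations
  def acs
    choose('ddlAC').options.each_with_index do |option, index|
      # Anchored so real AC names that merely contain "all" are not skipped
      next if option.text.match?(/\A\W*all\b/i)
      choose('ddlAC').options[index].click
      post
      polling_stations
    end
  end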

  # Collect the PS list for the currently selected AC and append one record
  def polling_stations
    ps_list = []
    choose('ddlPS').options.each do |ps|
      next if ps.text.match?(/\A\W*all\b/i) # skip only the placeholder
      ps_list << { name: ps.text, code: ps.value }
    end
    @data << {
      state_code: choose('ddlState').query_value.first.last,
      district_code: choose('ddlDistrict').query_value.first.last,
      assembly_constituency_code: choose('ddlAC').query_value.first.last,
      polling_stations: ps_list
    }
  end
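
  private

  # Submit the current form; Mechanize makes the response the new current page
  def post
    @agent.page.form.submit
  end

  def form
    @agent.page.form
  end

  # Look up a form field (here, a dropdown) by its name attribute
  def choose(name)
    form.field_with(name: name)
  end

  # Save the collected records to "<state code>.xml" (Array#to_xml comes from ActiveSupport)
  def write_to_disk
    filename = "#{choose('ddlState').query_value.first.last}.xml"
    File.open(filename, 'w') { |f| f.write(@data.to_xml) }
  end
end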

# USAGE
#   s = Scrapper.new
#   s.select_state
#   s.districts
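The `districts` → `acs` → `polling_stations` chain is a depth-first walk over three cascading dropdowns, emitting one record per assembly constituency. Since the scraper needs the live ECI site, here is a minimal offline sketch of the same traversal over an in-memory tree. `Option`, `collect_polling_stations`, the two pattern constants, and every name and code in the sample data are hypothetical, not taken from the site. The sample also shows why placeholder patterns should be anchored: an unanchored `/all/i` would also skip a real name such as "Smallville".

```ruby
# Hypothetical stand-in for one <option> of a dropdown. `children` holds the
# options the next dropdown would show once this option is selected.
Option = Struct.new(:text, :value, :children)

# Match only placeholder entries such as "--SELECT--" or "ALL"
SELECT_PLACEHOLDER = /\A\W*select\b/i
ALL_PLACEHOLDER    = /\A\W*all\b/i

# Walk districts -> ACs -> polling stations depth-first and build one record
# per AC, shaped like the hashes Scrapper#polling_stations appends to @data.
def collect_polling_stations(state_code, districts)
  districts.each_with_object([]) do |district, records|
    next if district.text.match?(SELECT_PLACEHOLDER)
    district.children.each do |ac|
      next if ac.text.match?(ALL_PLACEHOLDER)
      stations = ac.children
                   .reject { |ps| ps.text.match?(ALL_PLACEHOLDER) }
                   .map { |ps| { name: ps.text, code: ps.value } }
      records << {
        state_code: state_code,
        district_code: district.value,
        assembly_constituency_code: ac.value,
        polling_stations: stations
      }
    end
  end
end

# Made-up sample tree: one placeholder district, one real district with two ACs
districts = [
  Option.new('--SELECT--', '0', []),
  Option.new('District A', '1', [
    Option.new('ALL', '0', []),
    Option.new('Smallville', '11', [
      Option.new('ALL', '0', []),
      Option.new('PS One', '1', []),
      Option.new('PS Two', '2', [])
    ]),
    Option.new('Riverside', '12', [
      Option.new('ALL', '0', []),
      Option.new('PS Three', '3', [])
    ])
  ])
]

records = collect_polling_stations('U05', districts)
records.each do |r|
  codes = r[:polling_stations].map { |ps| ps[:code] }
  puts "#{r[:assembly_constituency_code]} -> #{codes.join(',')}"
end
# prints "11 -> 1,2" then "12 -> 3"
```

Keeping the traversal separate from the HTTP calls like this makes the skip rules testable without hitting the site; in the real scraper each step down the tree is a form post instead of a hash lookup.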