Created
March 20, 2012 10:01
Mechanize scraper for ytj.fi: company lookup by business ID (bid)
require 'mechanize'

# Look up a company on ytj.fi by business ID (Y-tunnus) and return its
# detail table as a Hash, or nil if any step of the lookup fails.
def get_ytj(bid)
  agent = Mechanize.new
  agent.user_agent = 'Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)'
  search_form = agent.get('http://www.ytj.fi/yrityshaku.aspx').form_with(:name => 'aspnetForm')
  search_form.field_with(:name => '_ctl0:ContentPlaceHolder:ytunnus').value = bid
  # Submit the search ('Hae yritykset' = 'Search companies') and follow the link for this ID.
  results = search_form.submit(search_form.button_with(:value => 'Hae yritykset'))
  page = results.link_with(:text => bid).click
  # The first two cells of each table row become a key/value pair.
  rows = page.search("//div[@id='detail-result']//table/tr")
  Hash[*rows.collect { |row| row.search('td')[0..1].collect { |cell| cell.inner_text.chomp.gsub(/^\s+/, "").gsub(/\s+$/, "") } }.flatten]
rescue
  nil
end

puts get_ytj(ARGV[0])

$ ruby ytj.rb 2129112-6
{"Toiminimi"=>"Maventa Oy", "Rinnakkaistoiminimi"=>"Maventa Ltd", "Aputoiminimi"=>"Idoneus\nVerkkolaskut fi\nSuomen Verkkolaskut.fi", "Yritysmuoto"=>"Osakeyhtiö", "Kotipaikka"=>"HELSINKI", "Yrityksen kieli"=>"Suomi", "Päätoimiala"=>"Ohjelmistojen suunnittelu ja valmistus (62010)", "Postiosoite"=>"PL 934\n00101 HELSINKI", "Käyntiosoite"=>"Kanavaranta 7 F\n00160 HELSINKI", "Puhelin"=>"+358923165651", "www"=>"http://www.maventa.com", "Kaupparekisteri\n "=>"Rekisterissä\n ", "Verohallinnon perustiedot\n "=>"Rekisterissä\n ", "Ennakkoperintärekisteri\n "=>"Rekisterissä\n ", "Arvonlisäverovelvollisuus\n "=>"Liiketoiminnasta alv-velvollinen\n ", "Työnantajarekisteri\n "=>"Rekisterissä\n ", "Seuraava tarkistuspäivä"=>"28.02.2013", "20.07.2007"=>"Tunnus annettu"}
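
Several keys and values in the sample output still end in "\n " (for example "Kaupparekisteri\n "). One plausible cause, which is an assumption and not something the gist confirms, is that the page pads those cells with non-breaking spaces, which Ruby's `\s` does not match. The sketch below shows one way to trim them. `rows_to_hash` is a hypothetical helper that is not part of the original script: it takes plain arrays of cell strings, trims Unicode whitespace with `[[:space:]]`, and drops rows with fewer than two cells so keys and values cannot drift out of alignment.

```ruby
# Hypothetical helper, not part of the original gist.
# rows: an Array of Arrays of cell strings, one inner Array per table row.
def rows_to_hash(rows)
  cleaned = rows.map do |cells|
    # Use only the first two cells and trim Unicode whitespace at both ends.
    # [[:space:]] also matches U+00A0 (non-breaking space); \s does not.
    cells.first(2).map { |text| text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '') }
  end
  # Keep only complete key/value pairs so the Hash never misaligns.
  Hash[cleaned.select { |pair| pair.size == 2 }]
end

# Made-up sample rows shaped like the output above (assumed NBSP padding).
rows = [
  ["Toiminimi", "Maventa Oy"],
  ["Kaupparekisteri\n\u00A0", "Rekisterissä\n\u00A0"],
  ["Orphan cell"]
]
p rows_to_hash(rows)
```

In the scraper this could be fed with something like `rows.map { |row| row.search('td').map(&:inner_text) }`. Unlike the original regexes, it trims only the ends of each cell, so multi-line values such as addresses keep their inner line breaks.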