@dpaluy
Created March 5, 2013 20:14
Crawling LinkedIn
require 'rubygems'
require 'mechanize'
require 'nokogiri'
# Placeholder credentials; replace with a real account before running.
@linkedin_username = "username"
@linkedin_password = "password"
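# Create the agent and load the home page, which contains the login form.
agent = Mechanize.new
agent.user_agent_alias = "Mac Safari"
agent.follow_meta_refresh = true
agent.get("https://www.linkedin.com")
# Log in to LinkedIn
form = agent.page.form_with(:action => '/uas/login-submit')
raise "Login form not found" if form.nil?
form['session_key'] = @linkedin_username
form['session_password'] = @linkedin_password
agent.submit(form)
# Submitting the form does not by itself confirm that the credentials were accepted.
puts "Login form submitted"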
# Returns the inner HTML of every element on the page that matches search_query.
def search_class(page, search_query)
  output_array = []
  page.search(search_query).each do |element|
    next if element.nil? || element.inner_html.nil?
    output_array.push(element.inner_html)
  end
  output_array
end
# Returns the "alt" attribute of every matching element that has one.
def search_image_class(page, search_query)
  output_array = []
  page.search(search_query).each do |element|
    next if element.nil? || element["alt"].nil?
    p element["alt"] # debug output: echo each value as it is found
    output_array.push(element["alt"])
  end
  output_array
end
names = []
titles = []
locations = []
industries = []
agent.get("http://www.linkedin.com/wvmx/profile?trk=nmp_profile_stats_viewed_by") do |page|
  names = search_image_class(page, 'img[@class="photo"]')
  titles = search_class(page, 'dd[@class = "title"]')
  locations = search_class(page, 'dd[@class="location"]')
  industries = search_class(page, 'dd[@class="industries"]')
end
names.each_with_index do |name, index|
  puts "#{name}\t#{titles[index]}\t#{locations[index]}\t#{industries[index]}"
end
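
The script prints the four arrays side by side by index, and the values from inner_html may carry surrounding whitespace or newlines that break the tab-separated output. A minimal sketch of one way to tidy that up is below. The helper name build_rows and the sample data are made up for illustration and are not part of the original script. It only normalizes whitespace and pads missing trailing fields; it cannot realign columns if an entry is missing in the middle of a list.

```ruby
# Hypothetical helper (not in the original script): combine parallel arrays
# into one row per name. Each field has runs of whitespace collapsed to a
# single space and is trimmed; a missing field becomes an empty string.
def build_rows(names, *columns)
  names.each_with_index.map do |name, index|
    [name] + columns.map { |column| column[index].to_s.gsub(/\s+/, " ").strip }
  end
end

# Made-up sample data standing in for scraped values.
names      = ["Jane Doe", "John Roe"]
titles     = ["  Engineer ", "Designer"]
locations  = ["Berlin"] # one entry short on purpose
industries = ["Software", "Design"]

rows = build_rows(names, titles, locations, industries)
rows.each { |row| puts row.join("\t") }
```

In the script itself, the arrays returned by search_image_class and search_class would take the place of the sample data.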