@justingarrick
Last active December 24, 2017 11:23
Visit a webpage X times, each time with a random user agent and through a randomly chosen proxy server. The proxy file lists one ip:port pair per line. A decent proxy list can be downloaded from http://www.proxynova.com/proxy_list.txt, but the download is not scripted because the site frequently goes down.
#!/usr/bin/env ruby
require 'mechanize'

# Visits a URL a given number of times, each time with a random
# Mechanize user-agent alias and a random proxy from the proxy file.
class Visitor
  def initialize(proxy_file, url, iterations)
    @proxy_file = proxy_file
    @url = url
    @iterations = iterations.to_i
  end

  # Reads one "ip:port" proxy per line; blank or malformed lines are skipped.
  # strip also removes "\r\n" line endings from lists saved on Windows.
  def get_proxies
    proxies = []
    File.foreach(@proxy_file) do |line|
      ip, port = line.strip.split(':', 2)
      next if ip.nil? || ip.empty? || port.nil? || port !~ /\A\d+\z/
      proxies << { ip: ip, port: port.to_i }
    end
    proxies
  end

  def run
    proxies = get_proxies
    (1..@iterations).each do |i|
      # proxies.sample returns nil once every proxy has been removed
      if proxies.empty?
        puts 'No working proxies left; stopping.'
        break
      end
      user_agent = Mechanize::AGENT_ALIASES.keys.sample
      proxy = proxies.sample
      mech_agent = Mechanize.new
      mech_agent.user_agent_alias = user_agent
      mech_agent.set_proxy proxy[:ip], proxy[:port]
      puts "[#{i}] Visit #{@url} as #{user_agent} @ #{proxy[:ip]}:#{proxy[:port]}"
      begin
        mech_agent.get(@url)
      rescue StandardError => e
        # Drop the failing proxy so it is not picked again
        puts "Failed (#{e.class})! Removing #{proxy[:ip]}:#{proxy[:port]}"
        proxies.delete(proxy)
      end
    end
  end
end

# e.g. ruby .\thisscript.rb .\proxies.txt http://google.com 1000
if __FILE__ == $PROGRAM_NAME
  abort "usage: #{$PROGRAM_NAME} PROXY_FILE URL ITERATIONS" if ARGV.size < 3
  visitor = Visitor.new ARGV[0], ARGV[1], ARGV[2]
  visitor.run
end
@CyrusZei
nice code

@CyrusZei
Is there a way to loop until you get a proxy that works? Here is an example.

Let's say you have a database table called Proxy with columns IP and PORT.

I want Mechanize to try to connect to a site through a proxy.

I take the first proxy and try to connect to mysite.com; after a timeout or response error, I try another proxy, and so on until one works, then break out of the loop?

I have been trying to get this to work, but with no luck.
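One possible approach, as a minimal sketch: assume the rows have already been loaded from the database into an array of { ip:, port: } hashes, the same shape get_proxies returns. The helper first_working_proxy is hypothetical, not part of Mechanize. It yields each proxy to a block in order and returns the first proxy whose block finishes without raising, together with the block's result, or nil if every proxy fails.

```ruby
# Hypothetical helper: try proxies in order until one succeeds.
# proxies: array of { ip:, port: } hashes (e.g. loaded from a database).
# The block receives one proxy and must raise on a timeout or error.
# Returns [proxy, block_result] for the first success, or nil if all fail.
def first_working_proxy(proxies)
  proxies.each do |proxy|
    begin
      result = yield(proxy)
      return [proxy, result] # success: leave the loop and the method
    rescue StandardError => e
      warn "#{proxy[:ip]}:#{proxy[:port]} failed (#{e.class}), trying next"
    end
  end
  nil # every proxy failed
end

# Example with Mechanize (needs the mechanize gem and network access):
#   proxy, page = first_working_proxy(proxies) do |p|
#     agent = Mechanize.new
#     agent.open_timeout = 10 # seconds; a dead proxy raises instead of hanging
#     agent.read_timeout = 10
#     agent.set_proxy p[:ip], p[:port].to_i
#     agent.get('http://mysite.com')
#   end
```

Keeping the network call inside the block means the retry loop itself does not depend on Mechanize; the timeouts in the commented example are what turn an unresponsive proxy into an exception the loop can catch.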
