Last active
December 24, 2017 11:23
Visit a webpage X times with a random user-agent via a randomly chosen proxy server. A decent proxy list can be downloaded from http://www.proxynova.com/proxy_list.txt, but this is not scripted because the site frequently goes down.
#!/usr/bin/env ruby
require 'rubygems'
require 'mechanize'

class Visitor
  def initialize(proxy_file, url, iterations)
    @proxy_file = proxy_file
    @url = url
    @iterations = iterations
  end

  # Parse "ip:port" lines from the proxy file into an array of hashes.
  def get_proxies
    proxies = []
    File.foreach(@proxy_file) do |line|
      tokens = line.strip.split(":")
      next if tokens.length < 2 # skip blank or malformed lines
      proxies << { ip: tokens[0], port: tokens[1] }
    end
    proxies
  end

  def run
    proxies = get_proxies
    (1..@iterations.to_i).each do |i|
      break if proxies.empty? # every proxy has failed and been removed
      user_agent = Mechanize::AGENT_ALIASES.keys.sample
      proxy = proxies.sample
      mech_agent = Mechanize.new
      mech_agent.user_agent_alias = user_agent
      mech_agent.set_proxy proxy[:ip], proxy[:port]
      puts "[#{i}]Visit #{@url} as #{user_agent} @ #{proxy[:ip]}:#{proxy[:port]}"
      begin
        mech_agent.get(@url)
      rescue StandardError
        # Drop the failing proxy so it is not picked again.
        puts "Failed! Removing #{proxy[:ip]}:#{proxy[:port]}"
        proxies.delete_if { |hash| hash[:ip] == proxy[:ip] && hash[:port] == proxy[:port] }
      end
    end
  end
end

# e.g. ruby .\thisscript.rb .\proxies.txt http://google.com 1000
if __FILE__ == $PROGRAM_NAME
  visitor = Visitor.new ARGV[0], ARGV[1], ARGV[2]
  visitor.run
end
Is there a way to loop until you get a proxy that works? Here is an example.
Let's say you have a database called Proxy with tables IP and PORT.
I want Mechanize to try to connect to a site through a proxy.
I take the first proxy and try to connect to mysite.com. After a timeout or response error, I try the next proxy, and so on until one works, and then I break out of the loop.
I have been trying to get this to work, but no luck so far.
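
One possible way to do this is sketched below; it is not part of the gist itself. It assumes the proxies have already been loaded (from the database or a file) into an array of hashes shaped like the gist's `{ ip: ..., port: ... }`. The names `first_working_proxy` and `fetch_via_proxy` are hypothetical helpers, and the 10-second default timeout is an arbitrary choice. The loop relies on Mechanize raising an exception on a timeout or an error response, which the `rescue` catches before moving on to the next proxy.

```ruby
# Hypothetical helper: try each proxy in order and return the first
# [proxy, result] pair for which the block succeeds, or nil if none does.
def first_working_proxy(proxies)
  proxies.each do |proxy|
    begin
      result = yield(proxy)
      return [proxy, result] # success: stop looping
    rescue StandardError => e
      # Timeouts and response errors land here; move on to the next proxy.
      warn "Proxy #{proxy[:ip]}:#{proxy[:port]} failed: #{e.class}"
    end
  end
  nil
end

# Hypothetical helper: fetch url through one proxy with Mechanize.
# Raises on a timeout or an HTTP error response.
def fetch_via_proxy(url, proxy, timeout_seconds = 10)
  require 'mechanize' # loaded lazily so the helper above has no gem dependency
  agent = Mechanize.new
  agent.open_timeout = timeout_seconds
  agent.read_timeout = timeout_seconds
  agent.set_proxy proxy[:ip], proxy[:port]
  agent.get(url)
end

# Usage (assumes `proxies` holds rows from the Proxy database as hashes):
#   proxy, page = first_working_proxy(proxies) do |p|
#     fetch_via_proxy("http://mysite.com", p)
#   end
#   puts "Connected via #{proxy[:ip]}:#{proxy[:port]}" if proxy
```

Keeping the retry loop separate from the Mechanize call means the loop can be exercised with a stub block, and the fetch logic can be swapped out without touching it.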