@jwo
Last active April 10, 2018 15:02
# And, to activate, you need to tell Rails to load it up:
# config/application.rb
config.middleware.insert_before 0, Rack::Attack
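require 'resolv'

class Rack::Attack
  class Request < ::Rack::Request
    # Client IP as seen behind a proxy or load balancer. X-Forwarded-For may
    # hold a comma-separated chain, so take the first entry; this assumes the
    # proxy in front of the app sets or sanitizes that header.
    def remote_ip
      @remote_ip ||= (env['HTTP_X_FORWARDED_FOR'] || ip).to_s.split(',').first.to_s.strip
    end
  end
end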
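# Limit each client IP to 300 requests per 5 minutes. As written, only paths
# starting with /assets or /check are counted; other paths return nil and are
# not throttled.
Rack::Attack.throttle('req/ip', :limit => 300, :period => 5.minutes) do |req|
  req.remote_ip if ['/assets', '/check'].any? { |path| req.path.start_with?(path) }
end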
#Rack::Attack.blacklist('block very bad actors') do |req|
# ['10.0.0.1', '192.168.1.30'].include? req.remote_ip
#end
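# Block requests that claim to be Googlebot but fail forward-confirmed reverse DNS.
Rack::Attack.blacklist('googlebots who are not googlebots') do |req|
  if req.user_agent =~ /Googlebot/i
    begin
      ip = req.remote_ip.to_s
      name = Resolv.getname(ip)            # reverse lookup: IP -> hostname
      forward_ip = Resolv.getaddress(name) # forward lookup: hostname -> IP
      # The leading dot stops look-alike hosts such as "evilgooglebot.com".
      resolves_correctly = name.end_with?('.googlebot.com') || name.end_with?('.google.com')
      forward_confirms = forward_ip == ip
      !(resolves_correctly && forward_confirms)
    rescue Resolv::ResolvError
      true # no usable DNS record: treat as an impostor
    end
  end
end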
konung commented Apr 10, 2018

I just wrote pretty much the same piece of code, almost to a T 👍, except I didn't think to check HTTP_X_FORWARDED_FOR (I'm not using load balancers yet), and the throttling is a nice touch. Thank you for that insight.

A small suggestion - you should probably also add these top 10 popular bots. They may not bring as much traffic as Google, but are still legitimate large engines.

  • MSN/Bing - Bingbot verification works exactly the same way, except with name.end_with?("msn.com") and the user agent "Bingbot"
  • Yahoo Slurp -
  • DuckDuckGo https://duckduckgo.com/duckduckbot (they actually publish a list of 5 IPs, but only 2 resolve to duckduckgo.com)

There are also Baidubot (from China), Alexabot (Amazon's), Yandexbot (Russian search engine), Exalead (from France), Sogou (from China) and the Facebook bot (they use it for previewing links, I think).
Here's a post that details the UAs for all of them: https://www.keycdn.com/blog/web-crawlers/
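
To make the "works exactly the same way" point concrete, here is a rough sketch of how the Googlebot check could be generalized to several crawlers. The `CRAWLER_DOMAINS` table and the `fake_crawler?` helper are hypothetical names, not part of the gist or of rack-attack. The `.search.msn.com` suffix is an assumption based on how Bing describes its crawler hostnames, so confirm each suffix against the operator's own documentation before relying on it.

```ruby
require 'resolv'

# Hypothetical table: user-agent pattern => hostname suffixes accepted for it.
# The leading dots keep look-alike hosts such as "evilgooglebot.com" out.
CRAWLER_DOMAINS = {
  /Googlebot/i => ['.googlebot.com', '.google.com'],
  /bingbot/i   => ['.search.msn.com'] # assumption: verify against Bing's docs
}.freeze

# Returns true when the user agent claims to be a listed crawler but the
# forward-confirmed reverse DNS check fails. `resolver` is injectable so the
# logic can be exercised without network access; it defaults to Resolv.
def fake_crawler?(user_agent, ip, resolver: Resolv)
  _, suffixes = CRAWLER_DOMAINS.find { |pattern, _| user_agent.to_s =~ pattern }
  return false unless suffixes # not claiming to be a known crawler

  name = resolver.getname(ip) # reverse lookup: IP -> hostname
  return true unless suffixes.any? { |suffix| name.end_with?(suffix) }

  # Forward lookup must include the original IP.
  !resolver.getaddresses(name).include?(ip)
rescue Resolv::ResolvError
  true # no usable DNS record: treat as an impostor
end
```

Inside the initializer this could replace the Googlebot-only rule with a block such as `{ |req| fake_crawler?(req.user_agent, req.remote_ip) }`. Crawlers that publish an IP list instead of reverse DNS names, as DuckDuckGo does, would need a separate allowlist check.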
