Last active
April 10, 2018 15:02
# And, to activate, you need to tell Rails to load it up:
# config/application.rb
config.middleware.insert_before 0, Rack::Attack
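For context, a minimal sketch of where those pieces usually live in a standard Rails app layout (the initializer filename here is a common convention, not something the gist specifies):

```ruby
# Gemfile — rack-attack is a separate gem, not bundled with Rails
gem 'rack-attack'

# config/application.rb, inside the Application class:
#   config.middleware.insert_before 0, Rack::Attack
#
# The throttle/blacklist rules themselves are conventionally kept in
# config/initializers/rack_attack.rb, which Rails loads on boot.
```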
require 'resolv'

class Rack::Attack
  class Request < ::Rack::Request
    # Behind a proxy or load balancer the client address arrives in
    # X-Forwarded-For; the header may carry a comma-separated chain,
    # so take the first (client) entry.
    def remote_ip
      @remote_ip ||= (env['HTTP_X_FORWARDED_FOR'] || ip).to_s.split(',').first.to_s.strip
    end
  end
end

# Throttle requests to these paths to 300 per 5 minutes per IP.
Rack::Attack.throttle('req/ip', :limit => 300, :period => 5.minutes) do |req|
  req.remote_ip if ['/assets', '/check'].any? { |path| req.path.start_with?(path) }
end

#Rack::Attack.blacklist('block very bad actors') do |req|
#  ['10.0.0.1', '192.168.1.30'].include? req.remote_ip
#end

# Block clients that claim a Googlebot User-Agent but fail the
# reverse-then-forward DNS verification Google recommends.
Rack::Attack.blacklist('googlebots who are not googlebots') do |req|
  if req.user_agent =~ /Googlebot/i
    begin
      name = Resolv.getname(req.remote_ip)    # reverse (PTR) lookup
      reversed_ip = Resolv.getaddress(name)   # forward-confirm the name
      # Match on a dot-prefixed suffix so e.g. "evilgooglebot.com" can't pass.
      resolves_correctly = name.end_with?(".googlebot.com") || name.end_with?(".google.com")
      reverse_resolves = reversed_ip == req.remote_ip
      is_google = resolves_correctly && reverse_resolves
      !is_google
    rescue Resolv::ResolvError
      true # DNS lookup failed: treat the claimed Googlebot as fake
    end
  end
end
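The verification logic above can be exercised without network access by extracting it into a standalone method with the resolver injected. The `verified_googlebot?` method, the `GOOGLE_DOMAINS` constant, and the `FakeResolv` stub below are illustrative names of mine, not part of rack-attack or the gist:

```ruby
require 'resolv'

# Suffixes Google documents for verified crawler hostnames.
GOOGLE_DOMAINS = ['.googlebot.com', '.google.com'].freeze

# Reverse DNS on the IP, check the domain suffix, then forward-resolve
# the hostname and confirm it points back at the same IP.
def verified_googlebot?(ip, resolver: Resolv)
  name = resolver.getname(ip)                             # reverse (PTR) lookup
  return false unless GOOGLE_DOMAINS.any? { |d| name.end_with?(d) }
  resolver.getaddress(name) == ip                         # forward-confirm
rescue Resolv::ResolvError
  false
end

# A fake resolver standing in for real DNS during the example.
class FakeResolv
  def self.getname(ip)
    { '66.249.66.1' => 'crawl-66-249-66-1.googlebot.com',
      '1.2.3.4'     => 'fake-googlebot.example.com' }.fetch(ip) { raise Resolv::ResolvError }
  end

  def self.getaddress(name)
    { 'crawl-66-249-66-1.googlebot.com' => '66.249.66.1' }.fetch(name) { raise Resolv::ResolvError }
  end
end

verified_googlebot?('66.249.66.1', resolver: FakeResolv)  # => true  (round-trips)
verified_googlebot?('1.2.3.4', resolver: FakeResolv)      # => false (wrong domain)
```

Injecting the resolver also makes the rescue path easy to cover: any IP the stub doesn't know raises `Resolv::ResolvError` and the method returns false.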
I just wrote pretty much the same piece of code, almost to a "t" 👍, except I didn't think to check HTTP_X_FORWARDED_FOR (not using load balancers yet), and the throttling is a nice touch. Thank you for that insight.
A small suggestion: you should probably also allow for the top 10 popular bots. They may not bring as much traffic as Google, but they are still legitimate large engines. For Bing, that would mean checking

name.end_with?("msn.com")

for a user agent matching "Bingbot". There are also Baidubot (from China), Alexabot (Amazon's), YandexBot (the Russian search engine), Exalead (from France), Sogou (from China), and the Facebook bot (they use it for previewing links, I think).
Here's a post that details the UAs for all of them: https://www.keycdn.com/blog/web-crawlers/
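The suggestion above generalizes naturally: keep a table mapping each crawler's User-Agent pattern to its documented hostname suffixes, and run the same reverse-then-forward check for whichever pattern matches. This is a sketch under my own naming (`CRAWLER_DOMAINS`, `impersonating_crawler?`, `StubResolv` are all hypothetical), returning true exactly when a request claims a known crawler UA but fails DNS verification, so it slots into a `blacklist` block:

```ruby
require 'resolv'

# User-Agent pattern => documented reverse-DNS suffixes (Google and Bing
# publish these; the Yandex entries are assumptions from their docs).
CRAWLER_DOMAINS = {
  /Googlebot/i => ['.googlebot.com', '.google.com'],
  /bingbot/i   => ['.search.msn.com'],
  /YandexBot/i => ['.yandex.ru', '.yandex.net', '.yandex.com']
}.freeze

# True when the UA claims a known crawler but the IP fails the
# reverse-then-forward DNS check; false for ordinary (non-crawler) UAs.
def impersonating_crawler?(user_agent, ip, resolver: Resolv)
  _pattern, domains = CRAWLER_DOMAINS.find { |p, _| user_agent =~ p }
  return false unless domains                 # doesn't claim to be a crawler
  name = resolver.getname(ip)                 # reverse (PTR) lookup
  forward_ip = resolver.getaddress(name)      # forward-confirm
  !(domains.any? { |d| name.end_with?(d) } && forward_ip == ip)
rescue Resolv::ResolvError
  true # claimed a crawler UA but DNS doesn't check out
end

# Fake resolver so the example runs without network access.
class StubResolv
  PTR = { '157.55.39.1' => 'msnbot-157-55-39-1.search.msn.com' }.freeze
  A   = { 'msnbot-157-55-39-1.search.msn.com' => '157.55.39.1' }.freeze
  def self.getname(ip)    = PTR.fetch(ip) { raise Resolv::ResolvError }
  def self.getaddress(n)  = A.fetch(n)    { raise Resolv::ResolvError }
end

bing_ua = 'Mozilla/5.0 (compatible; bingbot/2.0)'
impersonating_crawler?(bing_ua, '157.55.39.1', resolver: StubResolv)  # => false (genuine)
impersonating_crawler?(bing_ua, '9.9.9.9', resolver: StubResolv)      # => true  (fake)
impersonating_crawler?('Mozilla/5.0 Chrome', '9.9.9.9', resolver: StubResolv)  # => false
```

Facebook's link-preview crawler is a different case: Facebook publishes IP ranges rather than reverse-DNS suffixes, so it would need an allowlist of CIDR blocks instead of this check.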