@Eseperio
Created December 13, 2024 07:12
Block AI Bots Server-wide in Plesk or ModSecurity-powered Servers

Introduction

AI crawlers can request pages aggressively enough to overload a server. Most of them collect content to train machine learning models, often without the site owner's consent. The extra load can degrade performance and, in severe cases, cause downtime. This guide explains how to block these bots server-wide using ModSecurity on Plesk-managed servers.

Step 1: Enable ModSecurity in Plesk

  1. Log in to Plesk as an administrator.
  2. Go to Tools & Settings > Web Application Firewall (ModSecurity).
  3. Ensure that ModSecurity is enabled with the firewall mode set to On (Detection only logs matches without blocking them), and select the Fast or Tradeoff rule configuration, depending on your server's performance needs.

Step 2: Add Custom Rules

  1. In the same ModSecurity settings page, scroll to Custom directives.
  2. Add the following custom rule:
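# Block AI bots by User-Agent
SecRule REQUEST_HEADERS:User-Agent "GPTBot|Amazonbot|Meta" "phase:1,id:100002,deny,status:403,log,msg:'AI bot blocked by User-Agent'"

Rule Breakdown

  • SecRule REQUEST_HEADERS:User-Agent "GPTBot|Amazonbot|Meta": Matches any request whose User-Agent header contains one of these strings. The pattern is a case-sensitive regular expression, so "Meta" also matches any other User-Agent that contains that substring.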
  • phase:1: Executes during the request headers phase.
  • id:100002: Unique rule identifier. It must not collide with the ID of any other loaded rule.
  • deny,status:403: Denies the request and returns HTTP 403 Forbidden.
  • log: Records the match in the web server error log and the ModSecurity audit log.
  • msg: Descriptive message for the logs.
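
To see which User-Agent strings the pattern would catch before deploying it, the alternation can be tried locally. This is a minimal sketch that uses grep -E as a stand-in for ModSecurity's regex engine, which is an approximation that holds for a simple alternation like this one. The helper name ua_matches and the sample User-Agent strings are made up for illustration.

```shell
# The same alternation used in the rule. grep -E is only a stand-in for
# ModSecurity's regex operator, but it behaves the same for this pattern.
PATTERN='GPTBot|Amazonbot|Meta'

# Print "match" or "no match" for a given User-Agent string (hypothetical helper).
ua_matches() {
    if printf '%s\n' "$1" | grep -Eq "$PATTERN"; then
        echo "match"
    else
        echo "no match"
    fi
}

ua_matches "Mozilla/5.0 (compatible; GPTBot/1.1)"   # match
ua_matches "gptbot"                                 # no match: case differs
ua_matches "MetaInspector/5.0"                      # match: contains "Meta"
```

The last two calls show the trade-off: without a lowercase transformation the rule misses differently cased names, and short tokens such as "Meta" can catch unrelated clients.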

Step 3: Save and Apply

  1. Click Save to apply the custom rule.
  2. Restart Nginx and Apache (or httpd) if needed:
    systemctl restart nginx
    systemctl restart apache2  # Or httpd on CentOS
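
A syntax error in a custom directive can prevent the web server from starting, so it may be worth validating the configuration before restarting. The following is a sketch, not a Plesk-provided tool: the function name restart_web_servers is made up, and it assumes apache2ctl exists on Debian/Ubuntu and apachectl on RHEL-family systems.

```shell
# Validate Apache and Nginx configuration, then restart both services.
# Hypothetical helper; stops without restarting if either check fails.
restart_web_servers() {
    # apache2/apache2ctl on Debian and Ubuntu, httpd/apachectl on RHEL-family.
    if command -v apache2ctl >/dev/null 2>&1; then
        apache_ctl=apache2ctl
        apache_unit=apache2
    else
        apache_ctl=apachectl
        apache_unit=httpd
    fi

    # "-t" asks each server to parse its configuration and report errors.
    if ! "$apache_ctl" -t; then
        echo "Apache configuration test failed; not restarting" >&2
        return 1
    fi
    if ! nginx -t; then
        echo "Nginx configuration test failed; not restarting" >&2
        return 1
    fi

    systemctl restart nginx
    systemctl restart "$apache_unit"
}
```

Run restart_web_servers as root after saving the rule. If either check fails, the services keep running with their previous configuration.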

Testing the Rule

Run a test using curl:

curl -I "https://yourdomain.com" -A "GPTBot"

You should receive a 403 Forbidden response:

HTTP/1.1 403 Forbidden
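
To check several User-Agents in one go, including one that should still be allowed, the curl call can be wrapped in a loop. This is a rough sketch: the function names verdict and check_agents are made up, and a 403 is treated as "blocked" even though another rule or the application itself could also return that status.

```shell
# Map an HTTP status code to a short verdict (hypothetical helper).
verdict() {
    case "$1" in
        403) echo "blocked" ;;
        000) echo "no response" ;;   # curl reports 000 when it gets no reply
        *)   echo "allowed ($1)" ;;
    esac
}

# Send one HEAD request per User-Agent and print the verdict for each.
# Usage: check_agents https://yourdomain.com GPTBot Amazonbot "Mozilla/5.0"
check_agents() {
    url=$1
    shift
    for ua in "$@"; do
        code=$(curl -s -o /dev/null -I -A "$ua" -w '%{http_code}' "$url")
        printf '%-20s %s\n' "$ua" "$(verdict "$code")"
    done
}
```

An ordinary browser User-Agent in the list should come back as allowed. If it is blocked as well, the pattern is too broad.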

Logs Verification

Check that blocked requests are being logged:

tail -f /var/log/modsec_audit.log
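
For a quick count instead of a live tail, the log can be searched for the rule ID. This sketch assumes the audit log uses the native text format, where matches appear as [id "100002"]. A JSON-formatted log would need a different search. The function name count_rule_hits is made up.

```shell
# Count audit-log lines that mention the given rule id (hypothetical helper).
# Usage: count_rule_hits 100002 /var/log/modsec_audit.log
count_rule_hits() {
    rule_id=$1
    log_file=${2:-/var/log/modsec_audit.log}
    # -F matches the text literally, so the brackets need no escaping.
    grep -cF "[id \"$rule_id\"]" "$log_file"
}
```

This counts matching lines rather than unique requests, and one blocked request can produce more than one such line, so treat the number as a rough indicator.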

Additional Tips

  • Add more bots by appending their names to the pipe-separated pattern in the rule.
  • Review the list periodically, since new crawlers appear and existing ones change their User-Agent strings.
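
When the list grows, it can be easier to keep the names separately and generate the pattern from them. This is a minimal sketch with a made-up function name, build_pattern. ClaudeBot and CCBot are illustrative additions only, so check each crawler's published User-Agent before relying on a name.

```shell
# Join bot names into one alternation: name1|name2|... (hypothetical helper).
build_pattern() {
    pattern=""
    for name in "$@"; do
        # Add a "|" separator only when the pattern already has content.
        pattern="${pattern:+$pattern|}$name"
    done
    printf '%s\n' "$pattern"
}

# The three names from the rule plus two illustrative additions.
build_pattern GPTBot Amazonbot Meta ClaudeBot CCBot
```

Paste the output in place of the existing pattern in the rule. Names that contain regex metacharacters such as "." or "(" need to be escaped by hand first.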