@henshaw
Last active April 13, 2025 15:10
Disallow all major GenAI bots from crawling any page for LLM training
User-agent: Amazonbot
User-agent: Anthropic-ai
User-agent: Applebot-Extended
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: Cohere-ai
User-agent: DataForSeoBot
User-agent: FacebookBot
User-agent: Google-Extended
User-agent: GPTBot
User-agent: ImagesiftBot
User-agent: Magpie-crawler
User-agent: Omgili
User-agent: Omgilibot
User-agent: Peer39_crawler
User-agent: Peer39_crawler/1.0
User-agent: YouBot
Disallow: /

henshaw commented Sep 29, 2023


henshaw commented Jun 30, 2024

Grouped the list of user-agents to shorten the code. H/T to @chapter42
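
For anyone who wants to confirm that a grouped block applies its single Disallow rule to every listed agent, here is a minimal sketch using Python's standard-library urllib.robotparser. The three-agent list is an abbreviated copy of the file above, and the is_blocked helper and the example.com URL are assumptions for illustration only. It shows how one parser interprets the grouping, not how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the grouped rules; the agent names come from the gist.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /
"""


def is_blocked(user_agent: str, url: str = "https://example.com/any-page") -> bool:
    """Return True if the grouped rules forbid user_agent from fetching url.

    is_blocked is a hypothetical helper; the URL is a placeholder.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Googlebot is not in the group, so the Disallow rule should not apply to it.
    for agent in ("GPTBot", "ClaudeBot", "CCBot", "Googlebot"):
        print(agent, "blocked" if is_blocked(agent) else "allowed")
```

Agents outside the group fall through to the default, which is to allow crawling, so ordinary search indexing should be unaffected by this file.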


henshaw commented Apr 13, 2025

Removed PerplexityBot because Perplexity has publicly stated that it does not use the bot to crawl content for its AI foundation models.
