AI Bot Directives Reference
An up-to-date reference for major AI crawlers, including their user-agent strings, owners, and recommended directives.
| User agent | Owner | Purpose | Notes |
|---|---|---|---|
| GPTBot | OpenAI | Training data crawler | Disallow to opt out of OpenAI model training. |
| ChatGPT-User | OpenAI | ChatGPT user-initiated browse | Triggered by user actions in ChatGPT; usually allowed. |
| OAI-SearchBot | OpenAI | OpenAI search index | Indexes content for OpenAI search products. |
| ClaudeBot | Anthropic | Claude training and retrieval | Anthropic's primary content crawler. |
| anthropic-ai | Anthropic | Legacy agent identifier | Older Anthropic agent string; many sites still allow it. |
| PerplexityBot | Perplexity | Answer engine indexer | Indexes content for Perplexity answers. |
| Google-Extended | Google | Gemini training opt-out | Does not affect Google Search; controls Gemini training. |
| Applebot-Extended | Apple | Apple Intelligence training | Disallow to opt out of Apple Intelligence training. |
| Bytespider | ByteDance | Doubao / TikTok training | Used for ByteDance models; aggressive crawl history. |
| CCBot | Common Crawl | Public web archive | Many AI datasets derive from Common Crawl. |
| Meta-ExternalAgent | Meta | Meta AI ingestion | Used by Meta for AI products. |
| DuckAssistBot | DuckDuckGo | DuckAssist answer feature | Indexes content for DuckAssist. |
| Amazonbot | Amazon | Alexa and Amazon AI | Used to ingest content for Amazon AI services. |
| Diffbot | Diffbot | Knowledge graph crawler | Powers many third-party AI knowledge products. |
| FacebookBot | Meta | Translation training | Used by Meta for translation models. |
robots.txt fragment
```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Bytespider
Allow: /

User-agent: CCBot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Diffbot
Allow: /

User-agent: FacebookBot
Allow: /
```
Reading these directives
Each block targets one bot by its user-agent string. For example, a site that wants to stay in Google Search but opt out of Gemini training would Allow Googlebot and Disallow Google-Extended.
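That policy can be expressed as a short robots.txt fragment. This is a sketch of one common stance, not a recommendation: search crawlers stay allowed while the training-oriented agents from the table above are disallowed.

```
# Keep Google Search indexing
User-agent: Googlebot
Allow: /

# Opt out of Gemini training
User-agent: Google-Extended
Disallow: /

# Opt out of OpenAI and Anthropic training
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```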
Compliance is voluntary. Most listed crawlers honor robots.txt, but enforcement varies and rules can be revised by vendors over time.
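Before publishing a robots.txt, it can help to confirm that the directives parse the way you expect. The sketch below uses Python's standard-library `urllib.robotparser` against a hypothetical ruleset (the URLs and rules are illustrative, not from the fragment above):

```python
import urllib.robotparser

# Hypothetical robots.txt: allow Google Search, opt out of training bots.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Check how each crawler would be treated for a given URL.
print(parser.can_fetch("Googlebot", "https://example.com/article"))        # True
print(parser.can_fetch("Google-Extended", "https://example.com/article"))  # False
print(parser.can_fetch("GPTBot", "https://example.com/article"))           # False
```

Note that `can_fetch` returns `True` for any agent with no matching block and no `User-agent: *` default, which mirrors how crawlers treat an absent rule: unlisted bots are allowed.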