AI Bot Directives Reference
An up-to-date reference for major AI crawlers, including their user-agent strings, owners, and recommended directives.
| User agent | Owner | Purpose | Notes |
|---|---|---|---|
| GPTBot | OpenAI | Training data crawler | Disallow to opt out of OpenAI model training. |
| ChatGPT-User | OpenAI | ChatGPT user-initiated browse | Triggered by user actions in ChatGPT; usually allowed. |
| OAI-SearchBot | OpenAI | OpenAI search index | Indexes content for OpenAI search products. |
| ClaudeBot | Anthropic | Claude training and retrieval | Anthropic's primary content crawler. |
| anthropic-ai | Anthropic | Legacy agent identifier | Older Anthropic agent string; many sites still allow it. |
| PerplexityBot | Perplexity | Answer engine indexer | Indexes content for Perplexity answers. |
| Google-Extended | Google | Gemini training opt-out | Does not affect Google Search; controls Gemini training. |
| Applebot-Extended | Apple | Apple Intelligence training | Disallow to opt out of Apple Intelligence training. |
| Bytespider | ByteDance | Doubao / TikTok training | Used for ByteDance models; aggressive crawl history. |
| CCBot | Common Crawl | Public web archive | Many AI datasets derive from Common Crawl. |
| Meta-ExternalAgent | Meta | Meta AI ingestion | Used by Meta for AI products. |
| DuckAssistBot | DuckDuckGo | DuckAssist answer feature | Indexes content for DuckAssist. |
| Amazonbot | Amazon | Alexa and Amazon AI | Used to ingest content for Amazon AI services. |
| Diffbot | Diffbot | Knowledge graph crawler | Powers many third-party AI knowledge products. |
| FacebookBot | Meta | Translation training | Used by Meta for translation models. |
robots.txt fragment
```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Bytespider
Allow: /

User-agent: CCBot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Diffbot
Allow: /

User-agent: FacebookBot
Allow: /
```
Reading these directives
Each block targets one bot by its user-agent string. For example, a site that wants to stay in Google Search but opt out of Gemini training would Allow Googlebot and Disallow Google-Extended.
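That policy can be expressed as a short robots.txt fragment. This is a sketch of one common stance, not a recommendation: search crawlers stay allowed while the training-oriented agents from the table above are disallowed.

```
# Keep Google Search indexing
User-agent: Googlebot
Allow: /

# Opt out of Gemini training
User-agent: Google-Extended
Disallow: /

# Opt out of OpenAI and Anthropic training
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```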
Compliance is voluntary. Most listed crawlers honor robots.txt, but enforcement varies and rules can be revised by vendors over time.
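Before publishing a robots.txt, it can help to confirm that the directives parse the way you expect. The sketch below uses Python's standard-library `urllib.robotparser` against a hypothetical ruleset (the URLs and rules are illustrative, not from the fragment above):

```python
import urllib.robotparser

# Hypothetical robots.txt: allow Google Search, opt out of training bots.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Check how each crawler would be treated for a given URL.
print(parser.can_fetch("Googlebot", "https://example.com/article"))        # True
print(parser.can_fetch("Google-Extended", "https://example.com/article"))  # False
print(parser.can_fetch("GPTBot", "https://example.com/article"))           # False
```

Note that `can_fetch` returns `True` for any agent with no matching block and no `User-agent: *` default, which mirrors how crawlers treat an absent rule: unlisted bots are allowed.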