AgentScan

AI Bot Directives Reference

An up-to-date reference for major AI crawlers, including their user-agent strings, owners, and recommended directives. Toggle each one to assemble a robots.txt fragment.

| User agent | Owner | Purpose | Notes |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Training data crawler | Disallow to opt out of OpenAI model training. |
| ChatGPT-User | OpenAI | ChatGPT user-initiated browse | Triggered by user actions in ChatGPT. Usually allowed. |
| OAI-SearchBot | OpenAI | OpenAI search index | Indexes content for OpenAI search products. |
| ClaudeBot | Anthropic | Claude training and retrieval | Anthropic's primary content crawler. |
| anthropic-ai | Anthropic | Legacy agent identifier | Older Anthropic agent string. Many sites still allow it. |
| PerplexityBot | Perplexity | Answer engine indexer | Indexes content for Perplexity answers. |
| Google-Extended | Google | Gemini training opt-out | Does not affect Google Search; controls Gemini training. |
| Applebot-Extended | Apple | Apple Intelligence training | Disallow to opt out of Apple Intelligence training. |
| Bytespider | ByteDance | Doubao / TikTok training | Used for ByteDance models. Aggressive crawl history. |
| CCBot | Common Crawl | Public web archive | Many AI datasets derive from Common Crawl. |
| Meta-ExternalAgent | Meta | Meta AI ingestion | Used by Meta for AI products. |
| DuckAssistBot | DuckDuckGo | DuckAssist answer feature | Indexes content for DuckAssist. |
| Amazonbot | Amazon | Alexa and Amazon AI | Used to ingest content for Amazon AI services. |
| Diffbot | Diffbot | Knowledge graph crawler | Powers many third-party AI knowledge products. |
| FacebookBot | Meta | Translation training | Used by Meta for translation models. |

robots.txt fragment

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Bytespider
Allow: /

User-agent: CCBot
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Diffbot
Allow: /

User-agent: FacebookBot
Allow: /
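
A fragment like the one above can also be assembled programmatically. The following is a minimal sketch, assuming a simple mapping from user-agent string to an allow/disallow flag; the function name `build_fragment` and the `policies` mapping are hypothetical and are not part of the AgentScan tool.

```python
def build_fragment(policies):
    """Build a robots.txt fragment from a {user_agent: allowed} mapping.

    Hypothetical helper for illustration: True emits "Allow: /",
    False emits "Disallow: /". Blocks are separated by a blank line.
    """
    blocks = []
    for agent, allowed in policies.items():
        directive = "Allow" if allowed else "Disallow"
        blocks.append(f"User-agent: {agent}\n{directive}: /")
    # Join blocks with one blank line and end with a trailing newline.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Example: opt out of OpenAI training but keep user-initiated browsing.
    print(build_fragment({"GPTBot": False, "ChatGPT-User": True}), end="")
```

Flipping a flag from `True` to `False` mirrors toggling a bot from Allow to Disallow in the table.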

Reading these directives

Each block targets one bot by user-agent string. A site that wants to allow Google Search but block Gemini training would Allow Googlebot and Disallow Google-Extended.
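
The Googlebot and Google-Extended case can be checked locally before deploying. This is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `is_allowed` and the `example.com` URL are assumptions for illustration, and the standard-library parser's matching may differ in detail from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Allow Google Search crawling, but opt out of Gemini training.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: parses the text in memory, no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Google-Extended"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

A user agent with no matching block and no `*` default is treated as allowed by this parser, so bots left out of the file are not blocked.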

Compliance is voluntary: robots.txt is a request to crawlers, not an access control. Most listed crawlers honor it, but enforcement varies, and vendors can revise their rules over time.

Verify on a real URL

Run a full agent readiness scan to see how your live site responds end to end.