Block GPTBot, ClaudeBot, PerplexityBot in 2026
If you do not want your content used to train AI models, the practical answer in 2026 is a small, well-formed robots.txt with explicit groups for each major AI crawler, plus a single Content-Signal directive that declares your intent. This guide walks through the exact rules, the trade-offs, and the verification steps.
What "blocking AI" actually means
There are three distinct things you can opt out of, and they are not interchangeable.
- Training: bots like GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, and Bytespider crawl public pages to build datasets used for model training.
- Live answer engines: bots like PerplexityBot, OAI-SearchBot, and DuckAssistBot index pages so AI search products can cite them in real time.
- On-demand fetching: bots like ChatGPT-User and Claude-User (Anthropic's newer token for the older Claude-Web) fetch a URL only when a human asks the assistant to read it.
Blocking training does not block search. Blocking live answer engines does not block on-demand fetching. Decide which doors you want closed before writing rules.
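If you manage these lists in code rather than by hand, a small typed inventory keeps the three categories from blurring together. A minimal TypeScript sketch; the groupings mirror the bots named above, and the category names are this guide's own shorthand, not any vendor's:

```ts
// AI crawler user-agent tokens, grouped by the three opt-out
// categories described above. Illustrative, not exhaustive.
type BotCategory = "training" | "answer-engine" | "on-demand";

const AI_BOTS: Record<BotCategory, string[]> = {
  training: [
    "GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended",
    "Applebot-Extended", "Bytespider", "CCBot", "Meta-ExternalAgent",
  ],
  "answer-engine": ["PerplexityBot", "OAI-SearchBot", "DuckAssistBot"],
  "on-demand": ["ChatGPT-User", "Claude-User", "Claude-Web"],
};
```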
A baseline robots.txt that blocks training only
```
User-agent: *
Allow: /
Content-Signal: ai-train=no, search=yes, ai-input=no

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

Sitemap: https://yourdomain/sitemap.xml
```

This profile keeps Googlebot, Bingbot, and human readers fully able to use the site while explicitly opting out of model training. It also declares your AI-content preferences via the Content-Signal directive, which a growing number of vendors honor.
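If you generate the file instead of hand-writing it, a small helper keeps the groups consistent. A sketch that reuses the AI_BOTS inventory from the earlier snippet; buildRobotsTxt is this guide's own name, not a library function:

```ts
// Builds a robots.txt body: allow-all default with Content-Signal,
// one Disallow group per blocked bot, and the sitemap reference.
function buildRobotsTxt(blockedBots: string[], sitemapUrl: string): string {
  const lines = [
    "User-agent: *",
    "Allow: /",
    "Content-Signal: ai-train=no, search=yes, ai-input=no",
    "",
  ];
  for (const bot of blockedBots) {
    lines.push(`User-agent: ${bot}`, "Disallow: /", "");
  }
  lines.push(`Sitemap: ${sitemapUrl}`);
  return lines.join("\n") + "\n";
}

// Training-only profile, matching the file above.
const body = buildRobotsTxt(AI_BOTS.training, "https://yourdomain/sitemap.xml");
```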
What about answer engines?
If you want to opt out of answer engines too, add these groups:
```
User-agent: PerplexityBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: DuckAssistBot
Disallow: /
```

Be aware: removing your content from answer engines reduces the chance of being cited as a source. A site that blocks every AI surface tends to disappear from AI-mediated discovery entirely.
On-demand fetching is usually safe to allow
When a user pastes your URL into ChatGPT or Claude and says "summarize this", the request comes from ChatGPT-User or Claude-User (or the older Claude-Web token), not from the training crawlers. Most sites should allow these on-demand fetches: they happen with explicit user intent, and the assistant's answer can send a real reader to your site, much like a referral.
```
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-Web
Allow: /
```
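If you want to see how much of your traffic these on-demand fetches represent, you can tag them where requests enter your app. A sketch assuming Next.js middleware and loose substring matching on the User-Agent header; the console.log is a placeholder for whatever analytics sink you actually use:

```ts
// middleware.ts — tags on-demand AI fetches by User-Agent substring.
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

const ON_DEMAND_AGENTS = ["ChatGPT-User", "Claude-User", "Claude-Web"];

export function middleware(request: NextRequest) {
  const ua = request.headers.get("user-agent") ?? "";
  if (ON_DEMAND_AGENTS.some((agent) => ua.includes(agent))) {
    // Placeholder: swap in your real analytics sink.
    console.log(`on-demand AI fetch: ${ua} -> ${request.nextUrl.pathname}`);
  }
  return NextResponse.next();
}
```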
Where to put the file

The file must serve at https://yourdomain/robots.txt with content type text/plain. In Next.js App Router, the cleanest option is a route handler:
```ts
// app/robots.txt/route.ts
import { NextResponse } from "next/server";

const BODY = `...the rules above...`;

export async function GET() {
  return new NextResponse(BODY, {
    status: 200,
    headers: { "content-type": "text/plain; charset=utf-8" },
  });
}
```

For static frameworks, drop the file into the public folder.
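Next.js also ships a typed metadata route, app/robots.ts, which can replace the handler above if you do not need the Content-Signal line; the MetadataRoute.Robots type only covers standard robots directives. A sketch with an abbreviated block list:

```ts
// app/robots.ts — Next.js metadata route. It cannot emit Content-Signal,
// so prefer the route handler above when you need that line.
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: "*", allow: "/" },
      { userAgent: ["GPTBot", "ClaudeBot", "Google-Extended"], disallow: "/" },
    ],
    sitemap: "https://yourdomain/sitemap.xml",
  };
}
```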
Verifying the rules apply
The fastest check before deploying is to run the body through AgentScan's robots.txt tester: paste the rules, set the user agent to GPTBot and the path to /, and confirm the verdict is Disallowed. Repeat for each blocked bot.
After deploying, run:

```
curl -si https://yourdomain/robots.txt
```

The -i flag prints the response headers alongside the body, so you can confirm the HTTP 200 status and the text/plain content type; the body itself should match what you generated.
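The same check can run in CI. A sketch using the built-in fetch in Node 18+; the domain and bot list are placeholders for your own:

```ts
// verify-robots.ts — asserts status, content type, and one Disallow
// group per blocked bot on the live file.
const ROBOTS_URL = "https://yourdomain/robots.txt"; // placeholder domain
const BLOCKED = ["GPTBot", "ClaudeBot", "Google-Extended"];

async function main() {
  const res = await fetch(ROBOTS_URL);
  if (res.status !== 200) throw new Error(`expected 200, got ${res.status}`);
  const type = res.headers.get("content-type") ?? "";
  if (!type.startsWith("text/plain")) throw new Error(`bad content type: ${type}`);
  const text = await res.text();
  for (const bot of BLOCKED) {
    if (!text.includes(`User-agent: ${bot}`)) throw new Error(`missing group for ${bot}`);
  }
  console.log("robots.txt looks good");
}

main();
```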
Common mistakes
- Putting Content-Signal at the top of the file. It must appear inside a User-agent group, not before the first one.
- Blocking with a meta tag. The `<meta name="robots" content="noai">` tag does not exist in any standard. Use robots.txt and Content-Signal instead.
- Blocking Googlebot to block Gemini. Google-Extended is the right token for Gemini training; blocking Googlebot removes you from regular search.
- Forgetting the Sitemap line. Even when blocking AI bots, search engines still need the sitemap reference for indexing.
What this does not do
robots.txt is a request, not a wall. Bots can ignore it. Most reputable AI vendors honor it today, but enforcement is voluntary. If you need cryptographic enforcement, look at Web Bot Auth and IP allow lists. For most public content, voluntary compliance is the practical ceiling.
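If you want one step past polite requests without adopting Web Bot Auth, you can refuse self-identified crawlers at the edge, in the same middleware slot as the tagging sketch earlier. A sketch for Next.js; User-Agent strings are trivially spoofed, so this filters honest bots only and is not enforcement:

```ts
// middleware.ts — returns 403 to self-identified AI training crawlers.
// A bot that lies about its User-Agent sails straight through.
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

const TRAINING_AGENTS = ["GPTBot", "ClaudeBot", "Bytespider", "CCBot"];

export function middleware(request: NextRequest) {
  const ua = request.headers.get("user-agent") ?? "";
  if (TRAINING_AGENTS.some((agent) => ua.includes(agent))) {
    return new NextResponse("Forbidden", { status: 403 });
  }
  return NextResponse.next();
}
```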
Next steps
- Generate your own profile with the robots.txt Generator.
- Tune the AI-content preferences with the Content-Signal Builder.
- Run a full agent readiness scan from the home page to catch related gaps.