A search crawler and an AI crawler do not necessarily need the same access policy. If organic visibility matters, a change intended for model-training crawlers should not accidentally block Googlebot or Bingbot.
The safe approach is to state the policy explicitly and check each crawler using the robots.txt Tester.
Separate the decisions
Before editing robots.txt, decide how you will handle three categories:
| Category | Example user agents | Typical decision |
|---|---|---|
| Search discovery | Googlebot, Bingbot | Allow public, indexable content |
| AI crawler access | GPTBot, ClaudeBot, PerplexityBot | Decide based on your distribution and licensing policy |
| Private paths | Any crawler | Block administrative or non-public areas |
There is no universal AI crawler choice. The engineering requirement is that the deployed rules reflect the intended choice and leave regular search access unchanged unless you intentionally change it.
A policy that keeps search open
This simplified example allows public crawling, blocks a private directory for all crawlers, and applies a separate decision for an AI crawler:
User-agent: *
Allow: /
Disallow: /internal/
User-agent: GPTBot
Disallow: /
Sitemap: https://example.com/sitemap.xml

Check at least these combinations:
| User agent | Path | Intended verdict |
|---|---|---|
| Googlebot | /blog/launch-notes | Allowed |
| Googlebot | /internal/report | Disallowed |
| GPTBot | /blog/launch-notes | Disallowed |
The Googlebot test matters because a broad wildcard edit can reduce ordinary crawl access while you are focused on AI policy.
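The matrix above can also be run as a small script before publishing. The following is a minimal sketch, not a full robots.txt parser: it assumes plain path prefixes (no `*` or `$` wildcards), exact user-agent token matches, and Google-style precedence where the longest matching path wins and Allow wins a tie. The names `parse_groups`, `verdict`, and `run_matrix` are hypothetical helpers introduced here for illustration.

```python
POLICY = """\
User-agent: *
Allow: /
Disallow: /internal/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

# (crawler, path, intended verdict) rows copied from the table above
MATRIX = [
    ("Googlebot", "/blog/launch-notes", "Allowed"),
    ("Googlebot", "/internal/report", "Disallowed"),
    ("GPTBot", "/blog/launch-notes", "Disallowed"),
]


def parse_groups(robots_txt):
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups, agents, in_rules = {}, [], False
    for raw in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        field, _, value = raw.split("#", 1)[0].partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups


def verdict(robots_txt, crawler, url_path):
    """Return 'Allowed' or 'Disallowed' for one crawler and path."""
    groups = parse_groups(robots_txt)
    # A group naming the crawler is used instead of the wildcard group.
    rules = groups.get(crawler.lower(), groups.get("*", []))
    # Longest matching path wins; on equal length, Allow beats Disallow.
    matches = [(len(path), directive == "allow")
               for directive, path in rules
               if path and url_path.startswith(path)]
    allowed = max(matches)[1] if matches else True
    return "Allowed" if allowed else "Disallowed"


def run_matrix(robots_txt, matrix):
    """Return the rows whose actual verdict differs from the intended one."""
    failures = []
    for crawler, path, intended in matrix:
        actual = verdict(robots_txt, crawler, path)
        if actual != intended:
            failures.append((crawler, path, intended, actual))
    return failures


print("mismatches:", run_matrix(POLICY, MATRIX))
```

An empty mismatch list means the draft matches the intended verdicts under these simplified rules. It does not replace a check in the robots.txt Tester, which may handle cases this sketch ignores.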
Specific groups replace the wildcard group
Under Google's interpretation, the most specific matching group is used on its own; it is not combined with the wildcard group. That is a critical detail in a file like this:
User-agent: *
Disallow: /internal/
User-agent: Googlebot
Allow: /

For Googlebot, the specific group applies. If you expect /internal/ to remain blocked for Googlebot, repeat that rule within the Googlebot policy or avoid an unnecessary specific group.
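To make the selection step concrete, here is a rough sketch that lists which rules each crawler ends up with. It assumes exact, case-insensitive user-agent token matching and ignores comments and wildcards; `rules_for` is a hypothetical helper, and Bingbot stands in for any crawler without its own group.

```python
def rules_for(robots_txt, crawler):
    """Return the (directive, path) rules of the group selected for a crawler."""
    groups, agents, in_rules = {}, [], False
    for raw in robots_txt.splitlines():
        field, _, value = raw.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    # The crawler's own group wins outright; "*" is only a fallback.
    return groups.get(crawler.lower(), groups.get("*", []))


AS_WRITTEN = """\
User-agent: *
Disallow: /internal/

User-agent: Googlebot
Allow: /
"""

# Same file with the private-path rule repeated inside the Googlebot group.
REPEATED = AS_WRITTEN + "Disallow: /internal/\n"

print(rules_for(AS_WRITTEN, "Googlebot"))  # no /internal/ rule reaches Googlebot
print(rules_for(AS_WRITTEN, "Bingbot"))    # falls back to the wildcard group
print(rules_for(REPEATED, "Googlebot"))    # the block is restored for Googlebot
```

Listing the selected rules per crawler, rather than only the final verdict, makes a missing rule visible before any path is tested.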
Duplicate specific groups are combined
The reverse detail also causes audit mistakes. These rules for the same specific user agent are combined:
User-agent: Googlebot
Disallow: /archive/
User-agent: Googlebot
Allow: /archive/public/

Testing matters because, once the groups are merged, the longer and more specific path takes precedence, so it may allow a subset of an otherwise blocked section.
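A short sketch of how merging and path length interact in this file follows. It assumes plain prefix matching without wildcards and Google-style precedence (longest matching path wins, Allow wins a tie); `merged_rules` and `is_allowed` are hypothetical helpers, and the two test paths are illustrative only.

```python
ROBOTS = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: Googlebot
Allow: /archive/public/
"""


def merged_rules(robots_txt, crawler):
    """Collect rules from every group that names the crawler."""
    rules, agents, in_rules = [], [], False
    for raw in robots_txt.splitlines():
        field, _, value = raw.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if crawler.lower() in agents:
                rules.append((field, value))
    return rules


def is_allowed(rules, url_path):
    """Longest matching path decides; on equal length, Allow wins."""
    matches = [(len(path), directive == "allow")
               for directive, path in rules
               if path and url_path.startswith(path)]
    return max(matches)[1] if matches else True


rules = merged_rules(ROBOTS, "Googlebot")
print(rules)                                       # both groups' rules, in file order
print(is_allowed(rules, "/archive/2019/post"))     # only Disallow: /archive/ matches
print(is_allowed(rules, "/archive/public/guide"))  # the longer Allow path also matches
```

An audit that reads only the first Googlebot group would conclude the whole archive is blocked, which is why both a blocked and an allowed path belong in the test matrix.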
Paths to check after a policy change
- The homepage and your top organic landing pages.
- New articles or product pages you expect to receive impressions.
- Any sitemap-listed pages touched by broad folder rules.
- Required render assets if you block resource folders.
- Private or preview URLs that must stay unavailable to crawlers.
Verification sources
Google describes its user-agent selection and duplicate-group merging in How Google interprets the robots.txt specification. After a live deployment, use the Search Console robots.txt report to confirm Google fetched a healthy file.
Apply the policy
Use the robots.txt Generator to assemble explicit crawler directives, then paste the final body into the robots.txt Tester and run the path matrix before publishing.
