AgentScan
Strategy · 7 min read

Check robots.txt for Googlebot and AI Crawlers Without Mixing Policies

Learn how to check robots.txt separately for Googlebot, GPTBot, ClaudeBot, and PerplexityBot so search access and AI access remain intentional.

A search crawler and an AI crawler do not necessarily need the same access policy. If organic visibility matters, a change intended for model-training crawlers should not accidentally block Googlebot or Bingbot.

The safe approach is to state the policy explicitly and check each crawler using the robots.txt Tester.

Separate the decisions

Before editing robots.txt, decide how you will handle three categories:

Category          | Example user agents              | Typical decision
Search discovery  | Googlebot, Bingbot               | Allow public, indexable content
AI crawler access | GPTBot, ClaudeBot, PerplexityBot | Decide based on your distribution and licensing policy
Private paths     | Any crawler                      | Block administrative or non-public areas

There is no universally correct choice for AI crawlers. The engineering requirement is that the deployed rules match the choice you made and leave regular search access unchanged unless you intend to change it.

A policy that keeps search open

This simplified example allows public crawling, blocks a private directory for every crawler that falls under the wildcard group, and blocks one AI crawler, GPTBot, from the whole site:

User-agent: *
Allow: /
Disallow: /internal/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Check at least these combinations:

User agent | Path               | Intended verdict
Googlebot  | /blog/launch-notes | Allowed
Googlebot  | /internal/report   | Disallowed
GPTBot     | /blog/launch-notes | Disallowed

The Googlebot test matters because a broad wildcard edit can reduce ordinary crawl access while you are focused on AI policy.
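If you want to keep the matrix above next to the file as a repeatable check, a minimal Python sketch could look like the following. It is not the robots.txt Tester and not Google's parser. The names parse_groups, is_allowed, and MATRIX are made up for illustration, user agents are matched by exact token only, and paths are matched as plain prefixes (no * or $ wildcards).

```python
from typing import Dict, List, Tuple

# The example file from this section, pasted verbatim.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /internal/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

Rule = Tuple[str, str]  # ("allow" | "disallow", path prefix)


def parse_groups(body: str) -> Dict[str, List[Rule]]:
    """Map each lowercased user-agent token to its rules (simplified parser)."""
    groups: Dict[str, List[Rule]] = {}
    agents: List[str] = []
    in_rules = False
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if value:  # an empty value matches nothing, so skip it
                for agent in agents:
                    groups[agent].append((field, value))
        # Other fields, such as Sitemap, are ignored in this sketch.
    return groups


def is_allowed(groups: Dict[str, List[Rule]], user_agent: str, path: str) -> bool:
    """Use the specific group if present, else "*"; longest prefix wins."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


# (user agent, path, intended verdict) taken from the matrix above.
MATRIX = [
    ("Googlebot", "/blog/launch-notes", True),
    ("Googlebot", "/internal/report", False),
    ("GPTBot", "/blog/launch-notes", False),
]

groups = parse_groups(ROBOTS_TXT)
mismatches = [
    (agent, path)
    for agent, path, intended in MATRIX
    if is_allowed(groups, agent, path) != intended
]
print("mismatches:", mismatches)
```

An empty mismatch list means the file and the intended policy agree for these rows. Extend MATRIX whenever you add a crawler or a path to the policy.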

Specific groups replace the wildcard group

Under Google's interpretation, a crawler uses only the most specific group that matches its user agent; that group is not combined with the wildcard group. This is a critical detail in a file like this:

User-agent: *
Disallow: /internal/

User-agent: Googlebot
Allow: /

For Googlebot, only the specific group applies, so /internal/ is no longer blocked for it; the wildcard Disallow is not inherited. If you expect /internal/ to remain blocked for Googlebot, repeat that rule inside the Googlebot group, or drop the specific group if it is not needed.
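To make the selection step concrete, here is a small Python sketch that models the file above as a dictionary and returns the rules a crawler would actually be evaluated against. The dictionaries and the effective_rules helper are hypothetical names for illustration, and user agents are matched by exact token only, which is simpler than what real crawlers do.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # ("allow" | "disallow", path prefix)

# The file above, written out by hand as user-agent token -> rules.
AS_PUBLISHED: Dict[str, List[Rule]] = {
    "*": [("disallow", "/internal/")],
    "googlebot": [("allow", "/")],
}

# One possible fix: repeat the private-path rule inside the Googlebot group.
WITH_RULE_REPEATED: Dict[str, List[Rule]] = {
    "*": [("disallow", "/internal/")],
    "googlebot": [("allow", "/"), ("disallow", "/internal/")],
}


def effective_rules(groups: Dict[str, List[Rule]], user_agent: str) -> List[Rule]:
    """Return the specific group when one exists, otherwise the wildcard group.

    The two are never concatenated; that is the behavior being illustrated.
    """
    token = user_agent.lower()
    if token in groups:
        return groups[token]
    return groups.get("*", [])


for label, groups in (
    ("as published", AS_PUBLISHED),
    ("rule repeated", WITH_RULE_REPEATED),
):
    for agent in ("Googlebot", "Bingbot"):
        print(f"{label:14} {agent:10} {effective_rules(groups, agent)}")
```

In the published version, Googlebot's effective rules contain no Disallow for /internal/, while Bingbot still falls back to the wildcard group and keeps it.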

Duplicate specific groups are combined

The reverse detail also causes audit mistakes. When the same specific user agent appears in more than one group, Google combines those groups into one, so these rules are evaluated together:

User-agent: Googlebot
Disallow: /archive/

User-agent: Googlebot
Allow: /archive/public/

Testing matters because the longer, more specific path wins: here /archive/public/ stays allowed for Googlebot while the rest of /archive/ is blocked.
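The merge and the longest-path precedence can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Google's implementation: PARSED_GROUPS, merge_groups, and verdict are hypothetical names, the groups are written out by hand rather than parsed, and paths are compared as plain prefixes without * or $ wildcards. The sample paths /archive/2019/post and /archive/public/guide are invented for the example.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Rule = Tuple[str, str]  # ("allow" | "disallow", path prefix)

# The two Googlebot groups from the file above, in file order.
PARSED_GROUPS: List[Tuple[str, List[Rule]]] = [
    ("Googlebot", [("disallow", "/archive/")]),
    ("Googlebot", [("allow", "/archive/public/")]),
]


def merge_groups(parsed: List[Tuple[str, List[Rule]]]) -> DefaultDict[str, List[Rule]]:
    """Combine every group that names the same user agent into one rule list."""
    merged: DefaultDict[str, List[Rule]] = defaultdict(list)
    for agent, rules in parsed:
        merged[agent.lower()].extend(rules)
    return merged


def verdict(rules: List[Rule], path: str) -> str:
    """Longest matching prefix wins; allow wins a tie; no match means allowed."""
    best_kind, best_len = "allow", -1
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_kind, best_len = kind, len(prefix)
    return "Allowed" if best_kind == "allow" else "Disallowed"


googlebot_rules = merge_groups(PARSED_GROUPS)["googlebot"]

# Hypothetical sample paths: one inside the blocked section, one in the carve-out.
for path in ("/archive/2019/post", "/archive/public/guide"):
    print(path, "->", verdict(googlebot_rules, path))
```

If an audit reads only the first Googlebot group, it would report /archive/public/guide as blocked, which is the mistake this section warns about.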

Paths to check after a policy change

  • The homepage and your top organic landing pages.
  • New articles or product pages you expect to receive impressions.
  • Any sitemap-listed pages touched by broad folder rules.
  • Required render assets if you block resource folders.
  • Private or preview URLs that must stay unavailable to crawlers.

Verification sources

Google describes its user-agent selection and duplicate-group merging in "How Google interprets the robots.txt specification". After a live deployment, use the Search Console robots.txt report to confirm Google fetched a healthy file.

Apply the policy

Use the robots.txt Generator to assemble explicit crawler directives, then paste the final body into the robots.txt Tester and run the path matrix before publishing.