robots.txt Tester: Check a URL Path Before Google Crawls It

Use a robots.txt tester to confirm whether Googlebot or an AI crawler can fetch a specific path before you publish new directives.

A robots.txt file can look reasonable and still block the exact page you want discovered. The useful question is not only whether the file exists. It is: can this crawler fetch this path under the rules you plan to deploy?

Use the robots.txt Tester to answer that question before changing a production file.

What a robots.txt tester should answer

Every useful test has three inputs:

Input | Example | Why it matters
robots.txt body | Your proposed file contents | Rules may differ from the currently live file
URL path | /docs/getting-started | Rules apply to paths, not page titles
User agent | Googlebot or GPTBot | A specific group overrides the wildcard group

The output should identify the matched group, the winning Allow or Disallow rule, and the final access verdict.

A five-minute test workflow

  1. Open your live /robots.txt file in a browser and paste its contents into the tester.
  2. Enter a high-value URL path such as /, /pricing, /blog/article-slug, or a documentation route.
  3. Test Googlebot first, so that any block on organic search crawling is a deliberate choice rather than an accident.
  4. Test any AI crawlers that matter to your distribution policy, such as GPTBot, ClaudeBot, or PerplexityBot.
  5. Paste your proposed updated rules and repeat the same paths before deployment.

For example, this rule set lets crawlers in the wildcard group fetch everything except /private/, while Googlebot gets its own group:

User-agent: *
Allow: /
Disallow: /private/

User-agent: Googlebot
Allow: /private/public-release/

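Testing /private/public-release/ with Googlebot should return Allowed because the Googlebot-specific group is selected and its Allow path applies. Note that Googlebot then ignores the wildcard group entirely, so the rest of /private/ is also open to it unless the Googlebot group repeats the Disallow rule.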
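The group selection described above can be sketched in a few lines of Python. This is a minimal illustration, not AgentScan's implementation: `parse_groups` and `select_rules` are hypothetical names, only User-agent, Allow, and Disallow lines are read, and a crawler is matched by exact (case-insensitive) name with `*` as the fallback, which is simpler than a production parser.

```python
# Minimal sketch of user-agent group selection (hypothetical helper names).
# Assumption: exact, case-insensitive name match, with "*" as the fallback.

def parse_groups(body):
    """Return a list of (agents, rules); rules are (kind, path) tuples."""
    groups = []
    agents, rules = [], []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def select_rules(groups, crawler):
    """Rules from every group naming the crawler, else the wildcard rules."""
    name = crawler.lower()
    named = [rules for agents, rules in groups if name in agents]
    if named:
        return [rule for rules in named for rule in rules]
    return [rule for agents, rules in groups if "*" in agents for rule in rules]


ROBOTS = """\
User-agent: *
Allow: /
Disallow: /private/

User-agent: Googlebot
Allow: /private/public-release/
"""

groups = parse_groups(ROBOTS)
googlebot_rules = select_rules(groups, "Googlebot")
gptbot_rules = select_rules(groups, "GPTBot")
```

Here `googlebot_rules` contains only the Allow line from the Googlebot group, while `gptbot_rules` falls back to the two wildcard rules. That is why the same path can get different verdicts for different crawlers.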

Why rule order is not enough

robots.txt is not evaluated as "last line wins." Under Google's documented interpretation:

  • The most specific matching user-agent group is selected.
  • If that same user-agent appears in multiple groups, its applicable rules are combined.
  • Within the chosen rules, the longest matching path wins.
  • If an Allow and Disallow rule tie in length, Allow wins.

That makes a tester necessary for patterns such as:

User-agent: Googlebot
Disallow: /catalog/

User-agent: Googlebot
Allow: /catalog/public/

A check for /catalog/public/item-1 must consider rules from both Googlebot blocks. AgentScan's tester combines equally specific groups for this reason.
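The longest-match and tie-break rules can be sketched as follows. This is an illustrative sketch under stated assumptions: `verdict` is a hypothetical function, the rule list is assumed to be already combined from both Googlebot groups, and paths are compared as plain prefixes, so the `*` and `$` wildcards are not handled.

```python
# Minimal sketch of rule resolution within the selected rules.
# Assumption: plain prefix matching only; "*" and "$" are not supported.

def verdict(rules, path):
    """Return (allowed, winning_rule) for a path.

    rules is a list of (kind, pattern) tuples, kind being "allow" or "disallow".
    """
    best = None
    for kind, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # an empty pattern matches nothing
        longer = best is None or len(pattern) > len(best[1])
        tie_for_allow = (
            best is not None
            and len(pattern) == len(best[1])
            and kind == "allow"
        )
        if longer or tie_for_allow:
            best = (kind, pattern)
    if best is None:
        return True, None  # no matching rule means the path is allowed
    return best[0] == "allow", best


# Rules combined from both Googlebot groups in the example above.
combined = [("disallow", "/catalog/"), ("allow", "/catalog/public/")]

public_item = verdict(combined, "/catalog/public/item-1")
other_item = verdict(combined, "/catalog/internal/item-2")
tie = verdict([("disallow", "/page"), ("allow", "/page")], "/page")
```

Swapping the order of the two entries in `combined` does not change any result, which is the point of this section: length decides, not position.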

Test paths that can affect impressions

Do not limit the check to the homepage. Include:

  • Important landing pages that are already earning impressions.
  • New blog URLs before requesting indexing.
  • CSS, JavaScript, and image paths needed to render indexable pages.
  • Filter or search-result paths that you intentionally want blocked.
  • Sitemap and feed discovery paths if your site links to them.
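If your list of pages starts as full URLs, for instance from an analytics or search performance export, the tester still needs paths. A small sketch, assuming a hypothetical helper named `to_test_path` and keeping the query string because filter and search-result rules often depend on it:

```python
from urllib.parse import urlsplit


def to_test_path(url):
    """Reduce a full URL to the path (plus query) that robots.txt rules see."""
    parts = urlsplit(url)
    path = parts.path or "/"  # a bare origin maps to the root path
    if parts.query:
        path += "?" + parts.query
    return path  # the fragment is dropped; crawlers do not send it


urls = [
    "https://example.com",
    "https://example.com/blog/article-slug",
    "https://example.com/search?q=shoes#results",
]
paths = [to_test_path(u) for u in urls]
```

The example.com URLs are placeholders; substitute the pages you actually want to check.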

A blocked URL can still appear in search results, for example when other pages link to it, but Google will not be able to crawl its current content. robots.txt is a crawl control file, not a reliable removal mechanism.

Validate the live result after publishing

Once deployed, open https://yourdomain.com/robots.txt and confirm that it shows the exact rules you tested. Then review the Google Search Console robots.txt report for fetch problems and reported parsing issues.
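Comparing the live file with the tested rules by eye is error-prone, so a short script can help. The sketch below is one possible approach using only the Python standard library; `fetch_robots`, `normalize`, and `diff_rules` are hypothetical names, and the normalization (line endings, trailing whitespace, a leading byte order mark) is an assumption about which differences are harmless.

```python
import difflib
import urllib.request


def fetch_robots(origin, timeout=10):
    """Download robots.txt from an origin such as 'https://yourdomain.com'."""
    url = origin.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def normalize(body):
    """Split into lines, ignoring BOM, line-ending style, and trailing space."""
    text = body.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    while lines and not lines[-1]:
        lines.pop()  # drop trailing blank lines
    return lines


def diff_rules(tested, live):
    """Return unified-diff lines; an empty list means the files match."""
    return list(
        difflib.unified_diff(
            normalize(tested),
            normalize(live),
            fromfile="tested",
            tofile="live",
            lineterm="",
        )
    )
```

A typical use would be `diff_rules(tested_body, fetch_robots("https://yourdomain.com"))`, where `tested_body` is the text you pasted into the tester. Any output lines starting with `-` or `+` point to rules that differ between the two.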

For Google's rule matching details, use the primary reference: How Google interprets the robots.txt specification.

Run a test now

Paste your current rules into the robots.txt Tester, check a page that matters to traffic, then use the robots.txt Generator if the policy needs to change.