A robots.txt file can look reasonable and still block the exact page you want discovered. The useful question is not only whether the file exists. It is: can this crawler fetch this path under the rules you plan to deploy?
Use the robots.txt Tester to answer that question before changing a production file.
What a robots.txt tester should answer
Every useful test has three inputs:
| Input | Example | Why it matters |
|---|---|---|
| robots.txt body | Your proposed file contents | Rules may differ from the currently live file |
| URL path | /docs/getting-started | Rules apply to paths, not page titles |
| User agent | Googlebot or GPTBot | A specific group overrides the wildcard group |
The output should identify the matched group, the winning Allow or Disallow rule, and the final access verdict.
A five-minute test workflow
- Open your live /robots.txt file in a browser and paste its contents into the tester.
- Enter a high-value URL path such as /, /pricing, /blog/article-slug, or a documentation route.
- Test Googlebot first. Normal organic search access should be intentional.
- Test any AI crawlers that matter to your distribution policy, such as GPTBot, ClaudeBot, or PerplexityBot.
- Paste your proposed updated rules and repeat the same paths before deployment.
For example, this rule set allows most crawling but blocks /private/:
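    User-agent: *
    Allow: /
    Disallow: /private/

    User-agent: Googlebot
    Allow: /private/public-release/

Testing /private/public-release/ with Googlebot should return Allowed because the Googlebot-specific group is selected and its Allow path applies. Because only the selected group applies, Googlebot is not bound by the wildcard group's Disallow: /private/ here; repeat that rule in the Googlebot group if you want it enforced for Googlebot as well.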
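Group selection for this example can be sketched in a few lines of Python. This is a minimal illustration, not the tester's actual implementation: the `GROUPS` dictionary and the `select_group` helper are hypothetical names, and matching is simplified to an exact, case-insensitive token match with a fallback to the wildcard group.

```python
# Hypothetical in-memory form of the rule set above:
# lowercase user-agent token -> list of (directive, path) rules.
GROUPS = {
    "*": [("allow", "/"), ("disallow", "/private/")],
    "googlebot": [("allow", "/private/public-release/")],
}


def select_group(groups, crawler):
    """Return the name of the group that applies to `crawler`.

    Simplification: an exact, case-insensitive token match beats the
    wildcard group; anything else falls back to "*" (or None if absent).
    """
    token = crawler.lower()
    if token in groups:
        return token
    return "*" if "*" in groups else None


print(select_group(GROUPS, "Googlebot"))  # googlebot
print(select_group(GROUPS, "GPTBot"))     # *
```

A crawler with its own group reads only that group's rules, which is why the user agent is a required input for every test.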
Why rule order is not enough
robots.txt is not evaluated as "last line wins." For Google's documented interpretation:
- The most specific matching user-agent group is selected.
- If that same user-agent appears in multiple groups, its applicable rules are combined.
- Within the chosen rules, the longest matching path wins.
- If an Allow and a Disallow rule tie in length, Allow wins.
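The path-precedence steps above can be sketched as follows. This is an assumption-laden illustration rather than Google's parser: `winning_rule` and `is_allowed` are hypothetical helper names, rules are plain `(directive, path)` pairs for a group that has already been selected, and patterns are treated as simple prefixes, so the `*` and `$` wildcards are not handled.

```python
def winning_rule(rules, path):
    """Pick the rule that decides `path` from (directive, pattern) pairs.

    Simplification: patterns are plain prefixes; the * and $ wildcards
    are not handled here. Empty patterns are ignored.
    """
    matches = [(d, p) for d, p in rules if p and path.startswith(p)]
    if not matches:
        return None  # no rule matches, so crawling is allowed by default
    # Longest pattern wins; on equal length, prefer "allow".
    return max(matches, key=lambda rule: (len(rule[1]), rule[0] == "allow"))


def is_allowed(rules, path):
    """True when the winning rule is an allow, or when nothing matches."""
    rule = winning_rule(rules, path)
    return rule is None or rule[0] == "allow"


RULES = [("disallow", "/catalog/"), ("allow", "/catalog/public/")]
TIED = [("disallow", "/page"), ("allow", "/page")]

print(winning_rule(RULES, "/catalog/public/item-1"))  # ('allow', '/catalog/public/')
print(winning_rule(RULES, "/catalog/item-2"))         # ('disallow', '/catalog/')
print(is_allowed(TIED, "/page"))                      # True
```

Swapping the order of the entries in `RULES` does not change any of these results, which is the point of the section: position in the file is not what decides the verdict.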
That makes a tester necessary for patterns such as:
    User-agent: Googlebot
    Disallow: /catalog/

    User-agent: Googlebot
    Allow: /catalog/public/

A check for /catalog/public/item-1 must consider rules from both Googlebot blocks. AgentScan's tester combines equally specific groups for this reason.
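To make the merging step concrete, here is a rough sketch of a parser that folds repeated groups for the same user agent into one rule list. It is not AgentScan's implementation; `parse_groups` is a hypothetical helper that reads only User-agent, Allow, and Disallow lines and skips everything else.

```python
def parse_groups(text):
    """Parse robots.txt text into {lowercase user-agent: [(directive, path), ...]}.

    Repeated groups for the same user agent are merged into one rule list.
    Simplification: only User-agent, Allow, and Disallow lines are read.
    """
    groups = {}
    current = []            # user agents that the next rules belong to
    reading_agents = False  # True while consecutive User-agent lines continue
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current = []  # a new group starts here
            reading_agents = True
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current:
                groups[agent].append((field, value))
    return groups


SAMPLE = """\
User-agent: Googlebot
Disallow: /catalog/

User-agent: Googlebot
Allow: /catalog/public/
"""

print(parse_groups(SAMPLE))
# {'googlebot': [('disallow', '/catalog/'), ('allow', '/catalog/public/')]}
```

With both rules in one list, a longest-match check on /catalog/public/item-1 sees the Allow rule from the second block instead of stopping at the Disallow in the first.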
Test paths that can affect impressions
Do not limit the check to the homepage. Include:
- Important landing pages that are already earning impressions.
- New blog URLs before requesting indexing.
- CSS, JavaScript, and image paths needed to render indexable pages.
- Filter or search-result paths that you intentionally want blocked.
- Sitemap and feed discovery paths if your site links to them.
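A checklist like this can be kept as data and re-run whenever the rules change. The sketch below is one possible shape, with every name in it hypothetical: `find_regressions` takes any checker function with the signature `(agent, path) -> bool`, the paths in `EXPECTATIONS` are placeholders for your own, and `demo_checker` is a deliberately crude stand-in for a real evaluator.

```python
def find_regressions(is_allowed, expectations):
    """Return (agent, path, expected, actual) for every expectation that fails."""
    failures = []
    for agent, path, expected in expectations:
        actual = is_allowed(agent, path)
        if actual != expected:
            failures.append((agent, path, expected, actual))
    return failures


# Hypothetical policy: replace these paths and verdicts with your own.
EXPECTATIONS = [
    ("Googlebot", "/pricing", True),          # landing page must stay crawlable
    ("Googlebot", "/assets/app.css", True),   # rendering resource
    ("Googlebot", "/search?q=shoes", False),  # intentionally blocked
]


def demo_checker(agent, path):
    # Stand-in for a real evaluator: blocks two prefixes for every agent.
    return not path.startswith(("/search", "/assets/"))


for failure in find_regressions(demo_checker, EXPECTATIONS):
    print("unexpected verdict:", failure)
# unexpected verdict: ('Googlebot', '/assets/app.css', True, False)
```

The stand-in checker blocks /assets/ on purpose, so the run surfaces the kind of mistake the list is meant to catch: a rendering resource blocked by a rule written for something else.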
A blocked URL can still appear in search in limited situations, but Google may not be able to crawl its current content. robots.txt is a crawl control file, not a reliable removal mechanism.
Validate the live result after publishing
Once deployed, open https://yourdomain.com/robots.txt and confirm that it shows the exact rules you tested. Then review the Google Search Console robots.txt report for fetch problems and reported parsing issues.
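The "exact rules you tested" comparison can also be scripted. The following is a minimal sketch using only the Python standard library; `normalize`, `matches_tested`, and `fetch_live` are hypothetical helper names, and the comparison assumes that differences in line endings and trailing whitespace do not matter.

```python
import urllib.request


def normalize(text):
    """Ignore line-ending and trailing-whitespace differences."""
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    return "\n".join(lines).strip()


def matches_tested(live_text, tested_text):
    """True when the live file carries the same rules as the tested text."""
    return normalize(live_text) == normalize(tested_text)


def fetch_live(url):
    # Network call; pass the robots.txt URL of your own domain.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A post-deploy check would then call `matches_tested(fetch_live("https://yourdomain.com/robots.txt"), tested_text)`, where `tested_text` is the file body you pasted into the tester. A False result usually means a cache, CDN, or deploy step is still serving an older file.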
For Google's rule matching details, use the primary reference: How Google interprets the robots.txt specification.
Run a test now
Paste your current rules into the robots.txt Tester, check a page that matters to traffic, then use the robots.txt Generator if the policy needs to change.
