Agent-friendly 403, 404, and rate limit responses

When you have to refuse a bot, do it well. The error pages and headers that help AI agents recover instead of retrying blindly into your CDN bill.

When you have to deny an AI agent access, the response you send matters. A bare 403 with no body or a generic 429 with no Retry-After header sends agents into retry storms. A well-formed refusal lets the agent stop, explain the failure to the user, and move on.

This is the practical refusal pattern: the headers that prevent retry storms, the body that helps agents recover, and the design choices that keep your CDN bill sane.

What agents do with errors

A 2026 LLM agent, given a failed HTTP request, typically:

  1. Tries to parse the error body as JSON.
  2. Looks for a human-readable explanation.
  3. Looks for a retry hint.
  4. Looks for an alternative URL or auth path.
  5. Surfaces the result to the user.

If any of those signals are missing, the agent often retries (sometimes aggressively) or hallucinates an explanation for the user. Both outcomes are worse than a slightly more verbose error.
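
The five steps above can be sketched from the agent's side. This is a minimal illustration, not how any particular agent is implemented: `readErrorSignals` and `ErrorSignals` are hypothetical names, header keys are assumed to be lowercased already, and the body fields `retry_after_seconds` and `documentation_url` are conventions borrowed from the examples in this guide rather than from a standard.

```typescript
// What an agent might extract from a failed response (hypothetical shape).
interface ErrorSignals {
  explanation: string | null;       // step 2
  retryAfterSeconds: number | null; // step 3
  alternativeUrl: string | null;    // step 4
}

function asString(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

function readErrorSignals(
  headers: Record<string, string>, // assumed: keys already lowercased
  bodyText: string,
): ErrorSignals {
  // Step 1: try to parse the body as a JSON object.
  let body: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      body = parsed as Record<string, unknown>;
    }
  } catch {
    // Not JSON (for example an HTML error page), so no structured signals.
  }

  // Step 3: prefer the Retry-After header (seconds form only in this sketch),
  // then fall back to the body field.
  const header = (headers["retry-after"] ?? "").trim();
  const fromBody = body["retry_after_seconds"];
  const retryAfterSeconds = /^\d+$/.test(header)
    ? Number(header)
    : typeof fromBody === "number" ? fromBody : null;

  // Step 5: the caller surfaces this summary to the user.
  return {
    explanation: asString(body["detail"]) ?? asString(body["title"]),
    retryAfterSeconds,
    alternativeUrl: asString(body["documentation_url"]),
  };
}
```

When every field comes back `null`, the agent has nothing to work with, which is the situation the rest of this guide tries to prevent.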

RFC 9457: Problem Details for HTTP APIs

This is the underused standard for error bodies. Use it. The wire format:

HTTP/1.1 403 Forbidden
Content-Type: application/problem+json

{
  "type": "https://example.com/errors/bot-not-verified",
  "title": "Bot identity not verified",
  "status": 403,
  "detail": "The User-Agent claims to be GPTBot but the request did not pass reverse DNS verification.",
  "instance": "/api/v1/data?q=foo"
}

Five fields. Five things an agent can extract:

  • type: a stable URI the agent can recognize across calls.
  • title: the short human form.
  • status: matches the HTTP status code (kept for clients that lose the status line).
  • detail: the explanation specific to this occurrence of the problem.
  • instance: a URI reference identifying this specific occurrence (here, the request URL).

Add custom fields when you need them:

{
  "type": "https://example.com/errors/rate-limited",
  "title": "Rate limited",
  "status": 429,
  "detail": "Exceeded 60 requests per minute on /api/v1.",
  "retry_after_seconds": 30,
  "limit_per_minute": 60,
  "documentation_url": "https://example.com/docs/rate-limits"
}

Agents that parse the body can read all of these.
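
One way to keep these bodies consistent is a single small helper that every handler calls. The sketch below assumes a runtime with the standard Fetch API `Response` (Node 18+, Next.js route handlers, most edge runtimes). `ProblemDetails` and `problemResponse` are hypothetical names. The interface requires `type`, `title`, and `status` as a house rule, although RFC 9457 itself treats every member as optional.

```typescript
// The five standard members plus arbitrary extension members.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
  [extension: string]: unknown;
}

// Hypothetical helper: serializes the problem and keeps the HTTP status
// in sync with the `status` member.
function problemResponse(
  problem: ProblemDetails,
  extraHeaders: Record<string, string> = {},
): Response {
  return new Response(JSON.stringify(problem), {
    status: problem.status,
    headers: { ...extraHeaders, "content-type": "application/problem+json" },
  });
}

// Usage: the rate-limit example above, with a matching Retry-After header.
const rateLimited = problemResponse(
  {
    type: "https://example.com/errors/rate-limited",
    title: "Rate limited",
    status: 429,
    detail: "Exceeded 60 requests per minute on /api/v1.",
    retry_after_seconds: 30,
  },
  { "retry-after": "30" },
);
```

Deriving the HTTP status from the body means the two cannot drift apart, which is the failure the `status` member exists to guard against.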

403 Forbidden patterns

Bot is blocked by robots.txt

If a known bot fetches a path you have disallowed, you have two choices.

  1. Serve the requested content but rely on the bot to honor robots.txt.
  2. Actively block at the application layer.

If you actively block, the response should explain why:

{
  "type": "https://example.com/errors/blocked-by-robots",
  "title": "Disallowed by robots.txt",
  "status": 403,
  "detail": "User-Agent GPTBot is disallowed for this path. See /robots.txt.",
  "robots_txt": "https://example.com/robots.txt"
}

This is a much better experience than a blank page.

Bot identity not verified

When you do reverse DNS verification (see Verify the bot is real) and a request fails, return 403 with the diagnostic:

{
  "type": "https://example.com/errors/bot-not-verified",
  "title": "Bot identity not verified",
  "status": 403,
  "detail": "Reverse DNS lookup did not match the claimed User-Agent.",
  "verified_methods_supported": ["reverse-dns", "web-bot-auth"]
}

A polite agent will stop. A spoofing client will keep trying, but at least your logs are clear.

Auth required

Use 401, not 403, when the issue is missing or invalid credentials. Include WWW-Authenticate so agents know which scheme to use:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Bearer realm="api"
Content-Type: application/problem+json

{
  "type": "https://example.com/errors/unauthorized",
  "title": "Authentication required",
  "status": 401,
  "detail": "Send an Authorization: Bearer <token> header.",
  "documentation_url": "https://example.com/docs/auth"
}

If your auth flow is OAuth, point the agent at /.well-known/oauth-authorization-server or /.well-known/openid-configuration so it can discover the flow without guessing.

404 Not Found

The most common error and the one most often served as a generic HTML page. For API routes, return JSON. For HTML pages, send a clean human page but still set the status to 404.

For agent-friendliness, include suggestions:

{
  "type": "https://example.com/errors/not-found",
  "title": "Resource not found",
  "status": 404,
  "detail": "No customer with ID cust_xyz123.",
  "suggestions": [
    "Check the ID for typos.",
    "Use GET /customers to list available customer IDs."
  ]
}

The HTML 404 page should still be the helpful version: link to the homepage, the most useful sections, and a contact path. AgentScan ships a structured 404 like this; see the not found page for the live example.

Importantly: do not return 200 with a "page not found" message. That confuses agents and search crawlers. Set the status correctly.
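
A minimal sketch of serving both variants from one handler, assuming the standard Fetch API `Request` and `Response`. `wantsHtml` and `notFound` are hypothetical names, and the Accept check is deliberately crude: it ignores q-values and only looks for an explicit `text/html`.

```typescript
// Crude Accept check: HTML only when the client explicitly lists it.
function wantsHtml(accept: string | null): boolean {
  return (accept ?? "").toLowerCase().includes("text/html");
}

function notFound(request: Request): Response {
  const path = new URL(request.url).pathname;

  if (wantsHtml(request.headers.get("accept"))) {
    // Browsers get a helpful page, but the status is still 404.
    const html =
      "<!doctype html><title>Not found</title>" +
      "<h1>Page not found</h1>" +
      '<p><a href="/">Homepage</a> · <a href="/contact">Contact</a></p>';
    return new Response(html, {
      status: 404,
      headers: { "content-type": "text/html; charset=utf-8" },
    });
  }

  // Everything else (API clients, agents) gets Problem Details.
  const problem = {
    type: "https://example.com/errors/not-found",
    title: "Resource not found",
    status: 404,
    detail: `Nothing is served at ${path}.`,
    instance: path,
  };
  return new Response(JSON.stringify(problem), {
    status: 404,
    headers: { "content-type": "application/problem+json" },
  });
}
```

Both branches set the same status, so a crawler that ignores the body still sees a real 404.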

429 Too Many Requests

The most failure-prone refusal type in 2026. Three rules.

1. Set Retry-After

Always. Either as seconds or as an HTTP date.

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/problem+json

Without it, agents retry immediately and your problem compounds.
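
Because the header has two forms, a client has to handle both. The following is a sketch of how a client might normalize either form to a number of seconds. `parseRetryAfter` is a hypothetical name, and the `now` parameter exists only to make the function testable.

```typescript
// Returns the wait in whole seconds, or null if the value is unusable.
function parseRetryAfter(value: string, now: Date = new Date()): number | null {
  const trimmed = value.trim();

  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed);
  }

  // Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:30 GMT".
  const when = Date.parse(trimmed);
  if (Number.isNaN(when)) {
    return null;
  }

  // A date in the past means "retry now", never a negative wait.
  return Math.max(0, Math.ceil((when - now.getTime()) / 1000));
}
```

The seconds form is the easier one for clients to get right, since it does not depend on the client's clock agreeing with yours.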

2. Set the rate limit headers

RateLimit-Limit: 60
RateLimit-Remaining: 0
RateLimit-Reset: 30

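The RateLimit-* fields from the earlier IETF RateLimit header drafts are widely deployed and increasingly read by SDKs and agents. In those drafts, RateLimit-Reset is the number of seconds until the quota resets, not a timestamp. Add them when your stack supports them.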
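
A sketch of deriving these headers from whatever your rate limiter tracks. `LimiterState` and `rateLimitHeaders` are hypothetical names, and the state shape is an assumption about your limiter, not a real library's API. The function also adds Retry-After once the quota is exhausted so the two signals cannot disagree.

```typescript
// Assumed shape of the limiter's view of one client.
interface LimiterState {
  limit: number;        // requests allowed per window
  remaining: number;    // requests left in the current window
  resetSeconds: number; // seconds until the window resets
}

function rateLimitHeaders(state: LimiterState): Record<string, string> {
  const reset = String(Math.ceil(state.resetSeconds));
  const headers: Record<string, string> = {
    "RateLimit-Limit": String(state.limit),
    "RateLimit-Remaining": String(Math.max(0, state.remaining)),
    "RateLimit-Reset": reset,
  };
  // Only a refused request needs Retry-After.
  if (state.remaining <= 0) {
    headers["Retry-After"] = reset;
  }
  return headers;
}
```

Sending the RateLimit-* fields on successful responses too lets a well-behaved agent slow down before it ever hits the 429.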

3. Distinguish per-key, per-IP, and global limits in the body

Agents handle this differently:

  • Per-key: agent should pause and resume with the same key after Retry-After.
  • Per-IP: agent should consider switching IP or pausing.
  • Global: agent should stop entirely.

{
  "type": "https://example.com/errors/rate-limited",
  "title": "Rate limited",
  "status": 429,
  "detail": "Exceeded 60 requests per minute on this API key.",
  "scope": "api_key",
  "retry_after_seconds": 30
}
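
To show why the scope field is worth sending, here is a sketch of the decision a cooperative agent might make from it. `nextAction` is a hypothetical name. Only the `"api_key"` scope value appears in the example above; `"ip"` and `"global"` are assumed values for the other two cases. For per-IP limits this sketch only pauses.

```typescript
type Action =
  | { kind: "wait"; seconds: number }
  | { kind: "stop"; reason: string };

// Subset of the 429 body this sketch cares about.
interface RateLimitProblem {
  scope?: string;
  retry_after_seconds?: number;
}

function nextAction(problem: RateLimitProblem, defaultWaitSeconds = 60): Action {
  if (problem.scope === "global") {
    // The whole service is saturated: waiting on this request will not help.
    return { kind: "stop", reason: "Service-wide rate limit reached." };
  }
  // Per-key, per-IP, or unknown scope: pause, then resume.
  return {
    kind: "wait",
    seconds: problem.retry_after_seconds ?? defaultWaitSeconds,
  };
}
```

Without the scope, the agent cannot tell these cases apart and has to guess between waiting and giving up.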

5xx errors

Server errors should also use Problem Details. Agents tend to treat a 5xx as transient and retry, whereas a 4xx tells them to change the request. Two practices:

  • Include a request_id so users can quote it when they contact support.
  • Avoid leaking stack traces. Some agents will surface the full error to the user.

{
  "type": "https://example.com/errors/internal",
  "title": "Internal server error",
  "status": 500,
  "detail": "Unexpected error processing the request.",
  "request_id": "req_2026_05_25_xy12"
}
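
Both practices fit in one small function: log the real error against an ID on the server, and send only the ID to the client. The sketch below assumes the standard Fetch API `Response`. `handleUnexpected` is a hypothetical name, the `x-request-id` header is an added convention rather than something this guide requires, and the ID generator is a simple correlation ID, not a secure token.

```typescript
function handleUnexpected(
  error: unknown,
  log: (line: string) => void = console.error,
): Response {
  // Correlation ID only; use a UUID if your runtime provides one.
  const requestId =
    `req_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 8)}`;

  // The full trace goes to server logs, keyed by the request ID.
  const trace = error instanceof Error ? (error.stack ?? error.message) : String(error);
  log(`${requestId} ${trace}`);

  // The client sees a generic message plus the ID it can quote to support.
  const problem = {
    type: "https://example.com/errors/internal",
    title: "Internal server error",
    status: 500,
    detail: "Unexpected error processing the request.",
    request_id: requestId,
  };
  return new Response(JSON.stringify(problem), {
    status: 500,
    headers: {
      "content-type": "application/problem+json",
      "x-request-id": requestId,
    },
  });
}
```

A route handler would wrap its work in `try`/`catch` and return `handleUnexpected(error)` from the `catch` branch.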

In the Next.js App Router, error.tsx and global-error.tsx render the HTML variant of this error for pages, while route handlers should return the JSON payload for API responses.

A summary table

Status  Response style
401     JSON Problem Details + WWW-Authenticate header
403     JSON Problem Details explaining who is denied and why
404     Status 404 + helpful HTML for browsers, JSON for APIs
429     Retry-After header + RateLimit-* headers + Problem Details
5xx     Generic Problem Details + request ID, never leak stack

The HTML and JSON variants share the same status code; only the body and its Content-Type differ, chosen from the request's Accept header.

Markdown negotiation for errors

If your site supports markdown content negotiation, apply the same logic to error pages. An agent that asked for text/markdown and hit a 404 should get a markdown body, not a stripped HTML page.

A quick proxy.ts pattern:

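// acceptsMarkdown is a helper you supply; it checks the request's
// Accept header for text/markdown.
if (acceptsMarkdown(request)) {
  return new Response("# Not found\n\nThe page you requested does not exist.", {
    status: 404,
    headers: { "content-type": "text/markdown; charset=utf-8" },
  });
}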
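
The snippet relies on an `acceptsMarkdown` helper that is not shown. One possible implementation, as a sketch under the assumption that only an explicit `text/markdown` entry should count: wildcards such as `*/*` are ignored on purpose so that browsers keep getting HTML.

```typescript
// True when the Accept header lists text/markdown with a non-zero q-value.
function acceptsMarkdown(request: Request): boolean {
  const accept = request.headers.get("accept") ?? "";
  return accept.split(",").some((entry) => {
    const [mediaType, ...params] = entry.trim().toLowerCase().split(";");
    if ((mediaType ?? "").trim() !== "text/markdown") {
      return false;
    }
    // "q=0" means the client explicitly refuses this type.
    const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
    return q === undefined || Number(q.slice(2)) > 0;
  });
}
```

A stricter version would rank all listed types by q-value and pick the best match; this one only answers whether markdown is acceptable at all.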

Why this matters

Error responses are among the pages an agent sees most often when it crawls deep into a site. Polished error responses cost almost nothing and prevent the worst-case retry storms. They also leave a better impression on the human watching the agent fail.

If you build a single thing this quarter, build a Problem Details layer that handles 401, 403, 404, 429, and 500 with consistent shapes. Every agent on the agentic web will treat your site better for it.