The AI agent readiness checklist for 2026
Most technical SEO advice from 2022 is still useful, but it is not enough. AI agents read your site differently than browsers and search crawlers. If you only optimize for what Googlebot sees, you are leaving the agent surface unowned, which means agents quote you unpredictably or skip you entirely.
This checklist covers the six signals that determine whether your content is reliably accessible to AI agents in 2026. Run it before every release.
1. robots.txt with explicit AI rules
A wildcard User-agent: * group is no longer enough. AI vendors increasingly look for their specific user agent before falling back to the wildcard. Be explicit:
- Include a wildcard group with `Allow: /`.
- Add named groups for GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, Applebot-Extended, Bytespider, and CCBot.
- Reference the canonical sitemap.
Verify locally with our robots.txt tester before deploying.
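Putting those rules together, a robots.txt might look like the sketch below. The hostname is a placeholder, and the named groups are a subset of the agents listed above; extend them to match your policy.

```
# Wildcard fallback for agents without a named group
User-agent: *
Allow: /

# Explicit groups so AI vendors find a direct answer
# before falling back to the wildcard
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml
```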
2. Discoverable sitemap.xml
Sitemap discovery affects how reliably agents and crawlers find your canonical URLs.
- Serve at `/sitemap.xml` with content type `application/xml`.
- Include only canonical, indexable URLs. Skip redirects, noindex URLs, and 404s.
- Use a stable build date for `lastModified`. Per-request `new Date()` looks like churn and gets deprioritized.
- Reference the sitemap from robots.txt.
Validate the structure with our sitemap validator.
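As a sketch of the stable-date rule, here is a minimal TypeScript sitemap generator. The names (`BUILD_DATE`, `buildSitemap`) are illustrative, not from any framework; the point is that the date is computed once at module load, not per request.

```typescript
// Resolved once when the module loads (i.e. at build time in a
// statically built site), so <lastmod> stays stable across requests
// and only changes on a new deploy.
const BUILD_DATE: string = new Date().toISOString().slice(0, 10);

// Serialize a list of canonical URLs into sitemap XML.
export function buildSitemap(canonicalUrls: string[]): string {
  const entries = canonicalUrls
    .map(
      (url) =>
        `  <url>\n` +
        `    <loc>${url}</loc>\n` +
        `    <lastmod>${BUILD_DATE}</lastmod>\n` +
        `  </url>`
    )
    .join("\n");

  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n` +
    `</urlset>`
  );
}
```

Serve the result with `Content-Type: application/xml`, and feed it only the canonical URL list.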
3. Link headers on the homepage
The HTTP Link response header is one of the few mechanisms that surface machine-readable resources without HTML rendering. Send a multi-relation header on the homepage:
```
Link: </.well-known/api-catalog>; rel="api-catalog",
      </llms.txt>; rel="describedby"; type="text/plain",
      </sitemap.xml>; rel="sitemap"; type="application/xml",
      </rss>; rel="alternate"; type="application/rss+xml"
```

Agents that read this header skip discovery entirely.
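One way to keep a multi-relation header maintainable is to build it from data. The helper below is a sketch with names of our choosing, not a standard API:

```typescript
// One advertised resource: target path, relation type, optional media type.
type LinkRelation = { href: string; rel: string; type?: string };

// Serialize relations into a single multi-relation Link header value.
export function buildLinkHeader(relations: LinkRelation[]): string {
  return relations
    .map(({ href, rel, type }) => {
      const parts = [`<${href}>`, `rel="${rel}"`];
      if (type) parts.push(`type="${type}"`);
      return parts.join("; ");
    })
    .join(", ");
}
```

In a Node handler this would be attached with something like `res.setHeader("Link", buildLinkHeader([...]))` on the homepage response.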
4. Markdown content negotiation
When an agent sends `Accept: text/markdown`, return a clean markdown body. The result is a far tighter response, typically 5 to 20 percent of the size of the HTML, that agents can parse without a DOM.
The implementation is small: one branch in proxy.ts that rewrites to a markdown route handler. See the content negotiation guide for the exact code.
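The Accept check at the heart of that branch can be sketched in a few lines. The helper name below is ours, and a full implementation would rank q-values; this covers the common agent case of a plain `Accept: text/markdown`:

```typescript
// Return true when the request's Accept header asks for markdown.
// Prefix matching per comma-separated part is a simplification:
// it ignores q-value ranking but handles the typical agent request.
export function wantsMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  return accept
    .split(",")
    .some((part) => part.trim().toLowerCase().startsWith("text/markdown"));
}
```

The proxy would rewrite to the markdown route handler whenever this returns true, and fall through to the HTML route otherwise.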
5. AI bot directives that match your intent
Decide what you want and write rules that say it.
- Allow training, allow search: default. Most public sites.
- Block training, allow search: most common stance for sites with valuable original content.
- Block everything: rare, and removes you from AI-mediated discovery entirely.
Use specific user agents. Blocking Googlebot to block Gemini is wrong: Google-Extended is the correct token for opting out of Gemini training without affecting search.
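For example, the "block training, allow search" stance might look like the sketch below. Google-Extended is Google's documented opt-out token for Gemini training; the other groups shown are illustrative and should be extended to match your full agent list.

```
# Search crawlers keep full access
User-agent: Googlebot
Allow: /

# Training-oriented agents are opted out
User-agent: Google-Extended
User-agent: GPTBot
User-agent: CCBot
Disallow: /
```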
6. Content-Signal directive
Content-Signal is a robots.txt extension that lets you express AI-specific preferences in a single line:
```
Content-Signal: ai-train=no, search=yes, ai-input=no
```

Three keys, each set to `yes` or `no`. Place the directive inside a User-agent group, typically the wildcard. A growing number of AI vendors honor it as a faster signal than per-agent rules.
Build a directive with the Content-Signal Builder.
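In context, the directive sits inside the group it applies to, e.g.:

```
User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=no
Allow: /
```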
Bonus: structured data
JSON-LD is not strictly an agent-readiness signal, but agents increasingly parse it as a low-cost way to extract entities from a page. At minimum:
- Organization on the homepage / root layout.
- WebSite with a SearchAction template.
- BreadcrumbList on inner pages.
- Article on article pages.
Generate any of these with the JSON-LD Generator and validate them with Google's Rich Results Test.
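As a starting point, a minimal Organization block looks like the sketch below; every value is a placeholder to replace with your own. Embed it in a `<script type="application/ld+json">` tag in the root layout.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": ["https://github.com/example"]
}
```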
A baseline you can ship in one hour
If you are starting from zero, this is the minimum viable agent-readiness profile:
- `/robots.txt` with explicit AI rules and a Content-Signal line.
- `/sitemap.xml` listing every canonical URL with a stable build date.
- `/llms.txt` with a short site description and a curated link list.
- A `Link` response header on the homepage advertising the above.
- JSON-LD Organization schema in your root layout.
Each piece is small. The compounded effect is that agents can find, parse, and quote your content reliably, instead of guessing.
Apply this baseline with one prompt
Visit the home page and click "Get the full baseline prompt" under the included checks. Paste the prompt into Cursor, Claude Code, or ChatGPT, and the agent will apply all six signals to your repo in a single pass.
Why this matters
Search rewards sites that are easy to crawl. Agents reward sites that are easy to parse. They overlap, but they are not identical. Sites that ignore the agent surface in 2026 will be cited less, summarized worse, and indexed slower. The work to fix it is small. The payoff compounds.