Optimize docs for agents
Use this guide when you already have a docs site and want agents to find, fetch, attribute, and cite the same content humans read in the browser.
Leadtype generates the files. Your app wires those files into routing and HTML — that runtime side is covered in Serve agent responses.
This guide is about the plumbing — the files agents fetch. For the content inside them, Write for agents covers the one rule that moved the evals: document the non-obvious, not restatements of your types or CLI help.
The default output shape is based on the repo's agent evals. See Evals for the benchmark summary and the open question around larger-corpus llms-full.txt scaling.
What good looks like
An agent-readable docs site has four layers:
- /llms.txt, /sitemap.xml, /sitemap.md, and /robots.txt tell agents what exists and where to start.
- Each docs page has a markdown mirror at /docs/page.md, and agent requests to /docs/page can receive markdown instead of HTML.
- Human HTML pages include JSON-LD, canonical links, and markdown alternate links so agents can extract page identity without guessing from the DOM.
- Markdown responses include canonical_url and last_updated frontmatter so copied content keeps its source and freshness.
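As a rough illustration of the second layer, the decision "should this request get markdown?" can be sketched as a small predicate. This is not Leadtype's runtime helper: the function name wantsMarkdown and the user-agent pattern are assumptions made for the example, and the real list of recognized agents may differ.

```typescript
// Hypothetical helper: decide whether a request should receive the markdown mirror.
// The user-agent pattern is illustrative only, not Leadtype's actual list.
const AI_USER_AGENT = /GPTBot|ClaudeBot|PerplexityBot|CCBot/i;

function wantsMarkdown(
  pathname: string,
  accept: string | null,
  userAgent: string | null,
): boolean {
  // Explicit mirror URL such as /docs/page.md
  if (pathname.endsWith(".md")) return true;
  // Accept header that names markdown
  if (accept !== null && accept.toLowerCase().includes("text/markdown")) return true;
  // Known AI crawler or assistant user agent
  return userAgent !== null && AI_USER_AGENT.test(userAgent);
}
```

A browser request with Accept: text/html and an ordinary user agent falls through all three checks and keeps receiving HTML.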
Generate the artifacts
Run the site-mode pipeline before your app build:
This writes root crawler files, root LLM entry points, and docs artifacts under public/docs/:
The generated agent-readability.json manifest is the bridge between build-time content and runtime requests. It contains page URLs, markdown mirror paths, titles, descriptions, navigation, and freshness dates. Your framework loads it once and passes it to the runtime helpers (markdown responses, JSON-LD, sitemap, robots):
Static crawlers should start at root /robots.txt and /sitemap.xml, which leadtype generate writes.
Pass the manifest through normalizeAgentReadabilityManifest() before handing it to a runtime helper: that validates the shape and fills defaults.
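The "load once, reuse per request" pattern might look like the sketch below. The interface fields (url, markdownPath, title, lastUpdated), the pages array, and the file location are placeholders chosen for illustration, since the manifest's real field names are not shown in this guide. In a real app the parsed JSON would also go through normalizeAgentReadabilityManifest(), which is left out here because its import path is not given.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical, minimal view of a manifest entry; real field names may differ.
interface ManifestPage {
  url: string;          // e.g. "/docs/page"
  markdownPath: string; // e.g. "/docs/page.md"
  title: string;
  lastUpdated: string;  // ISO date string
}

interface Manifest {
  pages: ManifestPage[];
}

let cached: Promise<Manifest> | undefined;

// Read the manifest once per process and reuse the same promise afterwards.
// The default path is an assumption based on the public/docs/ output directory.
function loadManifest(path = "public/docs/agent-readability.json"): Promise<Manifest> {
  if (!cached) {
    cached = readFile(path, "utf8").then((raw) => JSON.parse(raw) as Manifest);
  }
  return cached;
}

// Look up the entry for a requested docs URL.
function findPage(manifest: Manifest, url: string): ManifestPage | undefined {
  return manifest.pages.find((page) => page.url === url);
}
```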
Once these files exist, see Serve agent responses to wire markdown content negotiation, JSON-LD, and per-request sitemap/robots into your framework.
Configure the agent surface
The agent surface is driven by four top-level blocks in docs.config.ts: product (identity), organization (who publishes it), llms (the llms.txt body), and agents (robots / SEO / MCP / skills toggles). Everything beyond the required product.name and product.tagline is optional with zero-config defaults — add a key only to change one.
| Block | Controls | Default |
|---|---|---|
| product | identity (name / tagline / links / kind / category) → llms.txt header, JSON-LD software node, agent card | name + tagline required |
| organization | publisher → JSON-LD Organization + agent-card provider | optional |
| llms.sections | the authored llms.txt body (reference) | empty |
| agents.robots | robots.txt crawler policy + Content-Signals (below) | balanced (crawlable, ai-train=no) |
| agents.seo | og:image / twitter / keywords head meta | og:type + twitter:card always; rest off |
| agents.mcp | { enabled, endpoint, serverInfo, authentication }: emits MCP discovery and points agent surfaces at your MCP endpoint (MCP) | off |
| agents.agentCard | { enabled, version }: the /.well-known/agent-card.json | emitted |
| agents.skills | the /.well-known/agent-skills surface (skills) | docs-skill emits |
Per-page JSON-LD always emits; the site-level graph is derived from product + organization. Omit these blocks (beyond the required product fields) and you still get balanced robots/Content-Signals, the JSON-LD graph, per-page metadata, the auto docs-skill, and the agent card.
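A near-minimal docs.config.ts could look like the sketch below. Only the block and key names (product.name, product.tagline, agents.agentCard.enabled) come from this guide. The values are placeholders, and exporting a plain object as the default export is an assumption about how the config module is shaped.

```typescript
// docs.config.ts (sketch). Values are placeholders; the export style is assumed.
const config = {
  product: {
    name: "Example Product",                  // required
    tagline: "One-line description of it",    // required
  },
  agents: {
    // Override a single default: the agent card is emitted unless disabled.
    agentCard: { enabled: false },
  },
};

export default config;
```

Everything not listed keeps its default, so this config would still produce balanced robots, the JSON-LD graph, and the auto docs-skill.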
Control AI crawler access
By default (balanced), robots.txt keeps the site fully crawlable and emits a Content-Signals line — Content-Signal: search=yes, ai-input=yes, ai-train=no — so retrieval is welcome but you signal "don't train on this". Change the stance with the additive agents.robots config:
- balanced (default): crawlable + retrievable; signals ai-train=no.
- open: also welcomes training (ai-train=yes).
- block-training: sets Disallow: / for training crawlers (GPTBot, Google-Extended, CCBot, ByteSpider, anthropic-ai, MetaExternalAgent); retrieval crawlers stay allowed.
- block-ai: sets Disallow: / for every AI crawler; conventional search engines are untouched; signals ai-input=no, ai-train=no.
The same stance drives the Content-Signal response header on served markdown (see Serve agent responses), so robots.txt and live responses never disagree. Robots Disallow is advisory — pair block-* with your CDN/WAF if you need hard enforcement.
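To make the stance-to-signal relationship concrete, here is a sketch that derives one Content-Signal value from a stance, so robots.txt and response headers can share it. It is not Leadtype's implementation. The balanced, open, and block-ai rows follow the descriptions above (with search=yes for block-ai inferred from "conventional search engines are untouched"), while the block-training row is an assumption, because this guide only describes its Disallow rules.

```typescript
type RobotsStance = "balanced" | "open" | "block-training" | "block-ai";

interface Signals {
  search: boolean;
  aiInput: boolean;
  aiTrain: boolean;
}

// Signal values per stance; the "block-training" row is assumed, not documented.
const CONTENT_SIGNALS: Record<RobotsStance, Signals> = {
  balanced: { search: true, aiInput: true, aiTrain: false },
  open: { search: true, aiInput: true, aiTrain: true },
  "block-training": { search: true, aiInput: true, aiTrain: false }, // assumption
  "block-ai": { search: true, aiInput: false, aiTrain: false },
};

const yesNo = (flag: boolean): string => (flag ? "yes" : "no");

// Build the value used after "Content-Signal:" in robots.txt or a header.
function contentSignal(stance: RobotsStance): string {
  const s = CONTENT_SIGNALS[stance];
  return `search=${yesNo(s.search)}, ai-input=${yesNo(s.aiInput)}, ai-train=${yesNo(s.aiTrain)}`;
}
```

Deriving both outputs from a single table is what keeps the robots.txt line and the live header from drifting apart.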
Make the index worth reading
llms.txt is the first thing an indexing model reads, so it should do more than list links. Use the top-level llms.sections array in docs.config.ts to describe the body in order — what the project is, why it's credible, and where to start:
Sections render in array order. A links section and a markdown section with a heading each become an ## heading block; a markdown section without a heading renders its body inline, with no heading. Use markdown sections for anything an indexer should weigh — adoption (stars, downloads), hosting provider, license, positioning — and links sections for curated entry points resolved against your docs. Popularity numbers are author-supplied; fetch them at build time in the config module if you want them live. Full field reference: LLM files → llms.sections.
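The rendering rules above can be sketched as a small function. The section shapes here (type, heading, body, links with title and href) are hypothetical stand-ins for whatever the llms.sections reference actually defines, and the exact spacing of the output is an assumption.

```typescript
// Hypothetical section shapes; the real field names may differ.
type LlmsSection =
  | { type: "markdown"; heading?: string; body: string }
  | { type: "links"; heading: string; links: { title: string; href: string }[] };

// Render sections in array order, following the heading rules described above.
function renderSections(sections: LlmsSection[]): string {
  return sections
    .map((section) => {
      if (section.type === "links") {
        const items = section.links.map((link) => `- [${link.title}](${link.href})`);
        return [`## ${section.heading}`, "", ...items].join("\n");
      }
      // Markdown with a heading becomes a ## block; without one the body stays inline.
      return section.heading
        ? [`## ${section.heading}`, "", section.body].join("\n")
        : section.body;
    })
    .join("\n\n");
}
```

For example, a heading-less markdown section followed by a links section titled "Start here" renders as an intro paragraph and then a "## Start here" block with one bullet per link.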
Verify locally
Start the docs site, then check the URLs agents use:
Expected results:
- llms.txt links point to existing .md URLs.
- Root sitemap and robots routes resolve.
- Markdown responses use Content-Type: text/markdown; charset=utf-8.
- Markdown frontmatter contains canonical_url and last_updated.
- HTML pages contain one application/ld+json script.
- Missing docs pages return a markdown body for agent-oriented requests.
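The two markdown checks in the list above (content type and frontmatter keys) can be automated with a small checker like the sketch below. The function name checkMarkdownResponse is made up for this example; it works on a header value and a body string, so it can be fed from any HTTP client.

```typescript
interface CheckResult {
  ok: boolean;
  problems: string[];
}

// Validate a markdown response against the expectations listed above.
function checkMarkdownResponse(contentType: string | null, body: string): CheckResult {
  const problems: string[] = [];

  // Compare the header case-insensitively and ignoring whitespace.
  const normalized = contentType?.toLowerCase().replace(/\s+/g, "");
  if (normalized !== "text/markdown;charset=utf-8") {
    problems.push("unexpected Content-Type");
  }

  // Frontmatter is the block between the leading pair of '---' lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(body);
  const frontmatter = match ? match[1] : "";
  for (const key of ["canonical_url", "last_updated"]) {
    if (!new RegExp(`^${key}:\\s*\\S`, "m").test(frontmatter)) {
      problems.push(`missing ${key}`);
    }
  }

  return { ok: problems.length === 0, problems };
}
```

In practice you would fetch a .md URL from the running site, then pass the response's content-type header and text body to this function.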
Then run the Vercel Agent Readability audit against the docs entry point:
If the audit reports broken links from llms.txt, check whether your generated URLs use the production --base-url while your local audit expects localhost. For local dev, serve root sitemap/robots files from the current request origin, or run the audit against a preview URL that matches --base-url.
Minimal checklist
- Generate llms.txt, markdown mirrors, llms-full.txt, sitemap, robots, and agent-readability.json.
- Serve root-level /sitemap.xml, /sitemap.md, and /robots.txt.
- Add JSON-LD to every docs HTML page.
- Add canonical and markdown alternate links to every docs HTML page.
- Return markdown for agent Accept headers, AI user agents, and .md URLs.
- Include canonical_url and last_updated in markdown frontmatter.
- Return markdown bodies for missing docs pages requested by agents.
For the wiring that satisfies the middleware/JSON-LD/markdown lines of this checklist, see Serve agent responses.