Optimize docs for agents
Use this guide when you already have a docs site and want agents to find, fetch, attribute, and cite the same content humans read in the browser.
Leadtype handles the generated files. Your app wires those files into routing and HTML.
The default output shape is based on the repo's agent evals. See Evals for the benchmark summary and the open question around larger-corpus llms-full.txt scaling.
What good looks like
An agent-readable docs site has four layers:
- /llms.txt, /sitemap.xml, /sitemap.md, and /robots.txt tell agents what exists and where to start.
- Each docs page has a markdown mirror at /docs/page.md, and agent requests to /docs/page can receive markdown instead of HTML.
- Human HTML pages include JSON-LD, canonical links, and markdown alternate links so agents can extract page identity without guessing from the DOM.
- Markdown responses include canonical_url and last_updated frontmatter so copied content keeps its source and freshness.
1. Generate the artifacts
Run the site-mode pipeline before your app build:
This writes the docs-scoped files under public/docs/ and the top-level public/llms.txt:
The generated agent-readability.json manifest is the bridge between build-time content and runtime requests. It contains page URLs, markdown mirror paths, titles, descriptions, group navigation, and freshness dates.
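For orientation, the manifest can be pictured roughly as the TypeScript shape below, together with a lookup helper a middleware might use. Everything here is hypothetical: the field names (url, markdownPath, group, lastModified) and findPage are illustrative assumptions, not leadtype's schema or API, so check your generated agent-readability.json for the real keys.

```typescript
// Hypothetical shape of one manifest entry. Field names are illustrative
// assumptions, not the real agent-readability.json schema.
interface ManifestPage {
  url: string; // root-relative HTML URL, e.g. "/docs/quickstart"
  markdownPath: string; // root-relative markdown mirror, e.g. "/docs/quickstart.md"
  title: string;
  description: string;
  group: string; // navigation group the page belongs to
  lastModified: string; // ISO 8601 freshness date
}

interface Manifest {
  pages: ManifestPage[];
}

// Resolve either the HTML URL or the markdown mirror URL to its entry.
function findPage(manifest: Manifest, pathname: string): ManifestPage | undefined {
  // Treat "/docs/quickstart/" and "/docs/quickstart" as the same page.
  const normalized = pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  return manifest.pages.find(
    (page) => page.url === normalized || page.markdownPath === normalized,
  );
}

// Example data for illustration only.
const manifest: Manifest = {
  pages: [
    {
      url: "/docs/quickstart",
      markdownPath: "/docs/quickstart.md",
      title: "Quickstart",
      description: "Install and run the pipeline.",
      group: "Getting started",
      lastModified: "2024-01-15",
    },
  ],
};

console.log(findPage(manifest, "/docs/quickstart.md")?.title); // prints Quickstart
```

Keying the lookup on both URL forms lets one helper serve the HTML route, the .md route, and Accept-negotiated requests.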
2. Add one middleware
Put the agent-readable routes before your HTML docs route. This Node/Bun example handles root discovery files, docs-scoped discovery files, direct .md URLs, and Accept: text/markdown requests in one place:
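The routing decision such a middleware makes can be sketched independently of any framework. The classifier below is a hypothetical helper, not part of leadtype: it assumes docs live under /docs, uses the artifact names listed in this guide, and deliberately simplifies the Accept check (no q-values, no user-agent detection).

```typescript
// Hypothetical route kinds; names are illustrative, not leadtype's.
type RouteKind =
  | "static-artifact" // llms.txt, llms-full.txt, agent-readability.json
  | "regenerated-artifact" // sitemap.xml, sitemap.md, robots.txt (rebuilt per request origin)
  | "markdown-mirror" // direct .md URL under /docs
  | "negotiated-markdown" // docs HTML URL, but the client asked for markdown
  | "html";

const REGENERATED = new Set(["sitemap.xml", "sitemap.md", "robots.txt"]);
const STATIC = new Set(["llms.txt", "llms-full.txt", "agent-readability.json"]);

function classifyRequest(pathname: string, accept: string | null): RouteKind {
  // Artifacts live at the root ("/robots.txt") or under /docs ("/docs/robots.txt").
  const file = /^\/(?:docs\/)?([^/]+)$/.exec(pathname)?.[1];
  // Check artifacts first so they are never rewritten to page markdown.
  if (file && REGENERATED.has(file)) return "regenerated-artifact";
  if (file && STATIC.has(file)) return "static-artifact";
  const isDocs = pathname === "/docs" || pathname.startsWith("/docs/");
  if (!isDocs) return "html"; // not a docs route: fall through to the app
  if (pathname.endsWith(".md")) return "markdown-mirror";
  // Simplified: a real implementation should compare q-values and check user agents.
  if (accept?.toLowerCase().includes("text/markdown")) return "negotiated-markdown";
  return "html";
}
```

Checking artifact paths before anything else mirrors the rule later in this guide that llms.txt, sitemaps, robots.txt, and the manifest must never be rewritten to page markdown.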
If you also have marketing, blog, changelog, or product pages, pass them through the optional pages field — the regenerator merges them into the rebased output:
The other generated artifacts — /llms.txt, /docs/llms.txt, /llms-full.txt, /docs/agent-readability.json — use root-relative links and serve fine as static files straight from public/.
Keep the docs-scoped versions too (/docs/sitemap.xml etc.). Audits and agents may request both /sitemap.xml and /docs/sitemap.xml, especially when the audited URL is /docs.
3. Add JSON-LD to docs pages
Use the manifest entry for the current page and render Schema.org JSON-LD into the HTML head:
Use renderJsonLd(page, manifest) if your framework has a typed metadata API. Use renderJsonLdScript(page, manifest) if your framework expects an HTML string.
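To show what such a payload contains, here is a hand-rolled equivalent. It is a sketch, not necessarily what renderJsonLd emits: the TechArticle type, the @graph layout, and the PageMeta fields are assumptions. One detail is worth copying into any implementation: escaping < so that a string containing </script> cannot close the script element early.

```typescript
// Hypothetical page fields; not leadtype's manifest schema.
interface PageMeta {
  title: string;
  description: string;
  canonicalUrl: string; // absolute URL
  lastModified: string; // ISO 8601 date
  breadcrumbs: { name: string; url: string }[];
}

function buildJsonLd(page: PageMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "TechArticle",
        headline: page.title,
        description: page.description,
        url: page.canonicalUrl,
        dateModified: page.lastModified,
      },
      {
        "@type": "BreadcrumbList",
        itemListElement: page.breadcrumbs.map((crumb, index) => ({
          "@type": "ListItem",
          position: index + 1, // Schema.org positions are 1-based
          name: crumb.name,
          item: crumb.url,
        })),
      },
    ],
  };
}

// Serialize for the <head>. Escaping "<" as \u003c keeps "</script>" inside a
// string value from closing the element; JSON parsers decode it back to "<".
function jsonLdScriptTag(page: PageMeta): string {
  const json = JSON.stringify(buildJsonLd(page)).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```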
Also add canonical and markdown alternate links:
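Both tags are plain HTML. A minimal sketch, assuming the page's canonical URL and markdown mirror URL are already known; headLinks is a hypothetical helper, and text/markdown is the registered media type for the alternate link.

```typescript
// Hypothetical helper: emits the canonical and markdown-alternate <link> tags.
function headLinks(canonicalUrl: string, markdownUrl: string): string {
  // Escape attribute values so URLs containing & or " stay well-formed.
  const escapeAttr = (value: string) =>
    value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return [
    `<link rel="canonical" href="${escapeAttr(canonicalUrl)}">`,
    `<link rel="alternate" type="text/markdown" href="${escapeAttr(markdownUrl)}">`,
  ].join("\n");
}

console.log(
  headLinks("https://example.com/docs/quickstart", "https://example.com/docs/quickstart.md"),
);
```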
The JSON-LD gives agents the page title, description, canonical URL, last modified date, and breadcrumbs without scraping your rendered layout.
4. Return markdown to agents
The middleware above uses createAgentMarkdownResponse. It returns a Web Response (or null when the path is not an agent-oriented markdown request) and handles:
- Accept: text/markdown and Accept: text/plain content negotiation (q-values respected).
- Known AI user-agent headers (GPTBot, ClaudeBot, Bingbot, AmazonBot, MetaExternalAgent, PerplexityBot, MistralBot, AppleBot, ByteSpider, YouBot, …).
- Direct .md URLs such as /docs/quickstart.md.
- canonical_url and last_updated frontmatter aliases injected automatically.
- 200 markdown responses for missing docs pages, so agents do not discard the body.
- Response headers: Content-Type: text/markdown; charset=utf-8, Vary: Accept[, User-Agent], Link: <…>; rel="canonical", and Cache-Control: public, max-age=300, must-revalidate.
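The negotiation rule in the first two bullets can be approximated as follows. This is an independent sketch, not createAgentMarkdownResponse's code. It assumes markdown wins only when text/markdown or text/plain carries a strictly higher q-value than an explicit text/html (wildcards ignored, ties go to HTML), and it matches a small, illustrative subset of the user-agent tokens above by case-insensitive substring.

```typescript
// Parse an Accept header into a map of media range -> q-value (default q=1).
function parseAccept(header: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const part of header.split(",")) {
    const [rawType = "", ...params] = part.trim().split(";");
    const type = rawType.trim().toLowerCase();
    if (!type) continue;
    let q = 1;
    for (const param of params) {
      const [key = "", value = ""] = param.split("=");
      if (key.trim().toLowerCase() === "q") {
        const parsed = Number(value.trim());
        if (value.trim() !== "" && !Number.isNaN(parsed)) q = parsed;
      }
    }
    // If a media range is listed twice, keep its highest weight.
    weights.set(type, Math.max(q, weights.get(type) ?? 0));
  }
  return weights;
}

// Illustrative subset of the AI crawler tokens listed above (lowercase).
const AI_AGENT_TOKENS = ["gptbot", "claudebot", "perplexitybot"];

function wantsMarkdown(accept: string | null, userAgent: string | null): boolean {
  const ua = (userAgent ?? "").toLowerCase();
  if (AI_AGENT_TOKENS.some((token) => ua.includes(token))) return true;
  if (!accept) return false;
  const weights = parseAccept(accept);
  const markdown = Math.max(weights.get("text/markdown") ?? 0, weights.get("text/plain") ?? 0);
  const html = weights.get("text/html") ?? 0;
  // Markdown must be acceptable (q > 0) and ranked strictly above explicit text/html.
  return markdown > 0 && markdown > html;
}
```

Ignoring */* when comparing keeps a typical browser header (text/html,...,*/*;q=0.8) on HTML while still honoring a client that lists text/markdown alongside a wildcard.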
readMarkdownFile may be sync or async. In Node/Bun, read from disk. In Cloudflare, fetch from KV/R2 or an asset binding. In Vercel Edge, fetch from the deployment's static asset URL.
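Whatever readMarkdownFile returns still needs the canonical_url and last_updated fields from the list above. A hedged sketch of that step, independent of leadtype: withSourceFrontmatter is a hypothetical helper that assumes frontmatter is a leading block delimited by --- lines, adds each field only when absent, and quotes values with JSON.stringify (a JSON string is also a valid YAML double-quoted scalar).

```typescript
// Hypothetical helper: ensure canonical_url and last_updated are in the frontmatter.
function withSourceFrontmatter(
  markdown: string,
  canonicalUrl: string,
  lastUpdated: string,
): string {
  const fields: [string, string][] = [
    ["canonical_url", canonicalUrl],
    ["last_updated", lastUpdated],
  ];
  // Match a leading "---\n ... \n---" block (LF or CRLF line endings).
  const match = /^---\r?\n([\s\S]*?)\r?\n---(\r?\n|$)/.exec(markdown);
  const existing = match?.[1] ?? "";
  const body = match ? markdown.slice(match[0].length) : markdown;
  const lines = existing ? existing.split(/\r?\n/) : [];
  for (const [key, value] of fields) {
    // Leave author-supplied values alone.
    if (!lines.some((line) => line.startsWith(`${key}:`))) {
      lines.push(`${key}: ${JSON.stringify(value)}`);
    }
  }
  return `---\n${lines.join("\n")}\n---\n${body}`;
}
```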
Put that logic wherever your framework can intercept docs requests before its HTML route:
| Framework/runtime | Where it usually goes |
|---|---|
| TanStack Start / nitro | server/middleware/agent-readability.ts (h3). One middleware handles both the markdown response and the sitemap/robots regenerators — runs in dev, preview, and prod. See apps/example/server/middleware/agent-readability.ts for the canonical reference. |
| Nuxt | server/middleware/agent-readability.ts (h3) — same shape as the TanStack Start example. |
| Next.js | middleware.ts (Edge) or a catch-all route handler before the docs page. |
| Astro | An endpoint at pages/docs/[...slug].md.ts or astro:middleware. |
| Cloudflare Workers/Pages | Worker fetch handler with KV/R2 asset binding for the markdown reader. |
| Express/Hono/Fastify | Middleware before the docs HTML route. |
Tip: if you keep static sitemap.xml, sitemap.md, or robots.txt files in your build output, your framework's static handler may serve them before your middleware can rebase URLs to the live origin. Either delete the static copies after the build (so the middleware always runs) or make sure your middleware is registered ahead of static-asset serving.
Do not rewrite llms.txt, sitemap.xml, sitemap.md, robots.txt, llms-full.txt, or agent-readability.json to page markdown. The helper leaves those artifact paths alone.
Why the sitemap and robots responses are regenerated, not static
sitemap.xml's <loc> requires absolute URLs, and robots.txt's Sitemap: directive is conventionally absolute too. So those two files cannot be plain static assets if you want previews and staging to advertise the right origin. The middleware in step 2 already calls createSitemapXmlResponse, createSitemapMarkdownResponse, and createRobotsTxtResponse — each rebuilds from the manifest using the live requestOrigin, so preview, staging, and prod all advertise the right URLs without per-environment config.
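To make the rebasing concrete, here is a self-contained sketch that rebuilds both files from root-relative paths and the live request origin. The function names and inputs are hypothetical, not the signatures of createSitemapXmlResponse or createRobotsTxtResponse, and the allow-everything robots.txt policy is only an example.

```typescript
// Escape the five XML special characters for use inside <loc>/<lastmod>.
function xmlEscape(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface SitemapEntry {
  path: string; // root-relative, e.g. "/docs/quickstart"
  lastModified?: string; // W3C datetime, e.g. "2024-01-15"
}

// Hypothetical helper: absolute <loc> values derived from the request origin.
function buildSitemapXml(entries: SitemapEntry[], requestOrigin: string): string {
  const urls = entries.map((entry) => {
    const loc = new URL(entry.path, requestOrigin).href;
    const lastmod = entry.lastModified
      ? `<lastmod>${xmlEscape(entry.lastModified)}</lastmod>`
      : "";
    return `  <url><loc>${xmlEscape(loc)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

// Hypothetical helper: robots.txt whose Sitemap directive points at the live origin.
function buildRobotsTxt(requestOrigin: string): string {
  const sitemapUrl = new URL("/sitemap.xml", requestOrigin).href;
  return `User-agent: *\nAllow: /\n\nSitemap: ${sitemapUrl}\n`;
}
```

Because the origin comes from the request rather than a build flag, the same artifact input yields correct URLs on localhost, preview deployments, and production.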
The other generated files (/llms.txt, /docs/llms.txt, /llms-full.txt) use root-relative links, so they can be served as plain static assets — no per-request rewriting needed.
Cache-Control and CDN
Every helper adds Cache-Control: public, max-age=300, must-revalidate by default. Pair it with the Vary: Accept, User-Agent header that createAgentMarkdownResponse sets when an AI user-agent is detected. Your CDN must key its cache entries on those headers; otherwise it can serve cached HTML to agents, or cached markdown to browsers. Override the default with cacheControl: "<directive>", or pass cacheControl: null to omit the header entirely (useful when your CDN sets caching out-of-band).
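The header rules in this paragraph fit in a small pure function. This is a hypothetical sketch of the described behavior, not the helpers' code: the cacheControl option mirrors the one above (undefined means default, null means omit), and matchedAiUserAgent is an assumed flag for "markdown was chosen because of the user agent".

```typescript
const DEFAULT_CACHE_CONTROL = "public, max-age=300, must-revalidate";

interface CacheOptions {
  // undefined -> default directive; null -> omit the header; string -> use as-is.
  cacheControl?: string | null;
}

function markdownResponseHeaders(
  options: CacheOptions,
  matchedAiUserAgent: boolean,
): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "text/markdown; charset=utf-8",
    // Caches must key on every request header that influenced the representation.
    Vary: matchedAiUserAgent ? "Accept, User-Agent" : "Accept",
  };
  const cacheControl =
    options.cacheControl === undefined ? DEFAULT_CACHE_CONTROL : options.cacheControl;
  if (cacheControl !== null) headers["Cache-Control"] = cacheControl;
  return headers;
}
```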
5. Verify locally
Start the docs site, then check the URLs agents use:
Expected results:
- llms.txt links point to existing .md URLs.
- Both root and docs-scoped sitemaps resolve.
- Markdown responses use Content-Type: text/markdown; charset=utf-8.
- Markdown frontmatter contains canonical_url and last_updated.
- HTML pages contain one application/ld+json script.
- Missing docs pages return a markdown body for agent-oriented requests.
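Several of these checks are easy to script. A sketch under assumptions: both functions are hypothetical, take an already-fetched header and body (fetching against your local origin is left out), and use simple string tests rather than full YAML or HTML parsers.

```typescript
// Returns human-readable problems for a markdown response; empty means it passed.
function checkMarkdownResponse(contentType: string | null, body: string): string[] {
  const problems: string[] = [];
  const normalized = (contentType ?? "").toLowerCase().replace(/\s+/g, "");
  if (normalized !== "text/markdown;charset=utf-8") {
    problems.push(`unexpected Content-Type: ${contentType ?? "(missing)"}`);
  }
  // Pull out the leading frontmatter block, if any.
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(body)?.[1] ?? "";
  for (const key of ["canonical_url", "last_updated"]) {
    if (!new RegExp(`^${key}:`, "m").test(frontmatter)) {
      problems.push(`frontmatter is missing ${key}`);
    }
  }
  return problems;
}

// Counts JSON-LD script elements in an HTML page; the list above expects exactly one.
function countJsonLdScripts(html: string): number {
  return (html.match(/<script[^>]*type=["']application\/ld\+json["'][^>]*>/gi) ?? []).length;
}
```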
Then run the Vercel Agent Readability audit against the docs entry point:
If the audit reports broken links from llms.txt, check whether your generated URLs use the production --base-url while your local audit expects localhost. For local dev, serve root and docs-scoped sitemap/robots files from the current request origin, or run the audit against a preview URL that matches --base-url.
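To spot the mismatch quickly, you can list every absolute link in llms.txt whose origin differs from the one you are auditing. A rough sketch: findForeignLinks is a hypothetical helper that only looks at markdown-style [title](url) links (an assumption about how the file is written) and skips root-relative links, since those resolve against whichever origin serves the file.

```typescript
// Returns absolute links in llms.txt that point at an origin other than auditOrigin.
function findForeignLinks(llmsTxt: string, auditOrigin: string): string[] {
  const expected = new URL(auditOrigin).origin;
  const foreign: string[] = [];
  for (const match of llmsTxt.matchAll(/\[[^\]]*\]\(([^)\s]+)\)/g)) {
    const href = match[1];
    if (!href || !/^https?:\/\//i.test(href)) continue; // skip relative links
    if (new URL(href).origin !== expected) foreign.push(href);
  }
  return foreign;
}
```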
Minimal checklist
- Generate llms.txt, markdown mirrors, llms-full.txt, sitemap, robots, and agent-readability.json.
- Serve root-level /sitemap.xml, /sitemap.md, and /robots.txt.
- Keep docs-scoped /docs/sitemap.xml, /docs/sitemap.md, and /docs/robots.txt.
- Add JSON-LD to every docs HTML page.
- Add canonical and markdown alternate links to every docs HTML page.
- Return markdown for agent Accept headers, AI user agents, and .md URLs.
- Include canonical_url and last_updated in markdown frontmatter.
- Return markdown bodies for missing docs pages requested by agents.