Optimize docs for agents

Use this guide when you already have a docs site and want agents to find, fetch, attribute, and cite the same content humans read in the browser.

Leadtype handles the generated files. Your app wires those files into routing and HTML.

The default output shape is based on the repo's agent evals. See Evals for the benchmark summary and the open question around larger-corpus llms-full.txt scaling.

What good looks like

An agent-readable docs site has four layers:

/llms.txt, /sitemap.xml, /sitemap.md, and /robots.txt tell agents what exists and where to start.

Each docs page has a markdown mirror at /docs/page.md, and agent requests to /docs/page can receive markdown instead of HTML.

Human HTML pages include JSON-LD, canonical links, and markdown alternate links so agents can extract page identity without guessing from the DOM.

Markdown responses include canonical_url and last_updated frontmatter so copied content keeps its source and freshness.
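
To make the fourth layer concrete, here is a minimal sketch of a markdown body carrying its own source and freshness. Leadtype injects these fields for you, so this is illustration only: the withFrontmatter helper, the PageIdentity type, and the date format are assumptions, and the sketch assumes the body has no existing frontmatter.

```typescript
// Hypothetical helper, not part of leadtype. It only illustrates the
// frontmatter that a markdown response is expected to carry.
interface PageIdentity {
  canonicalUrl: string; // absolute URL of the human-facing HTML page
  lastUpdated: string; // ISO 8601 date such as "2025-01-15" (format assumed)
}

function withFrontmatter(markdown: string, page: PageIdentity): string {
  // JSON.stringify produces a double-quoted scalar, which is valid YAML.
  const lines = [
    "---",
    `canonical_url: ${JSON.stringify(page.canonicalUrl)}`,
    `last_updated: ${JSON.stringify(page.lastUpdated)}`,
    "---",
  ];
  return lines.join("\n") + "\n\n" + markdown;
}
```

An agent that copies the whole response keeps the canonical URL and the date along with the text.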

1. Generate the artifacts

Run the site-mode pipeline before your app build:

npx leadtype generate \
  --src . \
  --out public \
  --base-url https://leadtype.dev \
  --name "My product" \
  --summary "One sentence about the product."

This writes the docs-scoped files under public/docs/ and the top-level public/llms.txt and public/llms-full.txt:

public/
├── llms.txt
├── llms-full.txt
└── docs/
    ├── index.md
    ├── quickstart.md
    ├── llms.txt
    ├── sitemap.xml
    ├── sitemap.md
    ├── robots.txt
    └── agent-readability.json

The generated agent-readability.json manifest is the bridge between build-time content and runtime requests. It contains page URLs, markdown mirror paths, titles, descriptions, group navigation, and freshness dates.
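
As a rough picture of how the manifest is used at runtime, the sketch below models a page entry and looks one up by request path. Only pages, urlPath, and version appear in this guide; every other field name, and the findPage helper itself, is an assumption. In real code, prefer the exported AgentReadabilityManifest type.

```typescript
// Illustrative shape only. `pages`, `urlPath`, and `version` come from this
// guide; the remaining field names are assumptions.
interface SketchPage {
  urlPath: string; // e.g. "/docs/quickstart"
  markdownPath: string; // assumed name for the markdown mirror path
  title: string;
  description: string;
  lastUpdated: string; // assumed name for the freshness date
}

interface SketchManifest {
  version: number;
  pages: SketchPage[];
}

// Hypothetical lookup: strip a ".md" suffix and trailing slashes, then match
// the remaining path against the manifest entries.
function findPage(
  manifest: SketchManifest,
  pathname: string
): SketchPage | undefined {
  const normalized = pathname.replace(/\.md$/, "").replace(/\/+$/, "") || "/";
  return manifest.pages.find((page) => page.urlPath === normalized);
}
```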

2. Add one middleware

Put the agent-readable routes before your HTML docs route. This Node/Bun example handles root discovery files, docs-scoped discovery files, direct .md URLs, and Accept: text/markdown requests in one place:

import { readFile } from 'node:fs/promises';
import { join } from 'node:path';
import manifestJson from '../public/docs/agent-readability.json' with { type: 'json' };
import {
  createAgentMarkdownResponse,
  createRobotsTxtResponse,
  createSitemapMarkdownResponse,
  createSitemapXmlResponse,
  type AgentReadabilityManifest,
  type MarkdownMirrorTarget,
} from 'leadtype/llm/readability';

const manifest = {
  ...manifestJson,
  version: 1,
} as AgentReadabilityManifest;

async function readMarkdownFile(
  target: MarkdownMirrorTarget
): Promise<string | null> {
  try {
    return await readFile(
      join(process.cwd(), 'public', target.filePath),
      'utf8'
    );
  } catch (error) {
    if (
      typeof error === 'object' &&
      error !== null &&
      'code' in error &&
      (error.code === 'ENOENT' || error.code === 'ENOTDIR')
    ) {
      return null;
    }
    throw error;
  }
}

export async function handleDocsRequest(
  request: Request
): Promise<Response | null> {
  if (request.method !== 'GET' && request.method !== 'HEAD') {
    return null;
  }

  const url = new URL(request.url);
  const requestOrigin = url.origin;

  switch (url.pathname) {
    case '/sitemap.xml':
    case '/docs/sitemap.xml':
      return createSitemapXmlResponse({ manifest, requestOrigin });
    case '/sitemap.md':
    case '/docs/sitemap.md':
      return createSitemapMarkdownResponse({ manifest, requestOrigin });
    case '/robots.txt':
      return createRobotsTxtResponse({ manifest, requestOrigin });
    case '/docs/robots.txt':
      return createRobotsTxtResponse({
        manifest,
        requestOrigin,
        sitemapUrlPath: '/docs/sitemap.xml',
      });
    default:
      return createAgentMarkdownResponse({
        urlPath: url.pathname,
        method: request.method,
        headers: Object.fromEntries(request.headers),
        manifest,
        readMarkdownFile,
        requestOrigin,
      });
  }
}

If you also have marketing, blog, changelog, or product pages, pass them through the optional pages field; the sitemap helper merges them into the output it rebuilds for the request origin:

return createSitemapXmlResponse({
  manifest,
  requestOrigin,
  pages: [...manifest.pages, ...marketingPages, ...blogPages],
});

The other generated artifacts (/llms.txt, /docs/llms.txt, /llms-full.txt, /docs/agent-readability.json) use root-relative links, so you can serve them as static files straight from public/.

Keep the docs-scoped versions too (/docs/sitemap.xml etc.). Audits and agents may request both /sitemap.xml and /docs/sitemap.xml, especially when the audited URL is /docs.
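
Because handleDocsRequest returns null for requests it does not own, the caller needs a fallback to the normal HTML route. One way to sketch that fall-through, independent of any framework, is shown here. The withFallthrough helper and its types are hypothetical and not part of leadtype.

```typescript
// A handler either answers the request or returns null to pass it on.
type MaybeHandler<Req, Res> = (
  request: Req
) => Promise<Res | null> | Res | null;

// Try each handler in order; the first non-null result wins. The fallback
// stands in for your framework's HTML docs route.
function withFallthrough<Req, Res>(
  handlers: MaybeHandler<Req, Res>[],
  fallback: (request: Req) => Promise<Res> | Res
): (request: Req) => Promise<Res> {
  return async (request) => {
    for (const handler of handlers) {
      const result = await handler(request);
      if (result !== null) {
        return result;
      }
    }
    return fallback(request);
  };
}
```

With the middleware above, usage would look like withFallthrough([handleDocsRequest], renderHtmlDocs), where renderHtmlDocs is a placeholder for whatever renders your HTML page.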

3. Add JSON-LD to docs pages

Use the manifest entry for the current page and render Schema.org JSON-LD into the HTML head:

import agentManifest from '../public/docs/agent-readability.json' with { type: 'json' };
import { renderJsonLd, renderJsonLdScript } from 'leadtype/llm/readability';

// Look up the manifest entry for the page being rendered.
const page = agentManifest.pages.find(
  (entry) => entry.urlPath === '/docs/quickstart'
);

if (page) {
  // For frameworks with a typed metadata API.
  const jsonLd = renderJsonLd(page, agentManifest);
  // For frameworks that expect an HTML string.
  const script = renderJsonLdScript(page, agentManifest);
}

Use renderJsonLd(page, manifest) if your framework has a typed metadata API. Use renderJsonLdScript(page, manifest) if your framework expects an HTML string.

Also add canonical and markdown alternate links:

<link rel="canonical" href="https://leadtype.dev/docs/quickstart" />
<link
  rel="alternate"
  type="text/markdown"
  href="https://leadtype.dev/docs/quickstart.md"
/>

The JSON-LD gives agents the page title, description, canonical URL, last modified date, and breadcrumbs without scraping your rendered layout.
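
For orientation, the sketch below builds a Schema.org object with the fields listed above and serializes it for a script tag. The exact output of renderJsonLd is not shown in this guide, so the TechArticle type, the @graph layout, the DocPageMeta type, and both function names are assumptions made for illustration.

```typescript
// Illustrative only: the real object comes from renderJsonLd(page, manifest).
interface DocPageMeta {
  title: string;
  description: string;
  canonicalUrl: string;
  lastUpdated: string;
  breadcrumbs: { name: string; url: string }[];
}

function sketchJsonLd(page: DocPageMeta): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "TechArticle",
        headline: page.title,
        description: page.description,
        url: page.canonicalUrl,
        dateModified: page.lastUpdated,
      },
      {
        "@type": "BreadcrumbList",
        itemListElement: page.breadcrumbs.map((crumb, index) => ({
          "@type": "ListItem",
          position: index + 1,
          name: crumb.name,
          item: crumb.url,
        })),
      },
    ],
  };
}

// Serialize for a <script type="application/ld+json"> tag. Escaping "<"
// keeps a "</script>" inside a title from closing the tag early.
function sketchJsonLdScript(page: DocPageMeta): string {
  const json = JSON.stringify(sketchJsonLd(page)).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```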

4. Return markdown to agents

The middleware above uses createAgentMarkdownResponse. It returns a Web Response (or null when the path is not an agent-oriented markdown request) and handles:

Accept: text/markdown and Accept: text/plain content negotiation (q-values respected).
Known AI user-agent headers (GPTBot, ClaudeBot, Bingbot, AmazonBot, MetaExternalAgent, PerplexityBot, MistralBot, AppleBot, ByteSpider, YouBot, …).
Direct .md URLs such as /docs/quickstart.md.
canonical_url and last_updated frontmatter aliases injected automatically.
200 markdown responses for missing docs pages, so agents do not discard the body.
Content-Type: text/markdown; charset=utf-8, Vary: Accept[, User-Agent], Link: <…>; rel="canonical", Cache-Control: public, max-age=300, must-revalidate.
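
To show what q-value negotiation means in practice, here is a simplified sketch. It is not leadtype's implementation: parseAccept and prefersMarkdown are hypothetical names, wildcards such as */* are deliberately not treated as a markdown request, and a tie between markdown and HTML resolves to markdown by assumption.

```typescript
// Parse an Accept header into a map of media type to q-value (default 1).
function parseAccept(header: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const part of header.split(",")) {
    const [rawType = "", ...params] = part
      .split(";")
      .map((piece) => piece.trim());
    const mediaType = rawType.toLowerCase();
    if (mediaType === "") continue;
    let quality = 1;
    for (const param of params) {
      const [key = "", value = ""] = param
        .split("=")
        .map((piece) => piece.trim());
      if (key.toLowerCase() === "q") {
        const parsed = Number.parseFloat(value);
        // Clamp to the valid range; fall back to 1 on unparseable input.
        quality = Number.isNaN(parsed) ? 1 : Math.min(Math.max(parsed, 0), 1);
      }
    }
    weights.set(mediaType, quality);
  }
  return weights;
}

// True when text/markdown or text/plain is explicitly requested with a
// weight at least as high as text/html.
function prefersMarkdown(acceptHeader: string): boolean {
  const weights = parseAccept(acceptHeader);
  const markdown = Math.max(
    weights.get("text/markdown") ?? 0,
    weights.get("text/plain") ?? 0
  );
  const html = weights.get("text/html") ?? 0;
  return markdown > 0 && markdown >= html;
}
```

A typical browser header such as text/html,application/xhtml+xml,*/*;q=0.8 yields false here, so humans keep getting HTML.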

readMarkdownFile may be sync or async. In Node/Bun, read from disk. In Cloudflare, fetch from KV/R2 or an asset binding. In Vercel Edge, fetch from the deployment's static asset URL.
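
Since the reader may be synchronous, the simplest possible variant is a lookup in memory, for example over markdown bundled at build time. The sketch below is hypothetical: createMapReader is not a leadtype export, and MirrorTarget is a stand-in that assumes only the filePath field used by the disk reader earlier in this guide.

```typescript
// Minimal stand-in for MarkdownMirrorTarget; only `filePath` is assumed.
interface MirrorTarget {
  filePath: string;
}

// Hypothetical synchronous reader backed by an in-memory map. Unknown paths
// return null, matching how the disk reader treats ENOENT.
function createMapReader(
  files: ReadonlyMap<string, string>
): (target: MirrorTarget) => string | null {
  return (target) => files.get(target.filePath) ?? null;
}
```

The same shape works for tests: seed the map with a couple of pages and pass the reader as readMarkdownFile.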

Put that logic wherever your framework can intercept docs requests before its HTML route:

Framework/runtime	Where it usually goes
TanStack Start / Nitro	`server/middleware/agent-readability.ts` (h3). One middleware handles both the markdown response and the sitemap/robots regenerators, and it runs in dev, preview, and prod. See `apps/example/server/middleware/agent-readability.ts` for the canonical reference.
Nuxt	`server/middleware/agent-readability.ts` (h3), same shape as the TanStack Start example.
Next.js	`middleware.ts` (Edge) or a catch-all route handler before the docs page.
Astro	An endpoint at `pages/docs/[...slug].md.ts` or `astro:middleware`.
Cloudflare Workers/Pages	Worker `fetch` handler with KV/R2 asset binding for the markdown reader.
Express/Hono/Fastify	Middleware before the docs HTML route.

Tip: if you keep static sitemap.xml / sitemap.md / robots.txt files in your build output, your framework's static handler may serve them before your middleware can rebase URLs to the live origin. Either delete the static copies after the build (so the middleware always runs) or make sure your middleware is registered ahead of static-asset serving.

Do not rewrite llms.txt, sitemap.xml, sitemap.md, robots.txt, llms-full.txt, or agent-readability.json to page markdown. The helper leaves those artifact paths alone.
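
If you write your own .md rewrite rules instead of relying on the helper, a guard along these lines keeps the artifact paths out of them. The file names come from the list above; the isArtifactPath helper is a hypothetical sketch.

```typescript
// Generated artifacts that must never be rewritten to page markdown.
const ARTIFACT_FILES = new Set([
  "llms.txt",
  "llms-full.txt",
  "sitemap.xml",
  "sitemap.md",
  "robots.txt",
  "agent-readability.json",
]);

// Match on the last path segment so both root and docs-scoped copies
// ("/sitemap.md" and "/docs/sitemap.md") are covered.
function isArtifactPath(pathname: string): boolean {
  const lastSegment = pathname.split("/").pop() ?? "";
  return ARTIFACT_FILES.has(lastSegment);
}
```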

Why the sitemap and robots responses are regenerated, not static

sitemap.xml's <loc> requires absolute URLs, and robots.txt's Sitemap: directive is conventionally absolute too. So those two files cannot be plain static assets if you want previews and staging to advertise the right origin. The middleware in step 2 already calls createSitemapXmlResponse, createSitemapMarkdownResponse, and createRobotsTxtResponse — each rebuilds from the manifest using the live requestOrigin, so preview, staging, and prod all advertise the right URLs without per-environment config.

The other generated files (/llms.txt, /docs/llms.txt, /llms-full.txt) use root-relative links, so they can be served as plain static assets — no per-request rewriting needed.
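
The rebasing idea can be sketched in a few lines: keep root-relative paths in the manifest and resolve them against the origin of each request. This is not leadtype's implementation; buildSitemapXml, escapeXml, and the lastUpdated field name are assumptions.

```typescript
// Escape the five XML special characters.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

interface SitemapEntry {
  urlPath: string; // root-relative, e.g. "/docs/quickstart"
  lastUpdated?: string; // assumed field name
}

function buildSitemapXml(
  entries: SitemapEntry[],
  requestOrigin: string
): string {
  const urls = entries.map((entry) => {
    // new URL() resolves the root-relative path against the live origin.
    const loc = new URL(entry.urlPath, requestOrigin).href;
    const lastmod = entry.lastUpdated
      ? `<lastmod>${escapeXml(entry.lastUpdated)}</lastmod>`
      : "";
    return `  <url><loc>${escapeXml(loc)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

The same entries produce localhost URLs in dev and preview URLs on a preview deployment, with no per-environment config.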

Cache-Control and CDN

Every helper adds Cache-Control: public, max-age=300, must-revalidate by default. Pair it with the Vary: Accept, User-Agent header that createAgentMarkdownResponse sets when an AI user-agent is detected. Your CDN must key cache entries on those headers, or it will serve cached HTML to agents (or cached markdown to browsers). Override the default with cacheControl: "<directive>", or pass cacheControl: null to omit the header entirely (useful when your CDN sets caching out-of-band).
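
The failure mode is easier to see with a toy cache key. The sketch below is a hypothetical illustration of how a cache might fold Vary headers into its key; real CDNs differ, and cacheKey is not a leadtype or CDN API.

```typescript
// Build a cache key from the URL plus the request headers named in Vary.
// Header names are case-insensitive, so both sides are lowercased.
function cacheKey(
  url: string,
  requestHeaders: Record<string, string>,
  vary: string[]
): string {
  const lowered = new Map(
    Object.entries(requestHeaders).map(
      ([headerName, value]) => [headerName.toLowerCase(), value] as const
    )
  );
  const parts = vary
    .map((headerName) => headerName.toLowerCase())
    .sort()
    .map((headerName) => `${headerName}=${lowered.get(headerName) ?? ""}`);
  return [url, ...parts].join("|");
}
```

With an empty vary list, a markdown request and an HTML request for the same URL produce the same key, which is exactly the collision the Vary header is meant to prevent.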

5. Verify locally

Start the docs site, then check the URLs agents use:

curl http://localhost:5173/llms.txt
curl http://localhost:5173/sitemap.xml
curl http://localhost:5173/sitemap.md
curl http://localhost:5173/docs/sitemap.xml
curl -I -H "Accept: text/markdown" http://localhost:5173/docs/quickstart
curl http://localhost:5173/docs/quickstart.md
curl -H "User-Agent: ChatGPT-User" http://localhost:5173/docs/quickstart

Expected results:

llms.txt links point to existing .md URLs.
Both root and docs-scoped sitemaps resolve.
Markdown responses use Content-Type: text/markdown; charset=utf-8.
Markdown frontmatter contains canonical_url and last_updated.
HTML pages contain one application/ld+json script.
Missing docs pages return a markdown body for agent-oriented requests.
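
The markdown-specific expectations can also be checked in code. The following is a small sketch of such a check; checkMarkdownResponse is a hypothetical helper, and it only inspects the Content-Type value and the two frontmatter keys named above.

```typescript
// Return a list of problems; an empty list means the response looks right.
function checkMarkdownResponse(
  contentType: string | null,
  body: string
): string[] {
  const problems: string[] = [];
  if (!contentType || !contentType.toLowerCase().startsWith("text/markdown")) {
    problems.push(`unexpected Content-Type: ${contentType ?? "(none)"}`);
  }
  // Frontmatter is the block between the first two "---" lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---(\r?\n|$)/.exec(body);
  if (!match) {
    problems.push("missing frontmatter block");
    return problems;
  }
  const frontmatter = match[1] ?? "";
  for (const key of ["canonical_url", "last_updated"]) {
    if (!new RegExp(`^${key}:\\s*\\S`, "m").test(frontmatter)) {
      problems.push(`missing ${key}`);
    }
  }
  return problems;
}
```

To use it against a running site, fetch /docs/quickstart with an Accept: text/markdown header and pass the response's content-type header and text body to the function.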

Then run the Vercel Agent Readability audit against the docs entry point:

npx @vercel/agent-readability audit http://localhost:5173/docs

If the audit reports broken links from llms.txt, check whether your generated URLs use the production --base-url while your local audit expects localhost. For local dev, serve root and docs-scoped sitemap/robots files from the current request origin, or run the audit against a preview URL that matches --base-url.

Minimal checklist

Generate llms.txt, markdown mirrors, llms-full.txt, sitemap, robots, and agent-readability.json.
Serve root-level /sitemap.xml, /sitemap.md, and /robots.txt.
Keep docs-scoped /docs/sitemap.xml, /docs/sitemap.md, and /docs/robots.txt.
Add JSON-LD to every docs HTML page.
Add canonical and markdown alternate links to every docs HTML page.
Return markdown for agent Accept headers, AI user agents, and .md URLs.
Include canonical_url and last_updated in markdown frontmatter.
Return markdown bodies for missing docs pages requested by agents.