---
title: Optimize docs for agents
description: >-
  Set up llms.txt, markdown mirrors, JSON-LD, sitemaps, robots.txt, and audit
  checks for an agent-readable docs site.
group: docs-site
lastModified: '2026-05-11T20:02:32-07:00'
lastAuthor: 'github-actions[bot]'
---
# Optimize docs for agents

Use this guide when you already have a docs site and want agents to find, fetch, attribute, and cite the same content humans read in the browser.

Leadtype handles the generated files. Your app wires those files into routing and HTML.

The default output shape is based on the repo's agent evals. See [Evals](/docs/reference/evals) for the benchmark summary and the open question around larger-corpus `llms-full.txt` scaling.

## What good looks like

An agent-readable docs site has four layers:

1. **Discovery files:** `/llms.txt`, `/sitemap.xml`, `/sitemap.md`, and `/robots.txt` tell agents what exists and where to start.

2. **Markdown retrieval:** Each docs page has a markdown mirror at `/docs/page.md`, and agent requests to `/docs/page` can receive markdown instead of HTML.

3. **Structured HTML metadata:** Human HTML pages include JSON-LD, canonical links, and markdown alternate links so agents can extract page identity without guessing from the DOM.

4. **Attribution metadata:** Markdown responses include `canonical_url` and `last_updated` frontmatter so copied content keeps its source and freshness.

## 1. Generate the artifacts

Run the site-mode pipeline before your app build:

```bash
npx leadtype generate \
  --src . \
  --out public \
  --base-url https://leadtype.dev \
  --name "My product" \
  --summary "One sentence about the product."
```

This writes the docs-scoped files under `public/docs/` and the top-level `public/llms.txt` and `public/llms-full.txt`:

```txt
public/
├── llms.txt
├── llms-full.txt
└── docs/
    ├── index.md
    ├── quickstart.md
    ├── llms.txt
    ├── sitemap.xml
    ├── sitemap.md
    ├── robots.txt
    └── agent-readability.json
```

The generated `agent-readability.json` manifest is the bridge between build-time content and runtime requests. It contains page URLs, markdown mirror paths, titles, descriptions, group navigation, and freshness dates.

## 2. Add one middleware

Put the agent-readable routes before your HTML docs route. This Node/Bun example handles root discovery files, docs-scoped discovery files, direct `.md` URLs, and `Accept: text/markdown` requests in one place:

```ts
import { readFile } from "node:fs/promises";
import { join } from "node:path";
import manifestJson from "../public/docs/agent-readability.json" with {
  type: "json",
};
import {
  createAgentMarkdownResponse,
  createRobotsTxtResponse,
  createSitemapMarkdownResponse,
  createSitemapXmlResponse,
  type AgentReadabilityManifest,
  type MarkdownMirrorTarget,
} from "leadtype/llm/readability";

const manifest = {
  ...manifestJson,
  version: 1,
} as AgentReadabilityManifest;

async function readMarkdownFile(
  target: MarkdownMirrorTarget
): Promise<string | null> {
  try {
    return await readFile(join(process.cwd(), "public", target.filePath), "utf8");
  } catch (error) {
    if (
      typeof error === "object" &&
      error !== null &&
      "code" in error &&
      (error.code === "ENOENT" || error.code === "ENOTDIR")
    ) {
      return null;
    }
    throw error;
  }
}

export async function handleDocsRequest(
  request: Request
): Promise<Response | null> {
  if (request.method !== "GET" && request.method !== "HEAD") {
    return null;
  }

  const url = new URL(request.url);
  const requestOrigin = url.origin;

  switch (url.pathname) {
    case "/sitemap.xml":
    case "/docs/sitemap.xml":
      return createSitemapXmlResponse({ manifest, requestOrigin });
    case "/sitemap.md":
    case "/docs/sitemap.md":
      return createSitemapMarkdownResponse({ manifest, requestOrigin });
    case "/robots.txt":
      return createRobotsTxtResponse({ manifest, requestOrigin });
    case "/docs/robots.txt":
      return createRobotsTxtResponse({
        manifest,
        requestOrigin,
        sitemapUrlPath: "/docs/sitemap.xml",
      });
    default:
      return createAgentMarkdownResponse({
        urlPath: url.pathname,
        method: request.method,
        headers: Object.fromEntries(request.headers),
        manifest,
        readMarkdownFile,
        requestOrigin,
      });
  }
}
```
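Because `handleDocsRequest` resolves to `null` for requests it does not own, the surrounding server can try it first and fall through to the HTML route. The following is a minimal sketch of that ordering, assuming a runtime with the Web `Request` and `Response` globals. `firstMatch`, `agentRoutes`, and `htmlRoute` are illustrative names, and `agentRoutes` is only a stand-in for `handleDocsRequest`:

```typescript
type RouteHandler = (request: Request) => Promise<Response | null>;

// Try each handler in order and return the first non-null response.
function firstMatch(...handlers: RouteHandler[]) {
  return async (request: Request): Promise<Response> => {
    for (const handler of handlers) {
      const response = await handler(request);
      if (response) {
        return response;
      }
    }
    return new Response("Not found", { status: 404 });
  };
}

// Stand-in for handleDocsRequest: this stub only claims direct `.md` URLs.
const agentRoutes: RouteHandler = async (request) => {
  const { pathname } = new URL(request.url);
  return pathname.endsWith(".md")
    ? new Response("# Quickstart\n", {
        headers: { "Content-Type": "text/markdown; charset=utf-8" },
      })
    : null;
};

// Hypothetical HTML route that renders every other request.
const htmlRoute: RouteHandler = async () =>
  new Response("<!doctype html><title>Docs</title>", {
    headers: { "Content-Type": "text/html; charset=utf-8" },
  });

// Agent routes are registered first, so they win when they match.
const handleRequest = firstMatch(agentRoutes, htmlRoute);
```

Most frameworks provide their own middleware chain, so the explicit loop is only there to make the ordering visible.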

If you also have marketing, blog, changelog, or product pages, pass them through the optional `pages` field. The regenerator includes them in the rebased output:

```ts
return createSitemapXmlResponse({
  manifest,
  requestOrigin,
  pages: [...manifest.pages, ...marketingPages, ...blogPages],
});
```

The other generated artifacts — `/llms.txt`, `/docs/llms.txt`, `/llms-full.txt`, `/docs/agent-readability.json` — use root-relative links and serve fine as static files straight from `public/`.

Keep the docs-scoped versions too (`/docs/sitemap.xml` etc.). Audits and agents may request both `/sitemap.xml` and `/docs/sitemap.xml`, especially when the audited URL is `/docs`.

## 3. Add JSON-LD to docs pages

Use the manifest entry for the current page and render Schema.org JSON-LD into the HTML head:

```ts
import agentManifest from "../public/docs/agent-readability.json";
import { renderJsonLd, renderJsonLdScript } from "leadtype/llm/readability";

const page = agentManifest.pages.find(
  (entry) => entry.urlPath === "/docs/quickstart"
);

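if (page) {
  // For frameworks with a typed metadata API.
  const jsonLd = renderJsonLd(page, agentManifest);
  // For frameworks that expect an HTML string in the head.
  const script = renderJsonLdScript(page, agentManifest);
}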
```

Use `renderJsonLd(page, manifest)` if your framework has a typed metadata API. Use `renderJsonLdScript(page, manifest)` if your framework expects an HTML string.
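The snippet above hard-codes `/docs/quickstart`. In a shared layout you would usually look the page up from the current request path. A minimal sketch, assuming only that manifest entries expose `urlPath` as shown above; `findManifestPage` and its trailing-slash normalization are illustrative, not part of leadtype:

```typescript
// Minimal shape for this sketch; real manifest entries carry more fields.
interface PageEntry {
  urlPath: string;
}

// Treat "/docs/quickstart/" and "/docs/quickstart" as the same page.
function normalizeDocsPath(pathname: string): string {
  const trimmed = pathname.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// Find the manifest entry for the path the browser requested.
function findManifestPage<T extends PageEntry>(
  pages: readonly T[],
  pathname: string
): T | undefined {
  const wanted = normalizeDocsPath(pathname);
  return pages.find((entry) => normalizeDocsPath(entry.urlPath) === wanted);
}

const samplePages = [{ urlPath: "/docs" }, { urlPath: "/docs/quickstart" }];

// Matches the "/docs/quickstart" entry despite the trailing slash.
const quickstartPage = findManifestPage(samplePages, "/docs/quickstart/");
// No entry exists for this path, so the result is undefined.
const missingPage = findManifestPage(samplePages, "/docs/missing");
```

When the lookup returns `undefined`, skip the JSON-LD block rather than rendering metadata for the wrong page.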

Also add canonical and markdown alternate links:

```html
<link rel="canonical" href="https://leadtype.dev/docs/quickstart" />
<link
  rel="alternate"
  type="text/markdown"
  href="https://leadtype.dev/docs/quickstart.md"
/>
```
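If your framework builds the head from strings, both tags can be derived from the site origin and the page's URL path. A minimal sketch under the assumption that each mirror lives at the page URL plus `.md`, as in the example above; `renderAgentLinkTags` is a hypothetical helper name:

```typescript
// Escape characters that would break out of an HTML attribute value.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the canonical and markdown alternate tags for one docs page.
function renderAgentLinkTags(origin: string, urlPath: string): string {
  const canonical = new URL(urlPath, origin).href;
  // Assumption: the mirror URL is the canonical URL plus ".md".
  const markdown = `${canonical.replace(/\/$/, "")}.md`;
  return [
    `<link rel="canonical" href="${escapeAttribute(canonical)}" />`,
    `<link rel="alternate" type="text/markdown" href="${escapeAttribute(markdown)}" />`,
  ].join("\n");
}

const linkTags = renderAgentLinkTags("https://leadtype.dev", "/docs/quickstart");
console.log(linkTags);
```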

The JSON-LD gives agents the page title, description, canonical URL, last modified date, and breadcrumbs without scraping your rendered layout.

## 4. Return markdown to agents

The middleware above uses `createAgentMarkdownResponse`. It returns a Web `Response` (or `null` when the path is not an agent-oriented markdown request) and handles:

* `Accept: text/markdown` and `Accept: text/plain` content negotiation (q-values respected).
* Known AI user-agent headers (GPTBot, ClaudeBot, Bingbot, AmazonBot, MetaExternalAgent, PerplexityBot, MistralBot, AppleBot, ByteSpider, YouBot, …).
* Direct `.md` URLs such as `/docs/quickstart.md`.
* `canonical_url` and `last_updated` frontmatter aliases injected automatically.
* 200 markdown responses for missing docs pages, so agents do not discard the body.
* `Content-Type: text/markdown; charset=utf-8`, `Vary: Accept[, User-Agent]`, `Link: <…>; rel="canonical"`, `Cache-Control: public, max-age=300, must-revalidate`.
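To make the first three rules concrete, here is a simplified sketch of the decision. It is not the helper's actual implementation: `wantsMarkdown` and `parseAccept` are illustrative names, the agent pattern only covers the names listed above, and the tie-breaking rule (markdown must outrank `text/html`) is an assumption.

```typescript
// Agent names taken from the list above; the real helper may match more.
const AGENT_PATTERN =
  /GPTBot|ClaudeBot|Bingbot|AmazonBot|MetaExternalAgent|PerplexityBot|MistralBot|AppleBot|ByteSpider|YouBot/i;

// Parse an Accept header into a map of media type to q-value (default 1).
function parseAccept(header: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const part of header.split(",")) {
    const [rawType = "", ...params] = part.split(";");
    const type = rawType.trim().toLowerCase();
    if (type === "") continue;
    let q = 1;
    for (const param of params) {
      const [key = "", value = ""] = param.split("=");
      if (key.trim().toLowerCase() === "q") {
        const parsed = Number.parseFloat(value);
        if (!Number.isNaN(parsed)) q = Math.min(1, Math.max(0, parsed));
      }
    }
    weights.set(type, Math.max(q, weights.get(type) ?? 0));
  }
  return weights;
}

// Decide whether a request should receive markdown instead of HTML.
function wantsMarkdown(
  pathname: string,
  accept: string,
  userAgent: string
): boolean {
  if (pathname.endsWith(".md")) return true; // direct mirror URL
  if (AGENT_PATTERN.test(userAgent)) return true; // known AI user agent
  const weights = parseAccept(accept);
  const markdownQ = Math.max(
    weights.get("text/markdown") ?? 0,
    weights.get("text/plain") ?? 0
  );
  const htmlQ = weights.get("text/html") ?? 0;
  // Wildcards such as */* are ignored so browsers keep getting HTML.
  return markdownQ > 0 && markdownQ > htmlQ;
}
```

A typical browser header such as `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` yields `false` here, while `text/markdown` on its own yields `true`.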

`readMarkdownFile` may be sync or async. In Node/Bun, read from disk. In Cloudflare, fetch from KV/R2 or an asset binding. In Vercel Edge, fetch from the deployment's static asset URL.
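For runtimes without filesystem access, the reader can load mirrors over HTTP instead. A minimal sketch, assuming `target.filePath` is relative to the static root (as in the Node example, where it is joined onto `public/`). `createFetchReader`, `FetchLike`, and the asset base URL are hypothetical; the fetch function is injected so the sketch does not depend on a particular runtime:

```typescript
// Only the field this sketch needs from the mirror target.
interface MirrorTarget {
  filePath: string;
}

// The subset of the fetch API used here, so any runtime's fetch fits.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
}>;

// Build a reader that loads markdown mirrors over HTTP instead of from disk.
function createFetchReader(assetBaseUrl: string, fetchImpl: FetchLike) {
  return async (target: MirrorTarget): Promise<string | null> => {
    const url = new URL(target.filePath, assetBaseUrl).href;
    const response = await fetchImpl(url);
    // A missing mirror maps to null, mirroring the ENOENT case on disk.
    if (response.status === 404) return null;
    if (!response.ok) {
      throw new Error(`Failed to fetch ${url}: ${response.status}`);
    }
    return response.text();
  };
}
```

In an edge runtime you might call `createFetchReader("https://assets.example.com/", (url) => fetch(url))`. Point the base URL at a location that bypasses this middleware; otherwise the `.md` fetch could re-enter the same handler.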

Put that logic wherever your framework can intercept docs requests before its HTML route:

|Framework/runtime|Where it usually goes|
|--|--|
|TanStack Start / Nitro|`server/middleware/agent-readability.ts` (h3). One middleware handles both the markdown response and the sitemap/robots regenerators, and it runs in dev, preview, and prod. See [`apps/example/server/middleware/agent-readability.ts`](https://github.com/inthhq/leadtype/blob/main/apps/example/server/middleware/agent-readability.ts) for the canonical reference.|
|Nuxt|`server/middleware/agent-readability.ts` (h3), with the same shape as the TanStack Start example.|
|Next.js|`middleware.ts` (Edge) or a catch-all route handler before the docs page.|
|Astro|An endpoint at `pages/docs/[...slug].md.ts` or `astro:middleware`.|
|Cloudflare Workers/Pages|Worker `fetch` handler with KV/R2 asset binding for the markdown reader.|
|Express/Hono/Fastify|Middleware before the docs HTML route.|

> **Tip:** If you keep static `sitemap.xml` / `sitemap.md` / `robots.txt` files in your build output, your framework's static handler may serve them before your middleware can rebase URLs to the live origin. Either delete the static copies after the build (so the middleware always runs) or make sure your middleware is registered ahead of static-asset serving.

Do not rewrite `llms.txt`, `sitemap.xml`, `sitemap.md`, `robots.txt`, `llms-full.txt`, or `agent-readability.json` to page markdown. The helper leaves those artifact paths alone.
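If you write your own rewrite rule instead of relying on the helper, a guard like the following keeps those paths out of the page-markdown branch. This is a sketch only: `isArtifactPath` is an illustrative name, and matching on the basename is an assumption that also covers the docs-scoped copies.

```typescript
// Artifact basenames listed above; never treat these as page markdown.
const ARTIFACT_NAMES = new Set([
  "llms.txt",
  "llms-full.txt",
  "sitemap.xml",
  "sitemap.md",
  "robots.txt",
  "agent-readability.json",
]);

// True for both root-level and docs-scoped artifact paths.
function isArtifactPath(pathname: string): boolean {
  const basename = pathname.split("/").pop() ?? "";
  return ARTIFACT_NAMES.has(basename.toLowerCase());
}
```

With this guard, `/docs/sitemap.md` is treated as an artifact even though it ends in `.md`, while `/docs/quickstart.md` still resolves to the page mirror.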

### Why the sitemap and robots responses are regenerated, not static

`sitemap.xml`'s `<loc>` requires absolute URLs, and `robots.txt`'s `Sitemap:` directive is conventionally absolute too. So those two files cannot be plain static assets if you want previews and staging to advertise the right origin. The middleware in step 2 already calls `createSitemapXmlResponse`, `createSitemapMarkdownResponse`, and `createRobotsTxtResponse` — each rebuilds from the manifest using the live `requestOrigin`, so preview, staging, and prod all advertise the right URLs without per-environment config.

The other generated files (`/llms.txt`, `/docs/llms.txt`, `/llms-full.txt`) use root-relative links, so they can be served as plain static assets — no per-request rewriting needed.
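The rebasing idea can be sketched in a few lines: keep root-relative paths in the manifest and resolve them against the request origin when the response is built. This is an illustration of the approach, not leadtype's implementation. `buildSitemapXml` and `buildRobotsTxt` are hypothetical names, and the robots rules shown are placeholders.

```typescript
// Escape the characters that are special in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Resolve each root-relative path against the live origin for <loc>.
function buildSitemapXml(
  urlPaths: readonly string[],
  requestOrigin: string
): string {
  const entries = urlPaths.map((urlPath) => {
    const loc = new URL(urlPath, requestOrigin).href;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

// Emit an absolute Sitemap: directive for the same origin.
function buildRobotsTxt(
  requestOrigin: string,
  sitemapUrlPath = "/sitemap.xml"
): string {
  const sitemapUrl = new URL(sitemapUrlPath, requestOrigin).href;
  return ["User-agent: *", "Allow: /", `Sitemap: ${sitemapUrl}`].join("\n");
}

const previewSitemap = buildSitemapXml(
  ["/docs", "/docs/quickstart"],
  "https://preview.example.com"
);
const previewRobots = buildRobotsTxt(
  "https://preview.example.com",
  "/docs/sitemap.xml"
);
```

The same paths produce production URLs when the request arrives on the production origin, which is why no per-environment config is needed.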

### Cache-Control and CDN

Every helper adds `Cache-Control: public, max-age=300, must-revalidate` by default. Pair it with the `Vary: Accept, User-Agent` header that `createAgentMarkdownResponse` sets when an AI user agent is detected. Your CDN must key cache entries on those headers, or it will serve cached markdown to browsers or cached HTML to agents. Override the default with `cacheControl: "<directive>"`, or pass `cacheControl: null` to omit the header entirely (useful when your CDN sets caching out-of-band).
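If your cache layer lets you define custom keys, one option is to collapse the `Vary` dimensions into a single representation label so entries do not fragment on every distinct header value. A minimal sketch with an in-memory map standing in for the CDN; `VariantCache` and `cacheKeyFor` are hypothetical, and the variant is assumed to be decided by whatever negotiation logic runs before the cache lookup:

```typescript
type Variant = "markdown" | "html";

// One key per path and representation, instead of per raw header value.
function cacheKeyFor(pathname: string, variant: Variant): string {
  return `${variant}:${pathname}`;
}

// Tiny in-memory cache that keeps the two representations separate.
class VariantCache {
  private readonly entries = new Map<string, string>();

  get(pathname: string, variant: Variant): string | undefined {
    return this.entries.get(cacheKeyFor(pathname, variant));
  }

  set(pathname: string, variant: Variant, body: string): void {
    this.entries.set(cacheKeyFor(pathname, variant), body);
  }
}

const variantCache = new VariantCache();
variantCache.set("/docs/quickstart", "markdown", "# Quickstart");

// The HTML lookup misses, so a browser never receives the markdown entry.
const htmlHit = variantCache.get("/docs/quickstart", "html");
const markdownHit = variantCache.get("/docs/quickstart", "markdown");
```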

## 5. Verify locally

Start the docs site, then check the URLs agents use:

```bash
curl http://localhost:5173/llms.txt
curl http://localhost:5173/sitemap.xml
curl http://localhost:5173/sitemap.md
curl http://localhost:5173/docs/sitemap.xml
curl -I -H "Accept: text/markdown" http://localhost:5173/docs/quickstart
curl http://localhost:5173/docs/quickstart.md
curl -H "User-Agent: ChatGPT-User" http://localhost:5173/docs/quickstart
curl -s http://localhost:5173/docs/quickstart | grep -c "application/ld+json"
```

Expected results:

* `llms.txt` links point to existing `.md` URLs.
* Both root and docs-scoped sitemaps resolve.
* Markdown responses use `Content-Type: text/markdown; charset=utf-8`.
* Markdown frontmatter contains `canonical_url` and `last_updated`.
* HTML pages contain one `application/ld+json` script.
* Missing docs pages return a markdown body for agent-oriented requests.
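The frontmatter check is easy to automate. The following sketch scans the leading YAML block of a markdown response body for the two attribution keys. `missingAttributionKeys` is a hypothetical helper, it only looks at top-level `key:` lines rather than parsing YAML fully, and the date format in the sample is an assumption.

```typescript
const REQUIRED_KEYS = ["canonical_url", "last_updated"] as const;

// Return the required frontmatter keys missing from a markdown document.
function missingAttributionKeys(markdown: string): string[] {
  // Capture everything between the opening and closing --- fences.
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(markdown);
  const frontmatter = match?.[1] ?? "";
  const present = new Set(
    frontmatter
      .split(/\r?\n/)
      .map((line) => /^([A-Za-z0-9_-]+)\s*:/.exec(line)?.[1])
      .filter((key): key is string => key !== undefined)
  );
  return REQUIRED_KEYS.filter((key) => !present.has(key));
}

// Sample body; the last_updated value format is assumed for illustration.
const sampleBody = [
  "---",
  "title: Quickstart",
  "canonical_url: https://leadtype.dev/docs/quickstart",
  "last_updated: 2026-05-11",
  "---",
  "",
  "# Quickstart",
].join("\n");

const missingKeys = missingAttributionKeys(sampleBody);
```

An empty result means both keys are present. Feed it the body of `curl http://localhost:5173/docs/quickstart.md` to check a live page.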

Then run the Vercel Agent Readability audit against the docs entry point:

```bash
npx @vercel/agent-readability audit http://localhost:5173/docs
```

If the audit reports broken links from `llms.txt`, check whether your generated URLs use the production `--base-url` while your local audit expects `localhost`. For local dev, serve root and docs-scoped sitemap/robots files from the current request origin, or run the audit against a preview URL that matches `--base-url`.

## Minimal checklist

* Generate `llms.txt`, markdown mirrors, `llms-full.txt`, sitemap, robots, and `agent-readability.json`.
* Serve root-level `/sitemap.xml`, `/sitemap.md`, and `/robots.txt`.
* Keep docs-scoped `/docs/sitemap.xml`, `/docs/sitemap.md`, and `/docs/robots.txt`.
* Add JSON-LD to every docs HTML page.
* Add canonical and markdown alternate links to every docs HTML page.
* Return markdown for agent Accept headers, AI user agents, and `.md` URLs.
* Include `canonical_url` and `last_updated` in markdown frontmatter.
* Return markdown bodies for missing docs pages requested by agents.
