Stream AI answers
Add this only after basic search works. Leadtype retrieves the most relevant chunks from the static index, builds a constrained prompt that tells the model to answer only from those chunks and cite them, then streams the response. The model never answers from memory.
The shape is the same across runtimes:
response is a plain-text streamed Response; sources is citation metadata
you render next to the answer, never inside the streamed text.
Pick your runtime
streamDocsAnswer ships per runtime. All take { index, content, query } and
return { response, sources }; they differ only in how you name the model.
Uses the ai package's streamText. Auth via your AI Gateway / provider env.
To bring your own model loop instead, call createAnswerContext(index, query, { content, productName }) for the { system, prompt, sources } and pass them to
any SDK yourself.
Build a hardened endpoint
Answer generation accepts user input and calls a paid model, so the route needs
guards. leadtype/search ships the building blocks; wire them into your
framework's request handler:
validateDocsQuery trims and caps the text, readJsonWithLimit rejects
oversized bodies before parsing, and getClientIdentifier reads common proxy IP
headers (cf-connecting-ip, x-forwarded-for, x-real-ip) for the limiter key.
Gate on credentials and return 503 when the provider isn't configured so the UI
can disable the feature.
The TanStack Start example wires exactly this for all three runtimes —
apps/tanstack/src/routes/api/docs/ask/{vercel,tanstack,cloudflare}.ts plus the
shared handler in lib/provider-answer.ts.
Consume the stream on the client
The endpoint returns a plain-text stream — read it incrementally and append:
Run a normal search first to show results (and sources) immediately, then stream
the answer. Use an AbortController so a new query cancels the in-flight one.
Verify
- The endpoint returns
429past the rate limit and503when credentials are missing. - A query streams text incrementally, not in one blob.
- The answer cites sources that match the retrieved chunks — and refuses when the docs don't cover the question.