tl;dr: server‑side only: a zola template for ui, a cloudflare pages function for retrieval + generation, a tiny build‑time tf‑idf index, strict csp, and no client javascript.

I wanted an AI chat bubble on this site without breaking my core rule: no client‑side JavaScript. No trackers, no embedded widgets, no megabytes of model weights downloading into your browser. Just HTML and CSS, rendered server‑side, with the answers grounded in the content you see here.

This post documents exactly how I built it: a server‑rendered “ask” page backed by Cloudflare Pages Functions and Workers AI, using a tiny build‑time retrieval index so answers quote the site’s own posts. The result feels like a chat, but it’s actually a simple HTML form. Every message is a page request; the server does the work; you see the answer. Simple, fast, cache‑friendly.

The goals (and constraints)

  • no client javascript: keep the site lightweight and privacy‑respecting.
  • ground answers in my content: retrieval‑augmented generation (RAG), but without shipping a vector DB to the client.
  • work within the zola + cloudflare pages stack: minimal moving parts.
  • responsive and accessible: keyboard‑friendly, enter‑to‑submit, clean typography.

High‑level architecture

At a glance, the system has three pieces:

  1. UI template: A normal Zola page at /ask, rendered with templates/ask.html. It’s just a <form method="post"> plus a few empty containers where the server will inject the answer and sources.
  2. server function: A Cloudflare Pages Function at functions/ask.ts that handles GET/POST. On POST it performs retrieval over a compact build‑time index and calls Workers AI (e.g., @cf/meta/llama-3.1-8b-instruct) to draft an answer constrained by the retrieved context.
  3. build‑time index: A tiny TF‑IDF index of my content, produced by scripts/build_rag_index.py into static/rag-index.json. No database, no runtime fetches to third‑party stores.

Why TF‑IDF and not embeddings? Because it’s small, transparent, and good enough for a single‑author site. The index is cosine‑normalized sparse vectors; the function computes a query vector for your question, ranks pages, and provides the top snippets as context. If I ever outgrow it, swapping in Cloudflare Vectorize and an embedding model would be straightforward.
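
Concretely, the index is just a JSON file. The TypeScript shape below is illustrative rather than the literal schema of rag-index.json (the real key names may differ), but it captures everything the function needs:

```ts
// Assumed shape of static/rag-index.json; field names are illustrative,
// not copied from the real build script.
interface RagIndex {
  idf: Record<string, number>; // inverse document frequency per token
  documents: RagDocument[];
}

interface RagDocument {
  title: string;
  url: string;
  preview: string;                // short plain-text excerpt, also shown as a source
  vector: Record<string, number>; // L2-normalized sparse TF-IDF weights, token -> weight
}
```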


UI: a Zola page that looks like the rest of the site

I refuse to bolt on an iframe with its own styles and font soup. The /ask page uses the same layout as posts:

  • templates/ask.html extends base.html, includes the standard header/nav/footer, and renders a minimal form.
  • A few small styles in sass/parts/_misc.scss modernize the input while keeping the site’s typography.
  • There’s no script tag. Enter submits the form; the function returns a fresh page with the answer injected.

Key UI choices:

  • single page submit: Every turn is a POST. This keeps the interaction stateless, debuggable, and cache‑aware.
  • textarea, not contentEditable: predictable styling, reliable in Safari and friends.
  • no spinner gifs: the server answers quickly; the browser does what it’s good at—rendering HTML.

The function: retrieval + generation, server‑side

functions/ask.ts handles two paths:

  • GET /ask: serves the pre‑rendered public/ask/index.html from the ASSETS binding so we don’t loop back through routing. This provides a fast, cached, static shell.
  • POST /ask: reads the form, loads static/rag-index.json, ranks documents with cosine similarity, builds a concise context, prompts Workers AI, and injects the answer and sources back into the same template.
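
Stripped to a skeleton, the function looks roughly like this. It's a sketch, not the production file: retrieve, buildMessages, and injectAnswer are stand-in names for helpers sketched further down, and the types assume @cloudflare/workers-types.

```ts
// functions/ask.ts, stripped to a skeleton. retrieve(), buildMessages(), and
// injectAnswer() are stand-in names for helpers sketched later in this post.
interface Env {
  AI: Ai;          // Workers AI binding (types from @cloudflare/workers-types)
  ASSETS: Fetcher; // the built static files that Pages exposes to Functions
}

export const onRequestGet: PagesFunction<Env> = async ({ request, env }) => {
  // Serve the pre-rendered shell straight from the build output.
  return env.ASSETS.fetch(new URL("/ask/", request.url));
};

export const onRequestPost: PagesFunction<Env> = async ({ request, env }) => {
  const form = await request.formData();
  const question = String(form.get("question") ?? "").trim();

  // Load the build-time index and the static shell via the ASSETS binding.
  const index = (await (await env.ASSETS.fetch(new URL("/rag-index.json", request.url))).json()) as RagIndex;
  const shell = await (await env.ASSETS.fetch(new URL("/ask/", request.url))).text();

  // Retrieve context, ask Workers AI, and inject the result into the same HTML.
  const hits = retrieve(index, question, 4);
  const result = (await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: buildMessages(question, hits),
  })) as { response?: string };

  const html = injectAnswer(shell, question, result.response ?? "", hits);
  return new Response(html, { headers: { "content-type": "text/html; charset=utf-8" } });
};
```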

On the injection side, the function uses id‑based replacements (regex keyed on #ask-answer and #ask-sources) so minified HTML doesn’t break the integration. No brittle innerHTML guesses, no client scripts.
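
The real regexes may differ in detail, but the idea is small enough to sketch. It assumes the answer and sources containers start out empty in the shell, which they do in the template:

```ts
// A sketch of the id-keyed injection. Because the target containers start empty,
// a lazy match up to the next closing tag is safe.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function injectById(html: string, id: string, inner: string): string {
  // Match the opening tag by its id attribute, whatever else minification did to it.
  const pattern = new RegExp(`(<[^>]*\\bid="${id}"[^>]*>)([\\s\\S]*?)(</)`, "i");
  return html.replace(pattern, (_match, open, _old, close) => `${open}${inner}${close}`);
}

function injectAnswer(shell: string, question: string, answer: string, hits: RagDocument[]): string {
  const sources = hits
    .map((d) => `<li><a href="${d.url}">${escapeHtml(d.title)}</a></li>`)
    .join("");
  let html = injectById(shell, "ask-answer", `<p>${escapeHtml(answer)}</p>`);
  html = injectById(html, "ask-sources", `<ul>${sources}</ul>`);
  return injectById(html, "question", escapeHtml(question)); // echo the question back
}
```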

The prompt keeps the model honest: “answer using only the information in the provided context. if the context is insufficient, say you do not know.” That keeps the response grounded and short, and it makes failure modes explicit rather than hallucinatory.
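
In code, that amounts to a system message plus a user message carrying the context block. A sketch (the production wording may differ slightly):

```ts
// Prompt assembly, sketched. The system message mirrors the constraint quoted above.
function buildMessages(
  question: string,
  hits: RagDocument[]
): { role: "system" | "user"; content: string }[] {
  const context = hits
    .map((d, i) => `[${i + 1}] ${d.title} (${d.url})\n${d.preview}`)
    .join("\n\n");

  return [
    {
      role: "system",
      content:
        "Answer using only the information in the provided context. " +
        "If the context is insufficient, say you do not know.",
    },
    { role: "user", content: `Context:\n${context}\n\nQuestion: ${question}` },
  ];
}
```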

Takeaway: do server‑side retrieval and injection, not client‑side hydration. It’s faster, simpler, and auditably private.


Retrieval without a vector database

The build step converts Markdown under content/ into a compact sparse index. The script does three jobs:

  1. read .md files, strip front‑matter and markup, and extract a title + plain‑text preview.
  2. tokenize, compute document frequencies, then TF‑IDF per document.
  3. l2‑normalize vectors and write an index with idf, documents[], and a short preview per page.
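
The real build step is a short Python script; here is the same math sketched in TypeScript purely to make the three jobs concrete. The idf smoothing shown is one common variant, not necessarily the exact formula the script uses:

```ts
// scripts/build_rag_index.py is Python; this TypeScript version exists only to
// show the math behind the index shape defined earlier.
function buildIndex(docs: { title: string; url: string; text: string }[]): RagIndex {
  const tokenize = (s: string) => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const tokenized = docs.map((d) => tokenize(d.text));

  // Document frequencies: in how many documents does each token appear?
  const df = new Map<string, number>();
  for (const tokens of tokenized) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) ?? 0) + 1);
  }

  // Inverse document frequency (smoothed).
  const n = docs.length;
  const idf: Record<string, number> = {};
  for (const [t, f] of df) idf[t] = Math.log((n + 1) / (f + 1)) + 1;

  // TF-IDF per document, then L2-normalize so a plain dot product is cosine similarity.
  const documents = docs.map((d, i) => {
    const tf = new Map<string, number>();
    for (const t of tokenized[i]) tf.set(t, (tf.get(t) ?? 0) + 1);

    const vector: Record<string, number> = {};
    let norm = 0;
    for (const [t, f] of tf) {
      const w = f * idf[t];
      vector[t] = w;
      norm += w * w;
    }
    norm = Math.sqrt(norm) || 1;
    for (const t of Object.keys(vector)) vector[t] /= norm;

    return { title: d.title, url: d.url, preview: d.text.slice(0, 280), vector };
  });

  return { idf, documents };
}
```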

On query, the function builds a TF‑IDF vector for the question, scores with a dot product against each document vector (cosine similarity thanks to normalization), and takes the top K. It then composes a context block with titles, URLs, and excerpts.
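
Here's a minimal sketch of that scoring step; it's the retrieve helper the handler sketch above leans on:

```ts
// Query-time scoring: same tokenizer, the stored idf table, and a dot product per document.
function retrieve(index: RagIndex, question: string, k: number): RagDocument[] {
  const tokenize = (s: string) => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

  // Build the query's TF-IDF vector from the stored idf values.
  const tf = new Map<string, number>();
  for (const t of tokenize(question)) tf.set(t, (tf.get(t) ?? 0) + 1);

  const q: Record<string, number> = {};
  let norm = 0;
  for (const [t, f] of tf) {
    const idf = index.idf[t];
    if (idf === undefined) continue; // ignore words the corpus has never seen
    q[t] = f * idf;
    norm += q[t] * q[t];
  }
  norm = Math.sqrt(norm) || 1;

  // Dot product equals cosine similarity because document vectors are already normalized.
  return index.documents
    .map((doc) => {
      let score = 0;
      for (const [t, w] of Object.entries(q)) score += (doc.vector[t] ?? 0) * (w / norm);
      return { doc, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.doc);
}
```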

Why this approach?

  • small: the entire index is a few hundred kilobytes, not tens of megabytes.
  • deterministic: the math is transparent; ranking is explainable.
  • zero infra: no runtime database, no warmups, no migrations.

If you prefer embeddings, swap the offline step for an embedding model call, push vectors into Cloudflare Vectorize, and query it on POST. The Pages Function flow is identical.
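
For completeness, the embedding variant might look something like this. Everything here is an assumption rather than what this site runs: the VECTORIZE binding name, the metadata layout, and the choice of @cf/baai/bge-base-en-v1.5 as the embedding model.

```ts
// Hypothetical embeddings variant. VECTORIZE is an assumed binding name, and
// title/url/preview are assumed to be stored as vector metadata.
// VectorizeIndex comes from @cloudflare/workers-types (newer versions also call it Vectorize).
async function retrieveWithVectorize(
  env: Env & { VECTORIZE: VectorizeIndex },
  question: string,
  k: number
) {
  // Embed the question with a Workers AI embedding model.
  const embedding = (await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [question],
  })) as { data: number[][] };

  // Nearest-neighbour lookup; metadata carries what the prompt needs.
  const result = await env.VECTORIZE.query(embedding.data[0], {
    topK: k,
    returnMetadata: "all", // older workers-types versions take a boolean here
  });

  return result.matches.map((m) => ({
    title: String(m.metadata?.title ?? ""),
    url: String(m.metadata?.url ?? ""),
    preview: String(m.metadata?.preview ?? ""),
  }));
}
```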


Making it feel native (and keeping it JS‑free)

I tuned the UX to feel like the rest of the site:

  • The ask page uses the same header, nav, and footer. The only UI difference is a form and two content blocks where the answer and sources appear.
  • The textarea is Safari‑friendly (no native appearance quirks), vertically resizable, and honors the site’s font stack.
  • Enter submits the form; no client handlers required.

The important bit is identity: the AI is a guest in the site’s house, not a third‑party takeover. Typography, spacing, and colors are consistent because we rely on the site’s CSS.


Security and privacy choices

I’d rather spend time removing risk than debugging trackers.

  • Strict CSP: in static/_headers I keep script-src 'none' for the site and allow only form-action 'self'. This prevents surprise scripts and ensures the ask form posts back to the origin only.
  • No client network calls: there’s no fetch or WebSocket on the client. All requests are plain HTML form submits.
  • Bot/WAF rules: Cloudflare’s managed rules sometimes dislike POSTs to new paths. A skip rule for /ask (WAF managed rules and Bot Fight Mode) solved it; pair it with a gentle rate limit if you expect traffic spikes.
  • No personal data: the function doesn’t store questions; there’s no analytics JavaScript. Logs live in your Cloudflare account with normal retention.

If you do enable embeddings/Vectorize later, keep the index public‑read on your site (like this TF‑IDF file) or private in Vectorize—whichever matches your threat model. The flow still stays server‑side.


The build pipeline: always‑fresh retrieval

RAG is only as good as its corpus. I added a tiny pre‑commit hook so the index stays current:

  • scripts/hooks/pre-commit runs scripts/build_rag_index.py and then git add static/rag-index.json, so the regenerated index is staged automatically.
  • Cloudflare Pages then publishes the fresh rag-index.json alongside the site.

This keeps deploys deterministic. You can swap the hook for CI if you prefer, but keeping it local means the repo snapshot always contains the exact index you deployed. No hidden state.


Why I didn’t ship a browser LLM

I evaluated three patterns before landing here:

  • in‑browser llm via webgpu/wasm (webllm/transformers.js): private and offline, but the first load pulls 100–400 MB of weights. That’s hostile to the site’s size budget and a bad experience on slow links.
  • local backend via ollama: fast and fully local—for people who have it installed. For a public site, that’s a niche.
  • server‑side generation (what I chose): tiny page weight, no client scripts, predictable costs, and one place to reason about security.

The last option aligned with the site’s ethos: simple, inspectable, fast.


Deployment on Cloudflare Pages

Setup is straightforward:

  1. Ensure your build emits to public/ (Zola’s default). You should see public/ask/index.html and public/rag-index.json locally after zola build.
  2. Enable Functions for your Pages project and provide the AI binding named AI (this maps to Workers AI).
  3. Keep the ASSETS binding available (Pages provides it), so the function can read built files without routing loops.
  4. Confirm static/_headers carries strict CSP and form-action 'self'.
  5. Deploy. Test /ask on the dev domain and production domain—tweak WAF if POSTs are blocked.

That’s it. There’s no queue, no KV, no external datastore. Pages serves the shell and static index; the function does RAG and calls the model.


Failure modes and how I guardrail them

No system is perfect; here’s what can go sideways and how I addressed it.

  • Minified HTML broke naive string replacement. I switched to id‑based regex replacements targeting #question, #ask-answer, and #ask-sources. This is resilient to minification and attribute reordering.
  • Safari text controls looked off. I removed native appearance, increased padding, and let the textarea resize vertically. It now looks the same across browsers.
  • WAF disliked early POSTs. I added a skip rule for /ask (managed/bot fight) and a modest rate limit. The dev domain helped confirm function logic while I tuned production rules.
  • Model output sometimes trailed with “citations: …”. I now trim that pattern server‑side (a sketch follows this list) and present the curated sources list instead.
  • Retrieval recall. TF‑IDF works well for my corpus; if you have thousands of posts, move to embeddings + Vectorize.
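
For the record, that citation trim is a one-liner. The pattern below is illustrative; the exact trailing text varies:

```ts
// Illustrative pattern only; strips a trailing "citations: ..." block if the model appends one.
function stripTrailingCitations(answer: string): string {
  return answer.replace(/\n+\s*citations?:[\s\S]*$/i, "").trim();
}
```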

The result is stable and easy to reason about. There are no client caches to invalidate, no service workers, and no hydration edge cases.


What it costs

Costs are predictable:

  • Workers AI: billed per 1K tokens (pricing varies by model). My prompts are small, and answers are concise. This keeps numbers low.
  • Pages Functions: modest runtime for HTML form posts.
  • No egress for heavy assets: there’s no shipping of model weights to the client.

Put differently: I pay per answer, not per visitor. The static pages remain free to browse at CDN speed.


If you want to replicate this

Here’s the checklist I’d use if I were starting from scratch:

  1. Build your site with Zola (or your static generator of choice). Add an /ask page that renders a standard HTML form.
  2. Write a function to handle GET/POST. On POST: load your corpus index, retrieve top documents, craft a cautious system prompt, call Workers AI, and inject the answer into the same HTML.
  3. Generate a build‑time index of your content. Start with TF‑IDF; you can upgrade to embeddings later.
  4. Lock down your CSP. Keep script-src 'none'. Allow only form-action 'self'.
  5. Add a pre‑commit hook (or CI job) to keep the index fresh whenever content changes.
  6. Test against your dev domain, then tune WAF for production.

If your priorities mirror mine—performance, privacy, readability—this design is hard to beat.


Summary

I added an AI‑powered “ask” page to a zero‑JavaScript site without compromising on speed or privacy. The trick wasn’t a fancy frontend; it was leaning into the strengths of the platform:

  • Zola renders a clean, theme‑consistent page.
  • Cloudflare Pages Functions perform retrieval and call Workers AI.
  • A tiny TF‑IDF index keeps answers grounded in the site’s content.
  • Strict CSP and no client JS mean the experience stays fast and private.

If you build websites for a living, you don’t have to choose between “modern” and “minimal.” You can have a site that loads fast, reads well, and still answers hard questions—without sending anyone’s browser on a weight‑lifting session.