Build a Local Multi-Agentic RAG App in 7 Steps: Transformers.js, Strands, ONNX, Orama

Building a local multi-agentic RAG app used to be a lab curiosity. Over the last 18 months, WebGPU, Transformers.js v4.2.0's fused ONNX kernels, and ~500 MB OPFS quotas quietly crossed their production thresholds, and the full stack an agentic app needs (orchestrator SLM, embedder, reranker) now fits in a single browser tab. This post is a builder's guide to the reference architecture: a Next.js app that runs Qwen3.5-0.8B-Text, Nomic Embed v1.5, and bge-reranker-base entirely on the user's GPU, routes requests through Strands TS hierarchical sub-agents, and scopes retrieval per document via Orama's where-clause pre-filter. It is honest about the limits (cold start, small-model routing, OPFS quirks) and specific about the gotchas (Chrome's Cache API silently drops entries above ~200 MB).
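To make "scopes retrieval per document" concrete, here is a minimal, library-free sketch of the pre-filter idea: restrict the candidate chunks to one document first, then rank only those by embedding similarity. It does not use Orama's API. The `Chunk` shape, the `scopedSearch` helper, and the toy two-dimensional embeddings are all hypothetical names chosen for illustration; a real app would delegate both the filter and the ranking to the search index.

```typescript
// Hypothetical chunk shape for illustration; not Orama's schema.
interface Chunk {
  id: string;
  docId: string;       // which source document the chunk came from
  text: string;
  embedding: number[]; // toy vectors here; a real embedder returns hundreds of dims
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Pre-filter: drop chunks from other documents BEFORE scoring,
// so chunks from unrelated documents can never reach the top-k.
function scopedSearch(
  chunks: Chunk[],
  queryEmbedding: number[],
  docId: string,
  limit = 3,
): Chunk[] {
  return chunks
    .filter((c) => c.docId === docId)
    .map((c) => ({ chunk: c, score: cosine(c.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((x) => x.chunk);
}
```

Filtering before scoring, rather than after, means a close match from an unrelated document cannot take up a slot in the top-k results, and it shrinks the set of vectors that need to be compared.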
