Herald / Jarvis: Local Assistant Runtime

Herald is the architecture and Jarvis is the terminal/voice-facing implementation path. The public source shows the pieces that matter: named routing stages, a CRSIS runtime rule engine, local voice I/O, reliability guards, and explicit limits around what the local stack can and cannot do.

The important part is the inversion: the LLM is not treated as the single brain making every decision. Routing, admission, tool selection, evidence collection, and reliability checks stay visible in code; the model renders inside those constraints.

Problem

Most agent demos ask the viewer to trust whatever the model says. Herald points in the other direction: route deterministically, keep tool traces visible, separate facts from rendering, and make failure modes part of the system instead of hiding them.

That matters because local hardware has a real ceiling. A small local coder model can handle bounded edits, small scripts, targeted bug fixes, and code questions when the context is controlled. It cannot reliably act like a flagship hosted model across hard multi-file engineering tasks without help.

What I Built

5-stage routing cascade: Exact Match -> Strict Deterministic -> Soft Deterministic -> Semantic -> Classifier.
CRSIS runtime engine: A live rule system with performance monitoring, failure-pattern detection, AST-safe modification paths, approval gates, satisfaction checks, versioning, and rollback.
Voice pipeline: Faster-Whisper STT, Kokoro ONNX TTS, and Ollama LLM routing on a single consumer GPU with VRAM relay management.
Reliability layer: Latency budget, admission control, route cache, and network guard modules instead of trusting the model path alone.
Discord addon: Voice/text bridge, permission mapping, and loop-prevention work for a real external interface.
Open-source release: The project is public because the architecture and implementation lessons are more valuable when the limits are visible.
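
As a rough illustration of the cascade listed above (not the code in the repository), the stages can be modeled as an ordered list of functions where the first one to return a route wins. The stage names come from the list; every table, pattern, and route name below is hypothetical, and the semantic and classifier stages are stubs.

```python
import re
from typing import Callable, Optional

# Hypothetical route tables, invented for this sketch.
EXACT = {"what time is it": "clock.now"}
STRICT = [(re.compile(r"open (\w+)"), "app.open")]
SOFT = [(re.compile(r"\b(weather|forecast)\b"), "weather.lookup")]


def exact_match(text: str) -> Optional[str]:
    return EXACT.get(text)


def strict_deterministic(text: str) -> Optional[str]:
    # The whole utterance must match the pattern.
    for pattern, route_name in STRICT:
        if pattern.fullmatch(text):
            return route_name
    return None


def soft_deterministic(text: str) -> Optional[str]:
    # A keyword anywhere in the utterance is enough.
    for pattern, route_name in SOFT:
        if pattern.search(text):
            return route_name
    return None


def semantic(text: str) -> Optional[str]:
    return None  # stub: an embedding-similarity lookup would go here


def classifier(text: str) -> Optional[str]:
    return "llm.chat"  # stub: the last stage always yields a route


STAGES: list[tuple[str, Callable[[str], Optional[str]]]] = [
    ("exact_match", exact_match),
    ("strict_deterministic", strict_deterministic),
    ("soft_deterministic", soft_deterministic),
    ("semantic", semantic),
    ("classifier", classifier),
]


def route(text: str) -> tuple[str, str]:
    """Return (stage_name, route_name) from the first stage that matches."""
    normalized = " ".join(text.lower().split())
    for stage_name, stage in STAGES:
        result = stage(normalized)
        if result is not None:
            return stage_name, result
    raise RuntimeError("no stage produced a route")


print(route("open spotify"))     # ('strict_deterministic', 'app.open')
print(route("tell me a joke"))   # ('classifier', 'llm.chat')
```

Returning the stage name alongside the route is what keeps the decision traceable: a log line can say which stage fired, not just which tool ran.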

CRSIS

CRSIS is a live rule engine that monitors routing performance, detects failure patterns, and proposes AST-safe code modifications with approval gates and rollback. The system can improve its routing rules while keeping source changes reviewable and recoverable.

That is the unusual part of this project. It is not just a chat wrapper around Ollama; it is a runtime self-improvement loop with analyzer, proposer, code modifier, validator, applier, satisfaction detector, approval, and rollback components.
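
The validate, approve, apply, and rollback steps of such a loop can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the CRSIS implementation: RuleStore, validate, and the specific safety checks (reject source that fails to parse, imports anything, or calls exec/eval) are hypothetical stand-ins for whatever the real validator enforces.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable


def validate(source: str) -> bool:
    """Hypothetical safety check: must parse, no imports, no exec/eval calls."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"exec", "eval"}
        ):
            return False
    return True


@dataclass
class RuleStore:
    """Holds the current rule source plus a version history for rollback."""
    current: str
    history: list[str] = field(default_factory=list)

    def apply(self, proposed: str, approve: Callable[[str, str], bool]) -> bool:
        # Validation runs before the approval gate, so a reviewer
        # never sees a proposal that cannot be applied safely.
        if not validate(proposed):
            return False
        if not approve(self.current, proposed):
            return False
        self.history.append(self.current)
        self.current = proposed
        return True

    def rollback(self) -> bool:
        if not self.history:
            return False
        self.current = self.history.pop()
        return True


store = RuleStore(current="THRESHOLD = 0.8\n")
store.apply("THRESHOLD = 0.7\n", approve=lambda old, new: True)
store.rollback()  # restores "THRESHOLD = 0.8\n"
```

The approve callback is where a human review step or a satisfaction check would plug in; the sketch only shows that nothing reaches the live rules without passing both gates.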

Source Metrics

51,780 lines of Python source excluding test artifacts.
640 test functions.
317 Python source files across brain_core, CRSIS, addons, and install-pack.
5-stage routing cascade with named stages instead of a vague "agent chooses a tool" path.
Reliability modules for latency budget, admission control, network guard, and route caching.
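
To make two of those reliability ideas concrete, here is a small sketch of a latency budget and an admission gate. The class names, method names, and numbers are assumptions made for illustration; the project's own modules may be shaped quite differently, and this version is not thread-safe.

```python
import time
from typing import Callable


class LatencyBudget:
    """Tracks how much of a per-request time budget is left."""

    def __init__(self, total_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._deadline = clock() + total_s

    def remaining(self) -> float:
        return max(0.0, self._deadline - self._clock())

    def allows(self, estimated_cost_s: float) -> bool:
        # Skip a step up front if its estimated cost no longer fits.
        return estimated_cost_s <= self.remaining()


class AdmissionController:
    """Caps the number of requests in flight (single-threaded sketch)."""

    def __init__(self, max_in_flight: int):
        self._max = max_in_flight
        self._in_flight = 0

    def try_admit(self) -> bool:
        if self._in_flight >= self._max:
            return False
        self._in_flight += 1
        return True

    def release(self) -> None:
        self._in_flight = max(0, self._in_flight - 1)


gate = AdmissionController(max_in_flight=1)
budget = LatencyBudget(total_s=2.0)
if gate.try_admit():
    try:
        if budget.allows(estimated_cost_s=0.5):
            pass  # run the model or tool step here
    finally:
        gate.release()
```

Injecting the clock keeps the budget testable without sleeping, which matters when the checks themselves are supposed to be cheap.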

Known Gaps

252 Pyright type errors remain, documented in docs/audit_report.md. Most come from dynamic dict access patterns and Windows-only ctypes imports; architectural fixes are deferred pending a cross-platform redesign.
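
For readers unfamiliar with these two error classes, the sketch below shows the general shape of each and one common way to quiet them. It is an illustration only: RouteResult, parse_result, and foreground_window_handle are hypothetical names, and the audit report may recommend different fixes.

```python
import sys
from typing import Any, Optional, TypedDict


class RouteResult(TypedDict):
    """Typed view of a payload that would otherwise be a loose dict."""
    route: str
    confidence: float


def parse_result(payload: dict[str, Any]) -> RouteResult:
    # Converting once at the boundary gives the type checker
    # known keys and value types everywhere downstream.
    return RouteResult(
        route=str(payload["route"]),
        confidence=float(payload.get("confidence", 0.0)),
    )


def foreground_window_handle() -> Optional[int]:
    # ctypes.windll exists only on Windows. Guarding on sys.platform
    # lets Pyright treat this branch as unreachable on other platforms.
    if sys.platform != "win32":
        return None
    import ctypes
    return ctypes.windll.user32.GetForegroundWindow()


print(parse_result({"route": "clock.now"}))
# {'route': 'clock.now', 'confidence': 0.0}
```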

Artifact Notes

Some screenshots show replacement glyphs in the terminal output. They came from a tool compilation and encoding issue I hit while working through schema and tool fixes. I documented it as part of the project because debugging that plumbing is exactly what local agent work becomes: tool descriptions, serialization, encoding, subprocess behavior, and model confusion all matter.
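
One common way replacement glyphs end up in terminal output is decoding a tool's bytes with the wrong codec. The sketch below reproduces that general failure; it is not a reconstruction of the specific bug in the screenshots, and decode_tool_output is a hypothetical helper.

```python
def decode_tool_output(raw: bytes, encoding: str = "utf-8") -> tuple[str, bool]:
    """Decode tool output; the second value flags a lossy decode."""
    try:
        return raw.decode(encoding), False
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD, the replacement glyph.
        return raw.decode(encoding, errors="replace"), True


# A tool that writes cp1252 bytes while the caller assumes UTF-8.
raw = "café ok".encode("cp1252")        # b'caf\xe9 ok'
text, lossy = decode_tool_output(raw)
print(repr(text), lossy)                # 'caf\ufffd ok' True
```

Surfacing the lossy flag instead of decoding silently gives the runtime a chance to log the mismatch before garbled text reaches the model.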

Constraints

Local inference keeps control and privacy closer to the machine, but the model ceiling is real.
Sequential/local execution has to care about VRAM, context size, tool cost, and model specialization.
A small coder model can generate or edit local files, but it cannot reliably plan large multi-file repairs.
Better autonomy would require a fallback path to a stronger API model for hard tasks.
Cross-platform cleanup is still needed because some implementation paths are Windows-first.
The best product direction would be a narrow niche where local, private, voice-first operation is actually the right interface.
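
The API-fallback idea above could be expressed as an explicit policy rather than a judgment call by the model. The following is a sketch under invented assumptions: the Task fields, the limits, and the backend names are hypothetical and are not measured thresholds from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    files_touched: int
    context_tokens: int
    needs_planning: bool


@dataclass(frozen=True)
class Limits:
    # Illustrative ceiling for a small local coder model.
    max_files: int = 2
    max_context_tokens: int = 8000


def choose_backend(
    task: Task,
    limits: Limits = Limits(),
    fallback_enabled: bool = False,
) -> str:
    """Pick 'local', 'api_fallback', or 'refuse' for a task."""
    within_ceiling = (
        task.files_touched <= limits.max_files
        and task.context_tokens <= limits.max_context_tokens
        and not task.needs_planning
    )
    if within_ceiling:
        return "local"
    # Without a fallback, declining is more honest than attempting
    # a repair the local model cannot plan reliably.
    return "api_fallback" if fallback_enabled else "refuse"


print(choose_backend(Task(1, 2000, False)))                        # local
print(choose_backend(Task(6, 2000, True)))                         # refuse
print(choose_backend(Task(6, 2000, True), fallback_enabled=True))  # api_fallback
```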

Tradeoffs

I dropped the idea of selling this as a general voice assistant. There is no moat in "another assistant," and the codebase would need a major simplification pass before it could become a focused product.

The more honest direction is to keep it open source, use it as proof of voice and tool orchestration work, and explain the ceiling clearly.

Proof

Public source: github.com/bibbisalsd/heraldproj.
Public Herald site: architecture, pattern, model seats, world model, CRSIS, and roadmap.
Source-visible proof: routing cascade, CRSIS runtime loop, voice pipeline, reliability modules, and Discord bridge.
Limit documentation: local model capability boundary, audit report, and API-fallback direction.
Project status: open-source architecture and implementation artifact, not a finished commercial assistant.

https://heraldd.vercel.app/
