
Herald / Jarvis

Herald is a local assistant architecture built on a 5-stage routing cascade; the LLM is one renderer, not the decision-maker. Jarvis is the terminal and voice-facing implementation path with CRSIS, local voice I/O, and reliability guards across 51k+ LOC.

Python, Local LLMs, CRSIS, Voice Pipeline, Reliability Guards, Open Source
Local AI systems dossier

A local assistant architecture where routing, reliability, voice, and self-improvement stay visible in source.

51k+ Python LOC
640 test functions
5 routing stages

Decision path

Exact, strict deterministic, soft deterministic, semantic, and classifier routes are named stages rather than hidden model behavior.

CRSIS loop

Runtime rule proposals pass through analyzer, validator, approval, versioning, and rollback paths.

Operational ceiling

The writeup is clear about local model limits, Windows-first gaps, and where stronger API fallback would matter.

Proof Artifact

Herald / Jarvis source proof

Routing: Exact -> Strict Det -> Soft Det -> Semantic -> Classifier

Scale: 51k+ LOC, 640 tests, 317 Python modules across brain_core, CRSIS, addons, install-pack

CRSIS: live rule engine with AST-safe self-modification, satisfaction detection, approval gates, rollback

Voice: Faster-Whisper STT + Kokoro ONNX TTS + Ollama LLM on one consumer GPU

Reliability: latency budget, admission control, route cache, network guard

5-stage routing, CRSIS rollback, 640 tests, voice pipeline

Herald / Jarvis: Local Assistant Runtime

Herald is the architecture and Jarvis is the terminal/voice-facing implementation path. The public source shows the pieces that matter: named routing stages, a CRSIS runtime rule engine, local voice I/O, reliability guards, and explicit limits around what the local stack can and cannot do.

The important part is the inversion: the LLM is not treated as the single brain making every decision. Routing, admission, tool selection, evidence collection, and reliability checks stay visible in code; the model renders inside those constraints.

Problem

Most agent demos ask the viewer to trust whatever the model says. Herald points in the other direction: route deterministically, keep tool traces visible, separate facts from rendering, and make failure modes part of the system instead of hiding them.

That matters because local hardware has a real ceiling. A small local coder model can handle bounded edits, small scripts, targeted bug fixes, and code questions when the context is controlled. It cannot reliably act like a flagship hosted model across hard multi-file engineering tasks without help.

What I Built

  1. 5-stage routing cascade: Exact Match -> Strict Deterministic -> Soft Deterministic -> Semantic -> Classifier.
  2. CRSIS runtime engine: A live rule system with performance monitoring, failure-pattern detection, AST-safe modification paths, approval gates, satisfaction checks, versioning, and rollback.
  3. Voice pipeline: Faster-Whisper STT, Kokoro ONNX TTS, and Ollama LLM routing on a single consumer GPU with VRAM relay management.
  4. Reliability layer: Latency budget, admission control, route cache, and network guard modules instead of trusting the model path alone.
  5. Discord addon: Voice/text bridge, permission mapping, and loop-prevention work for a real external interface.
  6. Open-source release: The project is public because the architecture and implementation lessons are more valuable when the limits are visible.
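The routing cascade in item 1 can be sketched as an ordered list of named stages, where the first stage that returns a handler wins and the result records which stage decided. This is a minimal illustration, not the project's code: the handler names, the lookup table, the regex, the keyword set, and the token-overlap score standing in for semantic similarity are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class RouteResult:
    stage: str    # which named stage made the decision
    handler: str  # hypothetical handler identifier

def exact_match(text: str) -> Optional[str]:
    # Stage 1: literal lookup of the normalized utterance.
    return {"what time is it": "clock.now"}.get(text.strip().lower())

def strict_deterministic(text: str) -> Optional[str]:
    # Stage 2: the whole utterance must fit a fixed pattern.
    pattern = r"set (?:a )?timer for \d+ (?:seconds|minutes)"
    return "timer.set" if re.fullmatch(pattern, text.strip().lower()) else None

def soft_deterministic(text: str) -> Optional[str]:
    # Stage 3: keyword presence anywhere in the utterance.
    words = set(text.lower().split())
    return "weather.lookup" if words & {"weather", "forecast"} else None

# Hypothetical example phrases; Jaccard token overlap stands in for
# embedding similarity so the sketch has no dependencies.
SEMANTIC_EXAMPLES = {
    "music.play": "play some music for me",
    "notes.create": "write down a note",
}

def semantic(text: str, threshold: float = 0.5) -> Optional[str]:
    # Stage 4: pick the closest example if it clears the threshold.
    words = set(text.lower().split())
    best_name, best_score = None, 0.0
    for name, example in SEMANTIC_EXAMPLES.items():
        example_words = set(example.split())
        union = words | example_words
        score = len(words & example_words) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

def classifier(text: str) -> Optional[str]:
    # Stage 5: final catch-all; here it simply hands off to the LLM renderer.
    return "llm.render"

CASCADE = [
    ("exact", exact_match),
    ("strict_det", strict_deterministic),
    ("soft_det", soft_deterministic),
    ("semantic", semantic),
    ("classifier", classifier),
]

def route(text: str) -> RouteResult:
    # Walk the stages in order; the first non-None answer decides.
    for stage_name, stage in CASCADE:
        handler = stage(text)
        if handler is not None:
            return RouteResult(stage_name, handler)
    raise RuntimeError("classifier stage should always return a handler")
```

Because every result carries its stage name, a log of `route(...)` calls shows how often a request fell through to the model path rather than being settled deterministically.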

CRSIS

CRSIS is a live rule engine that monitors routing performance, detects failure patterns, and proposes AST-safe code modifications with approval gates and rollback. The system can improve its routing rules while keeping source changes reviewable and recoverable.

That is the unusual part of this project. It is not just a chat wrapper around Ollama; it is a runtime self-improvement loop with analyzer, proposer, code modifier, validator, applier, satisfaction detector, approval, and rollback components.
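One way to picture the validator, approval, versioning, and rollback steps is a rule store that only accepts proposals whose syntax tree contains whitelisted node types. The sketch below is an assumption-heavy illustration: the `RuleStore` class, the rule format (a boolean expression over a single `text` variable), and the whitelist are hypothetical and are not taken from the CRSIS source.

```python
import ast

# Node types a proposed rule may contain (hypothetical whitelist).
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Compare,
    ast.In, ast.NotIn, ast.Eq, ast.Name, ast.Load, ast.Constant,
)

def is_ast_safe(rule_src: str) -> bool:
    """Accept only simple boolean expressions over the variable `text`."""
    try:
        tree = ast.parse(rule_src, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # e.g. calls, attribute access, lambdas
        if isinstance(node, ast.Name) and node.id != "text":
            return False  # no other names are visible to a rule
    return True

class RuleStore:
    """Keeps every accepted rule version so the latest can be undone."""

    def __init__(self, initial_rule: str):
        if not is_ast_safe(initial_rule):
            raise ValueError("initial rule failed validation")
        self._versions = [initial_rule]

    @property
    def active(self) -> str:
        return self._versions[-1]

    def apply(self, proposal: str, approved: bool) -> bool:
        # Approval gate first, then validation; rejected proposals change nothing.
        if not approved or not is_ast_safe(proposal):
            return False
        self._versions.append(proposal)
        return True

    def rollback(self) -> bool:
        # Drop the newest version, but never the original rule.
        if len(self._versions) > 1:
            self._versions.pop()
            return True
        return False

    def matches(self, text: str) -> bool:
        code = compile(ast.parse(self.active, mode="eval"), "<rule>", "eval")
        return bool(eval(code, {"__builtins__": {}}, {"text": text.lower()}))
```

In this sketch a satisfaction check would sit outside the store: if users keep rephrasing after a new rule goes live, the caller invokes `rollback()` and the previous version becomes active again.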

Source Metrics

  • 51,780 lines of Python source excluding test artifacts.
  • 640 test functions.
  • 317 Python source files across brain_core, CRSIS, addons, and install-pack.
  • 5-stage routing cascade with named stages instead of a vague "agent chooses a tool" path.
  • Reliability modules for latency budget, admission control, network guard, and route caching.
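Two of the reliability ideas listed above, a latency budget and admission control, can be illustrated in a few lines. This is a sketch under assumptions: the class names, the per-request deadline model, and the in-flight counter are hypothetical and may not match how the project's modules are structured.

```python
import time
from typing import Callable

class LatencyBudget:
    """Tracks how much of a per-request time allowance is left."""

    def __init__(self, total_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._deadline = clock() + total_s

    def remaining(self) -> float:
        return max(0.0, self._deadline - self._clock())

    def allows(self, estimated_s: float) -> bool:
        # Skip a stage (for example a slow model call) if it cannot finish in time.
        return estimated_s <= self.remaining()

class AdmissionController:
    """Rejects new work once too many requests are already in flight."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_admit(self) -> bool:
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight = max(0, self.in_flight - 1)
```

Injecting the clock keeps the budget testable without sleeping; production code would use the default `time.monotonic`.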

Known Gaps

252 Pyright type errors remain, documented in docs/audit_report.md. Most come from dynamic dict access patterns and Windows-only ctypes imports; architectural fixes are deferred pending a cross-platform redesign.
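For readers unfamiliar with these two error classes, the sketch below shows the general shape of each and one common way to make them checkable: a `TypedDict` for dict access and a `sys.platform` guard around the Windows-only call. The `RouteRecord` shape and both function names are hypothetical, and this is not a description of the fixes planned for the project.

```python
import sys
from typing import Optional, TypedDict

class RouteRecord(TypedDict):
    # Hypothetical record shape; a plain dict here gives the checker nothing to verify.
    stage: str
    latency_ms: float

def slowest(records: list[RouteRecord]) -> Optional[RouteRecord]:
    # With a TypedDict, r["latency_ms"] is known to be a float.
    return max(records, key=lambda r: r["latency_ms"], default=None)

def console_code_page() -> Optional[int]:
    # Type checkers narrow on sys.platform, so ctypes.windll is only
    # analyzed (and only imported) on Windows.
    if sys.platform == "win32":
        import ctypes
        return ctypes.windll.kernel32.GetConsoleOutputCP()
    return None
```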

Artifact Notes

The screenshots show odd replacement glyphs in some terminal output. They came from a tool compilation and encoding issue that surfaced while I was working through schema/tool fixes. I documented it as part of the project because debugging that plumbing is exactly what local agent work becomes: tool descriptions, serialization, encoding, subprocess behavior, and model confusion all matter.
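As a general illustration of how replacement glyphs appear, the snippet below decodes tool output with the wrong encoding. It shows one common cause (Windows code-page bytes read as UTF-8) and is not a reconstruction of the specific bug in the screenshots; `decode_tool_output` is a hypothetical helper.

```python
def decode_tool_output(raw: bytes, encoding: str = "utf-8") -> str:
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of
    # raising, which is why the glyph shows up in the terminal.
    return raw.decode(encoding, errors="replace")

# Bytes as a Windows tool using code page 1252 might emit them.
legacy_bytes = "café".encode("cp1252")

garbled = decode_tool_output(legacy_bytes)            # decoded as UTF-8
correct = decode_tool_output(legacy_bytes, "cp1252")  # decoded with the real encoding

print(repr(garbled))  # 'caf\ufffd'
print(repr(correct))  # 'café'
```

The same mismatch can enter at any boundary the paragraph lists: a subprocess pipe, a serialized tool result, or a file written by one component and read by another.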

Constraints

  • Local inference keeps control and privacy closer to the machine, but the model ceiling is real.
  • Sequential/local execution has to care about VRAM, context size, tool cost, and model specialization.
  • A small coder model can generate or edit local files, but it cannot reliably plan large multi-file repairs.
  • Better autonomy would require a fallback path to a stronger API model for hard tasks.
  • Cross-platform cleanup is still needed because some implementation paths are Windows-first.
  • The best product direction would be a narrow niche where local, private, voice-first operation is actually the right interface.

Tradeoffs

I dropped the idea of selling this as a general voice assistant. There is no moat in "another assistant," and the codebase would need a major simplification pass before it should become a focused product.

The more honest direction is to keep it open source, use it as proof of voice and tool orchestration work, and explain the ceiling clearly.

Proof

  • Public source: github.com/bibbisalsd/heraldproj.
  • Public Herald site: architecture, pattern, model seats, world model, CRSIS, and roadmap.
  • Source-visible proof: routing cascade, CRSIS runtime loop, voice pipeline, reliability modules, and Discord bridge.
  • Limit documentation: local model capability boundary, audit report, and API-fallback direction.
  • Project status: open-source architecture and implementation artifact, not a finished commercial assistant.
https://heraldd.vercel.app/