project

Herald

A local-first AI operating system using the CLLM pattern for deterministic, evidence-grounded factual accuracy. Orchestrates 10 specialist models with sequential VRAM management.

Python · PyTorch · GGUF · Local LLMs · VRAM Orchestration · Electron · Rust

Herald: Local-First AI OS

Herald is an implementation of the "CLLM" (Cognitive Large Language Model) pattern, which replaces the traditional LLM-centric agent with a deterministic, evidence-grounded cascade.

"The goal of Herald was never just about offline AI, but about building a cognitive system that doesn't hallucinate because it's forced to prove every claim it makes."

The CLLM Pattern

Traditional agents are stochastic: you ask a question, and they attempt to answer it from their trained weights alone. Herald inverts this by treating the LLM as an orchestration engine, not a knowledge base.

  1. Information Extraction: The system identifies key entities and claims.
  2. Context Retrieval: Local vector databases and RAG systems pull evidence.
  3. Reasoning Verification: A specialized model (often a reasoning-heavy model like Llama 3 70B) critiques the draft answer against the retrieved context.
  4. Final Synthesis: The user receives a cited, grounded response.
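
The four stages above can be sketched as a small pipeline. This is a minimal illustration, not Herald's actual code: every name here (EVIDENCE, extract_claims, retrieve, verify, synthesize, answer) is hypothetical, the in-memory dictionary stands in for a local vector database, and a simple word-overlap score stands in for both the embedding search and the verifier model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory evidence store standing in for a local vector database.
EVIDENCE = {
    "doc-1": "Herald orchestrates 10 specialist models.",
    "doc-2": "Only the active model is fully loaded into VRAM.",
}


@dataclass
class Verdict:
    claim: str
    source: Optional[str]  # id of the supporting document, or None if unsupported


def tokens(text: str) -> set:
    """Lowercase word set with trailing punctuation stripped."""
    return {w.strip(".,").lower() for w in text.split()}


def extract_claims(draft: str) -> list:
    # Stage 1 (assumption): a naive sentence split replaces the extraction model.
    return [s.strip() for s in draft.split(".") if s.strip()]


def retrieve(claim: str):
    # Stage 2 (assumption): word overlap replaces vector similarity search.
    claim_tokens = tokens(claim)
    best_id, best_score = None, 0.0
    for doc_id, text in EVIDENCE.items():
        score = len(claim_tokens & tokens(text)) / len(claim_tokens)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


def verify(claim: str, threshold: float = 0.6) -> Verdict:
    # Stage 3 (assumption): a fixed threshold replaces the verifier model.
    doc_id, score = retrieve(claim)
    return Verdict(claim, doc_id if score >= threshold else None)


def synthesize(verdicts: list) -> str:
    # Stage 4: keep only supported claims, each followed by its citation.
    return " ".join(f"{v.claim} [{v.source}]." for v in verdicts if v.source)


def answer(draft: str) -> str:
    return synthesize([verify(c) for c in extract_claims(draft)])


if __name__ == "__main__":
    print(answer("Herald orchestrates 10 specialist models. Herald runs on the moon."))
```

The design point the sketch tries to capture is that a claim without supporting evidence is dropped rather than passed through, so the final response contains only statements that carry a citation.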

Sequential VRAM Management

One of the greatest challenges in local AI is the limited VRAM of consumer hardware. Herald solves this through an "Active Seat" system.

  • Only the model currently processing a task is fully loaded into VRAM.
  • Inactive models are swapped to system RAM (unified memory on macOS) or compressed using custom quantization techniques.
  • This allows a machine with only 24GB of VRAM to orchestrate a collective of models that would normally require 80GB+.
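
The "Active Seat" idea can be illustrated with a small simulation. This is a sketch under stated assumptions rather than Herald's orchestrator (which the tech stack lists as written in Rust): the ActiveSeat class, its methods, and the model names and sizes are all hypothetical, and no real GPU memory is touched. It only tracks which model would be resident and how many swaps occur.

```python
from typing import Optional


class ActiveSeat:
    """Simulates keeping at most one model resident in VRAM at a time."""

    def __init__(self, vram_budget_gb: float, model_sizes_gb: dict):
        self.vram_budget_gb = vram_budget_gb
        self.model_sizes_gb = model_sizes_gb
        self.active: Optional[str] = None  # model currently occupying the seat
        self.swaps = 0                     # number of evictions performed

    def resident_gb(self) -> float:
        """VRAM used by the currently seated model (0 if the seat is empty)."""
        return self.model_sizes_gb[self.active] if self.active else 0.0

    def seat(self, name: str) -> None:
        size = self.model_sizes_gb[name]
        if size > self.vram_budget_gb:
            raise MemoryError(f"{name} ({size} GB) exceeds the VRAM budget")
        if self.active == name:
            return  # already seated, nothing to do
        if self.active is not None:
            # A real system would move the old weights to system RAM here.
            self.swaps += 1
        self.active = name

    def run(self, name: str, task: str) -> str:
        # Seat the model first, then hand it the task (stubbed as a string).
        self.seat(name)
        return f"{name} handled {task!r}"


if __name__ == "__main__":
    # Hypothetical sizes: 34 GB in total, more than the 24 GB budget.
    seat = ActiveSeat(24.0, {"extractor": 8.0, "retriever": 4.0, "verifier": 22.0})
    for model in ("extractor", "retriever", "verifier"):
        print(seat.run(model, "check claim"), "| resident GB:", seat.resident_gb())
    print("swaps:", seat.swaps)
```

Because only the seated model counts against the budget, the combined size of the collective can exceed available VRAM as long as each individual model fits. The cost is swap latency each time the active model changes.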

Tech Stack

  • Architecture: Skeptic OS / Esoteric Reference Implementation.
  • Languages: Python, TypeScript, Rust (for the VRAM orchestrator).
  • Complexity: 46k+ LOC across 171 modular components.
  • Optimization: Custom GGUF quantization and model-swapping drivers.

Future Exploration

We are currently exploring the integration of multimodal specialist models (vision and audio) to allow Herald to "see" the operating system environment it's managing.

https://heraldd.vercel.app/