A six-node Mac Mini cluster, a governed boundary, and a 192GB Mac Studio — built to demonstrate one claim: governed AI is operationally viable at room scale.
The Forge makes one argument the rest of the industry is still hand-waving about. Not on a slide, not on a demo box — on six Mac Minis on a shelf and an M4 Pro the agent can see but cannot reach.
A single laptop running an agent in a Python script is not a governed system; it is a development convenience. The moment that agent reaches into anything that matters — your shell, your files, your network — the absence of structural separation between the model and the systems it operates becomes the central problem. Most labs duck this by running everything together and declaring the boundaries by convention. The Forge runs the governance boundary on a machine the cluster cannot deploy to. The cluster manages what the boundary enforces. The split is visible.
Agents propose. Platforms enforce. Everything in this lab is built around that asymmetry.
The platform: GitOps, observability, CI/CD, secrets, model registry, and the policy-management half of governance.
What the cluster does not do is run inference. That is deliberate — inference belongs on the next tier.
The part that makes this a governance system, not just a Kubernetes cluster. Every proposed action passes a pre-commit gate here before execution.
The agent runs on Tier I. It can submit proposals; it cannot reach this machine to modify what evaluates them.
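The propose/enforce asymmetry can be sketched as a default-deny gate that decides before anything executes and records every decision. This is a minimal illustration, not the lab's implementation: the `Proposal` and `Evidence` fields, the action names, and the `POLICY` table are all hypothetical; only the ALLOW/DENY/ESCALATE verdicts come from the page itself.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


@dataclass(frozen=True)
class Proposal:
    """An action the agent would like to take (hypothetical fields)."""
    agent_id: str
    action: str   # e.g. "file.read", "shell.exec"
    target: str   # e.g. a path or a command line


@dataclass(frozen=True)
class Evidence:
    """Record produced for every decision, whatever the verdict."""
    proposal: Proposal
    verdict: Verdict
    reason: str


# Hypothetical policy table: action name -> verdict.
# Anything not listed is denied.
POLICY = {
    "file.read": Verdict.ALLOW,
    "file.write": Verdict.ESCALATE,
    "shell.exec": Verdict.DENY,
}


def evaluate(proposal, policy):
    """Pre-commit gate: decide before execution and record why."""
    verdict = policy.get(proposal.action)
    if verdict is None:
        return Evidence(proposal, Verdict.DENY, "no matching rule (default deny)")
    return Evidence(proposal, verdict, "matched rule for " + proposal.action)


def execute_if_allowed(proposal, policy, run):
    """Only the gate calls `run`; the agent submits proposals and nothing else."""
    evidence = evaluate(proposal, policy)
    if evidence.verdict is Verdict.ALLOW:
        run(proposal)
    return evidence
```

In this sketch the agent never holds a reference to `run` or to `POLICY`; on real hardware that separation would come from the machine boundary rather than from Python scoping.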
High-capacity model serving, behind the gate. Not a cluster node. The model serves; the gate authorises; the evidence record closes the loop.
When the M5 Ultra ships, this tier upgrades and nothing above it changes. The platform abstracts the hardware.
Run governance entirely on the M4 Pro and the MLOps story never closes. Run it entirely inside K3s and separation of governance collapses. The Forge does neither: policy gets the operational benefits of platform residence; enforcement stays at a structural boundary the cluster cannot reach. The cluster pushes policy; the boundary applies it.
Read the full position paper: ADR-0006
The platform applies the ISR framework (Intelligence, Surveillance, Reconnaissance) to observability. Each signal answers a different operational question; conflating them produces dashboards that look impressive and tell you nothing under pressure.
Reconstruct what happened in this session, end to end.
Are the known indicators still inside their tolerances? Page if not.
What's happening that nobody has defined a metric for yet? Investigate.
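One way to keep the three questions above from blurring is to route each signal to exactly one of them. The sketch below reads the questions, in order, as Intelligence, Surveillance, and Reconnaissance; that mapping, the `Signal` fields, and the `TOLERANCES` values are assumptions made for illustration, not the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tolerances for known indicators: name -> (low, high).
TOLERANCES = {
    "gate_latency_ms": (0.0, 10.0),
    "node_cpu_pct": (0.0, 85.0),
}


@dataclass
class Signal:
    name: str
    value: Optional[float] = None
    session_id: Optional[str] = None  # set for events that belong to a session


def route(signal, tolerances=TOLERANCES):
    """Return (discipline, action) for a single signal."""
    # Session events feed end-to-end reconstruction.
    if signal.session_id is not None:
        return ("intelligence", "record")
    bounds = tolerances.get(signal.name)
    # No defined indicator yet: someone has to go and look.
    if bounds is None:
        return ("reconnaissance", "investigate")
    low, high = bounds
    # Known indicator: page when it is missing or out of tolerance.
    if signal.value is None or not (low <= signal.value <= high):
        return ("surveillance", "page")
    return ("surveillance", "ok")
```

For example, `route(Signal("gate_latency_ms", 50.0))` falls outside the assumed tolerance and is routed to a page, while a signal with an unfamiliar name is routed to investigation rather than silently dropped.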
Telemetry runs on a dedicated node, worker-05, and the isolation is deliberate. When any other node is misbehaving, telemetry has to stay functional and observable. Mixing observability with general workloads defeats its purpose at exactly the moment you need it most.
The three machines are not interchangeable. The design depends on what each one is actually good at.
The community discovered in 2005 what Apple never advertised: the Mac Mini's thermal envelope makes it viable as always-on infrastructure. The six-node K3s cluster is that pattern taken seriously: real topology, real specialisation, on hardware that draws less than a gaming PC and makes no noise.
Unified memory removes the PCIe bottleneck; the Neural Engine accelerates inference on the same memory pool as the CPU. The result is a governance boundary that evaluates actions at 3.2ms median while running the models it governs. A measured result on production hardware, not a demo.
192GB of unified memory changes what local inference can do. A 70B model at Q4_K_M uses ~40GB; the remaining 150GB supports concurrent serving, long context, and multi-agent sessions. When the M5 Ultra ships, the tier upgrades and the governance layer above it does not change.
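The memory budget above can be checked with back-of-envelope arithmetic. The sketch assumes roughly 4.8 bits per weight for Q4_K_M (an approximation, since the effective rate varies by model) and counts weights only; KV cache, runtime overhead, and memory reserved by the operating system are ignored.

```python
TOTAL_MEMORY_GB = 192            # unified memory on the inference tier
ASSUMED_BITS_PER_WEIGHT = 4.8    # assumption: approximate average for Q4_K_M


def weights_gb(params_billions, bits_per_weight=ASSUMED_BITS_PER_WEIGHT):
    """Approximate weight footprint in GB (10^9 bytes), weights only."""
    return params_billions * bits_per_weight / 8


def headroom_gb(params_billions, total_gb=TOTAL_MEMORY_GB):
    """Memory left after loading the weights, before any other overhead."""
    return total_gb - weights_gb(params_billions)


if __name__ == "__main__":
    print("70B weights: about %.0f GB" % weights_gb(70))
    print("headroom:    about %.0f GB" % headroom_gb(70))
```

Under these assumptions a 70B model lands in the low 40s of gigabytes, which is consistent with the ~40GB and ~150GB figures quoted above; the headroom is what long contexts and concurrent sessions then draw on.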
Six nodes live. The 30-day stability test is done. The first ADRs are written, including the central one on the governance split. The observability stack is deploying; Kyverno policies are about to land.
Harbor extended for model artifacts, Actions wired to ArgoCD, the seventh node provisioned, the ClawLaw proxy on worker-02, policy distribution to the boundary, and the full ALLOW/DENY/ESCALATE flow end to end — when the model lifecycle and the audit record become the same object.
The learning track is the on-ramp — eight modules from picking the hardware to closing the loop, each paired with a lab you can reproduce.