The Forge — A governed Apple Silicon home lab

11W

Idle draw per node — the whole cluster runs on less power than one gaming PC

3.2ms

Median gate latency under real workload — P99 14.7ms, governance overhead invisible in practice

192GB

Unified memory on the Mac Studio — a 70B model fits in ~40GB with room to spare

100%

30-day uptime across all six nodes — zero unexpected etcd leader elections

01 · What the platform is for

Governed autonomy on real hardware.

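The Forge makes one argument the rest of the industry is still hand-waving about: an agent is only governed when enforcement lives somewhere the agent cannot reach. It makes that argument not on a slide or a demo box, but on six Mac Minis on a shelf and an M4 Pro the agent can see but cannot reach.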

A single laptop running an agent in a Python script is not a governed system; it is a development convenience. The moment that agent reaches into anything that matters — your shell, your files, your network — the absence of structural separation between the model and the systems it operates becomes the central problem. Most labs duck this by running everything together and declaring the boundaries by convention. The Forge runs the governance boundary on a machine the cluster cannot deploy to. The cluster manages what the boundary enforces. The split is visible.

Agents propose. Platforms enforce. Everything in this lab is built around that asymmetry.

Three tiers, one platform

Tier I

The cluster

6 × Intel Mac Mini · K3s

The platform: GitOps, observability, CI/CD, secrets, model registry, and the policy-management half of governance.

Nodes: 1 control · 5 workers

OS: Ubuntu 22.04 LTS

Each node: Core i5 · 16GB

Draws: ~8–11W idle

What the cluster does not do is run inference. That is deliberate — inference belongs on the next tier.

Tier II

The governed boundary

Mac Mini M4 Pro · 64GB

The part that makes this a governance system, not just a Kubernetes cluster. Every proposed action passes a pre-commit gate here before execution.

Verdicts: ALLOW · DENY · ESCALATE

Median: 3.2ms

P99: 14.7ms

Runtime: Ollama · MLX · ANE

The agent runs on Tier I. It can submit proposals; it cannot reach this machine to modify what evaluates them.
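To make the gate concrete, here is a minimal sketch of a pre-commit check that returns one of the three verdicts before anything executes. The Proposal shape, the rule sets, and the function names are all hypothetical; the page does not describe the real policy format, so read this as an illustration of the pattern rather than the Forge's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


@dataclass(frozen=True)
class Proposal:
    """An action an agent wants to take (hypothetical shape)."""
    agent_id: str
    action: str   # e.g. "shell.exec" or "file.write"
    target: str   # e.g. a path or hostname


# Hypothetical rules, for illustration only.
DENIED_ACTIONS = {"shell.exec"}
ESCALATED_PREFIXES = ("/etc/", "/var/lib/")


def evaluate(proposal: Proposal) -> Verdict:
    """Decide before execution; a deny rule wins over an escalate rule."""
    if proposal.action in DENIED_ACTIONS:
        return Verdict.DENY
    if proposal.target.startswith(ESCALATED_PREFIXES):
        return Verdict.ESCALATE
    return Verdict.ALLOW


def execute_if_allowed(proposal: Proposal, run) -> Verdict:
    """Run the action only on ALLOW; always hand the verdict back."""
    verdict = evaluate(proposal)
    if verdict is Verdict.ALLOW:
        run(proposal)
    return verdict
```

The agent only ever calls something like execute_if_allowed; the rule sets live with the evaluator, which is the part the agent cannot modify.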

Tier III

The inference substrate

Mac Studio · 192GB

High-capacity model serving, behind the gate. Not a cluster node. The model serves; the gate authorises; the evidence record closes the loop.

70B footprint: ~40GB used

Headroom: 150GB

Speed: ~38 tok/s

Accel: Neural Engine

When the M5 Ultra ships, this tier upgrades and nothing above it changes. The platform abstracts the hardware.

03 · The central decision

Policy management inside the cluster. Enforcement at the boundary.

Run governance entirely on the M4 Pro and policy never gets the GitOps, CI/CD, and observability the platform provides, so the MLOps loop never closes. Run it entirely inside K3s and the separation between the agent and what governs it collapses, because the agent runs on that cluster. The Forge does neither: policy gets the operational benefits of platform residence; enforcement stays at a structural boundary the cluster cannot reach. The cluster pushes policy; the boundary applies it.
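One way to picture "the cluster pushes policy; the boundary applies it" is a signed, versioned policy bundle. The sketch below is an assumption-heavy illustration: the bundle layout, the HMAC shared secret, and the Boundary class are hypothetical, and the page does not say how the Forge actually transports or authenticates policy.

```python
import hashlib
import hmac
import json


def sign_bundle(policy: dict, version: int, key: bytes) -> dict:
    """Cluster side: serialise the policy deterministically and attach an HMAC."""
    payload = json.dumps({"version": version, "policy": policy}, sort_keys=True)
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


class Boundary:
    """Boundary side: accepts only authentic bundles newer than the active one."""

    def __init__(self, key: bytes):
        self._key = key
        self.version = 0
        self.policy: dict = {}

    def apply(self, bundle: dict) -> bool:
        expected = hmac.new(
            self._key, bundle["payload"].encode(), hashlib.sha256
        ).hexdigest()
        if not hmac.compare_digest(expected, bundle["signature"]):
            return False  # tampered, or signed with the wrong key
        body = json.loads(bundle["payload"])
        if body["version"] <= self.version:
            return False  # stale or replayed bundle
        self.version, self.policy = body["version"], body["policy"]
        return True
```

The point of the shape is that the cluster hands over data, never code: the boundary decides for itself whether a bundle is acceptable. A shared-secret HMAC keeps the sketch short; an asymmetric signature would let the boundary verify without holding a signing key.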

Read the full position paper — ADR-0006 →

04 · Observability is not metrics

Three signals. Three questions.

The platform applies the ISR framework — Intelligence, Surveillance, Reconnaissance — to observability. Each signal answers a different operational question; conflating them produces dashboards that look impressive and tell you nothing under pressure.

Intelligence

Traces

Tempo · OpenTelemetry

Reconstruct what happened in this session, end to end.

Surveillance

Metrics

Prometheus · Grafana

Are the known indicators still inside their tolerances? Page if not.

Reconnaissance

Logs

Loki

What's happening that nobody has defined a metric for yet? Investigate.

Telemetry runs on a dedicated node, worker-05, and the isolation is deliberate. When any other node is misbehaving, telemetry has to stay functional and observable. Mixing observability with general workloads defeats its purpose at exactly the moment you need it most.
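The isolation rule is simple enough to state as a check. The sketch below assumes a plain workload-to-node mapping as input and uses the stack names from this section as workload names; both are hypothetical, and the page does not say how placement is enforced in the cluster (in Kubernetes this is typically done with taints, tolerations, and node selectors).

```python
TELEMETRY_NODE = "worker-05"
# Hypothetical workload names, borrowed from the stack listed above.
TELEMETRY_WORKLOADS = {"tempo", "prometheus", "grafana", "loki"}


def placement_violations(placements: dict[str, str]) -> list[str]:
    """Given workload -> node, list every breach of the isolation rule."""
    violations = []
    for workload, node in sorted(placements.items()):
        is_telemetry = workload in TELEMETRY_WORKLOADS
        if is_telemetry and node != TELEMETRY_NODE:
            violations.append(
                f"{workload} should run on {TELEMETRY_NODE}, found on {node}"
            )
        if not is_telemetry and node == TELEMETRY_NODE:
            violations.append(
                f"{workload} is a general workload on {TELEMETRY_NODE}"
            )
    return violations
```

The rule cuts both ways: telemetry must not drift off worker-05, and nothing else may land on it.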

The hardware argument

The three machines are not interchangeable. The design depends on what each one is actually good at.

Intel Mac Mini

the accidental server

The community discovered in 2005 what Apple never advertised: the Mac Mini's thermal envelope makes it viable as always-on infrastructure. The six-node K3s cluster is that pattern taken seriously: real topology, real specialisation, on hardware that draws less than a gaming PC and makes no noise.

M4 Pro Mac Mini

the governed edge

Unified memory removes the PCIe bottleneck; the Neural Engine accelerates inference on the same memory pool as the CPU. The result is a governance boundary that evaluates actions at 3.2ms median while running the models it governs. A measured result on production hardware, not a demo.
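For readers who want to reproduce figures like the median and P99 on their own gate, here is a small sketch of how the two numbers can be derived from raw latency samples. The function name and input format are assumptions; the page does not describe how the Forge collects or aggregates its measurements.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Median and 99th percentile of gate latencies, in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "median_ms": statistics.median(samples_ms),
        "p99_ms": cuts[98],
    }
```

Reporting both matters because the median describes the typical decision while P99 describes the slow tail an agent occasionally waits on.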

Mac Studio

the inference substrate

192GB of unified memory changes what local inference can do. A 70B model at Q4_K_M uses ~40GB; the remaining 150GB supports concurrent serving, long context, and multi-agent sessions. When the M5 Ultra ships, the tier upgrades and the governance layer above it does not change.
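The memory budget is back-of-envelope arithmetic, sketched below. The 4.5 bits per weight is an assumed round figure (mixed-precision formats such as Q4_K_M average somewhat above 4 bits, and the exact value depends on the model), and the estimate covers weights only, not KV cache or runtime overhead.

```python
GB = 1e9  # decimal gigabytes


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone."""
    return params * bits_per_weight / 8 / GB


def headroom_gb(total_gb: float, used_gb: float) -> float:
    """Memory left for context, concurrency, and other sessions."""
    return total_gb - used_gb


# Assumed: 70e9 parameters at ~4.5 bits per weight.
print(round(weight_footprint_gb(70e9, 4.5), 1))  # 39.4
print(headroom_gb(192, 40))                      # 152
```

With the page's own figures, 192GB minus roughly 40GB leaves about 152GB, which is the headroom quoted above as 150GB.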

What is shipped, what is next

Shipped · Phase 1

The cluster is operational.

Six nodes live. The 30-day stability test is done. The first ADRs are written, including the central one on the governance split. The observability stack is deploying; Kyverno policies are about to land.

Next · MLOps + governance

Close the loop.

Harbor extended for model artifacts, Actions wired to ArgoCD, the seventh node provisioned, the ClawLaw proxy on worker-02, policy distribution to the boundary, and the full ALLOW/DENY/ESCALATE flow end to end. The loop closes when the model lifecycle and the audit record become the same object.

Build your own

The learning track is the on-ramp — eight modules from picking the hardware to closing the loop, each paired with a lab you can reproduce.
