A room-scale platform for DevOps, MLOps, and governed edge AI — six Intel Mac Minis running K3s, an M4 Pro as the enforcement boundary, and a Mac Studio as the inference substrate. The hardware is the argument. The governance architecture is the thesis.
The Forge runs the full progression. GitOps, observability, and CI/CD govern the platform. The same tools — extended — govern the model lifecycle. A third layer governs inference: intent, composition, evidence. That third layer is where ClawLaw lives.
Declarative infrastructure, GitOps operators, continuous observability, CI/CD. The control-plane pattern: govern complex systems through declared intent and continuous reconciliation.
Model registries, training job orchestration, inference serving, drift detection. The same GitOps operators that deploy services deploy models. Same observability stack, pointed at model health.
DevOps and MLOps govern structure. Neither governs intent. ClawLaw addresses the gap: intent evaluation, composition tracking, evidence chain, human escalation as a first-class policy outcome.
Governance gates highlighted amber · every inference action traverses the full chain
Tier I is the MLOps control plane. Tier II is the governed execution boundary. Tier III is the high-capacity inference substrate. Each tier has a distinct operational role. The boundaries between them are deliberate.
Single control node in the current configuration. Three-node HA control plane is a Phase 2 milestone. The single-node control plane is an accepted risk at Phase 1 — the cluster's primary role is platform development and operational learning, not production SLA enforcement.
Harbor is the center of gravity for the MLOps lifecycle — every model artifact, container image, and versioned weight file is promoted through this node. Harbor stores OCI artifacts, which means model weights and container images share the same registry, promotion pipeline, and access control model.
This node carries the policy management component of the governance architecture. The ClawLaw proxy manages policy rules, risk thresholds, and escalation criteria, and routes enforcement decisions to the M4 Pro boundary. Policy management lives here: observable via GitOps, versioned in Harbor, monitored by the telemetry node. Policy enforcement lives on the M4 Pro.
The node that translates source control events into cluster state changes. GitHub Actions runs CI pipelines here. ArgoCD detects drift between the Git-declared state and live cluster state and reconciles. Kyverno validates manifests before admission. This node makes GitOps operational, not theoretical.
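The drift-and-reconcile behavior described here can be illustrated with a minimal sketch. This is not Argo CD's implementation; `diff_state`, `reconcile`, and the resource names and image tags are hypothetical, and cluster state is reduced to a plain dictionary.

```python
from typing import Any

State = dict[str, dict[str, Any]]


def diff_state(desired: State, live: State) -> dict[str, tuple[Any, Any]]:
    """Return {resource: (desired, live)} for every resource that differs."""
    drift = {}
    for name in desired.keys() | live.keys():
        want, have = desired.get(name), live.get(name)
        if want != have:
            drift[name] = (want, have)
    return drift


def reconcile(desired: State, live: State) -> State:
    """Return a new live state converged on the declared state."""
    converged = dict(live)
    for name, (want, _have) in diff_state(desired, live).items():
        if want is None:
            # Resource exists in the cluster but is not declared in Git.
            # This sketch always prunes; a real operator may make that opt-in.
            del converged[name]
        else:
            # Resource is missing or differs: apply the declared version.
            converged[name] = want
    return converged


# Hypothetical declared (Git) state and observed (cluster) state.
desired = {
    "harbor": {"image": "harbor:v2", "replicas": 1},
    "minio": {"image": "minio:r1", "replicas": 1},
}
live = {
    "harbor": {"image": "harbor:v1", "replicas": 1},
    "debug-pod": {"image": "busybox", "replicas": 1},
}

print(sorted(diff_state(desired, live)))
live = reconcile(desired, live)
```

The point of the pattern is that the loop never asks what changed or why. It only compares declared state with observed state and closes the gap.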
Persistent storage for structured data and unstructured artifacts: training datasets, evaluation results, experiment records, and the governance evidence records that close the audit loop. MinIO provides S3-compatible object storage, which means Velero can use it as a cluster backup target alongside its role in the MLOps artifact pipeline.
Dedicated observability node. Isolation is deliberate: when any other node is misbehaving, the telemetry node must remain functional and observable. Mixing observability with general workloads defeats its purpose under pressure. This node captures both infrastructure metrics and inference metrics — the ISR model applied to the same stack.
Without this node, inference is never a Kubernetes workload. The MLOps pipeline cannot close inside the cluster. With this node, training → registry → inference deployment → metrics → governance → evidence becomes a single operational loop managed by GitOps. Introduces heterogeneous scheduling — ARM64 inference workloads routed via node affinity and taints alongside x86 platform workloads.
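As a rough illustration of the heterogeneous scheduling mentioned above, the sketch below builds a Pod manifest that requires an ARM64 node and tolerates a dedicated inference taint. `kubernetes.io/arch` is the standard Kubernetes architecture label; the taint key `forge.example/inference`, the pod name, and the image reference are assumptions made for the example, not values from The Forge.

```python
import json


def inference_pod_spec(name: str, image: str) -> dict:
    """Build a Pod manifest pinned to ARM64 nodes via affinity and a toleration."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    # Hard requirement: only schedule onto arm64 nodes.
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [{
                            "matchExpressions": [{
                                "key": "kubernetes.io/arch",
                                "operator": "In",
                                "values": ["arm64"],
                            }]
                        }]
                    }
                }
            },
            # Hypothetical taint that keeps x86 platform workloads off the node.
            "tolerations": [{
                "key": "forge.example/inference",
                "operator": "Equal",
                "value": "true",
                "effect": "NoSchedule",
            }],
            "containers": [{"name": "server", "image": image}],
        },
    }


manifest = inference_pod_spec("inference-demo", "registry.example/inference:dev")
print(json.dumps(manifest, indent=2))
```

Affinity attracts the workload to the ARM64 node, and the taint repels everything that does not carry the matching toleration. Both are needed to keep the two architectures cleanly separated.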
Every proposed action passes through the ClawLaw pre-commit gate before execution. The agent cannot modify or bypass the layer governing it.
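A pre-commit gate of this kind might look roughly like the sketch below. It is not ClawLaw's code: the `Action` shape, the three rules, and the `/workspace/` boundary are hypothetical. It assumes rules are tried in order, that the first verdict wins, and that anything unmatched or malformed is denied (the fail-closed default mentioned in the benchmark notes).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "read", "write", "shell"
    target: str  # e.g. a file path or a command line


Rule = Callable[[Action], Optional[Decision]]


def deny_hosts_file(action: Action) -> Optional[Decision]:
    if action.kind == "write" and action.target == "/etc/hosts":
        return Decision.DENY
    return None


def escalate_shell(action: Action) -> Optional[Decision]:
    # Shell execution is routed to a human instead of being decided here.
    return Decision.ESCALATE if action.kind == "shell" else None


def allow_workspace(action: Action) -> Optional[Decision]:
    inside = action.target.startswith("/workspace/")
    traversal = ".." in action.target.split("/")
    return Decision.ALLOW if inside and not traversal else None


RULES: list[Rule] = [deny_hosts_file, escalate_shell, allow_workspace]


def evaluate(action: Action, rules: list[Rule], log: list) -> Decision:
    """Propose -> Evaluate -> Allow / Deny / Escalate -> Record."""
    decision = Decision.DENY  # fail closed: no matching rule means deny
    try:
        for rule in rules:
            verdict = rule(action)
            if verdict is not None:
                decision = verdict
                break
    except Exception:
        decision = Decision.DENY  # a malformed action is blocked, not skipped
    log.append((action, decision))  # every outcome is recorded
    return decision


evidence: list = []
print(evaluate(Action("write", "/workspace/notes.md"), RULES, evidence))
```

In this sketch the agent only ever holds an `Action` and receives a `Decision`. It has no handle on `RULES` or on the log, which is the property the paragraph above describes.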
Physical separation preserves Agency Paradox criterion 01: the governance layer is structurally independent of the workloads it governs. An agent running inside K3s on Tier I operates on a different machine from the enforcement process evaluating its actions. The cluster cannot redeploy or modify the enforcement boundary.
High-capacity inference substrate. Not a cluster workload. Model serving runs behind the ClawLaw gate — inference requests arrive from Tier II on ALLOW, execute here, return through the evidence record.
The M5 Ultra is the natural next evolutionary step for this tier — same architecture, substantially increased capacity. The governance layer does not change when the hardware upgrades.
Moving inference inside K3s enables the full MLOps story but creates a governance question: if governance runs inside the same cluster as the agent it governs, the separation between governor and governed weakens. The resolution is a split, not a compromise.
Rules, risk thresholds, and version history are cluster services — GitOps-managed, ArgoCD-deployed, observable through the telemetry node. The Git commit log is the audit trail.
The pre-commit gate runs on hardware the cluster cannot reach as a workload. The cluster pushes policy; the boundary applies it. An agent inside K3s cannot modify what evaluates it.
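One way to picture "the cluster pushes policy; the boundary applies it" is a policy bundle that the boundary verifies before loading. The sketch below uses an HMAC over canonical JSON purely as an illustration. The source does not say how the channel is verified, so the mechanism, the key handling, and the policy fields are all assumptions.

```python
import hashlib
import hmac
import json


def sign_policy(policy: dict, key: bytes) -> dict:
    """Cluster side: serialize the policy canonically and attach an HMAC tag."""
    payload = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def apply_policy(bundle: dict, key: bytes) -> dict:
    """Boundary side: load the policy only if the tag verifies."""
    expected = hmac.new(key, bundle["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["tag"]):
        raise ValueError("policy bundle failed verification; keeping current policy")
    return json.loads(bundle["payload"])


KEY = b"demo-key-not-for-real-use"  # hypothetical shared secret
policy = {"version": 7, "risk_threshold": 0.8, "escalate_kinds": ["shell"]}

bundle = sign_policy(policy, KEY)
active = apply_policy(bundle, KEY)
print(active["version"])
```

A shared secret is the simplest thing that demonstrates the check. An asymmetric signature, with the signing key held outside the cluster, would likely be a better fit for the separation argued for here.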
Any governed AI system at scale needs this split: policy as a platform service, enforcement at a structural boundary, a verified channel between them. The full reasoning — options, trade-offs, and consequences — is the position paper.
Read ADR-0006 →
The ISR intelligence model applies to both infrastructure and inference. Reconnaissance/Logs, Surveillance/Metrics, Intelligence/Traces: each signal answers a different operational question. Conflating them produces dashboards that look impressive and tell you nothing under pressure.
| Phase | Tool | Tag | Role on The Forge |
|---|---|---|---|
| Model Registry | Harbor | OCI | Versioned storage for quantized weights and containers. Same registry, same GitOps promotion pipeline, same access controls. |
| Training | K8s Jobs · ArgoCD | | Training runs as GitOps-managed Kubernetes Jobs: declared, reproducible. Constitutional wrapper enforces parameter budget, detects loss anomalies before the backward pass. |
| Local Inference | Ollama · MLX | Edge | On-device via M4 Pro. 32–70B quantized models in the unified memory budget. No cloud dependency. Zero-copy tensor ops in unified memory. |
| Model Metrics | Prometheus · Grafana | Surveillance | Token throughput, latency p95/p99, error rates, governance evaluation time. Same stack as infrastructure metrics, on worker-05. |
| Drift Monitoring | Loki + rules | Reconnaissance | Open-ended investigation of output degradation. Composition tracer detects session-level behavioral drift before it reaches the output boundary. |
| Governance Gate | ClawLaw | Constitutional | Pre-commit evaluation of every proposed action. Policy managed on worker-02. Enforcement on M4 Pro. Allow · Deny · Escalate · Record. |
| Evidence Chain | MinIO · worker-04 | Audit | Full provenance from prompt to inference action to execution to outcome. The MLOps experiment record and the governance audit record are the same object. |
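The evidence chain in the last row can be sketched as a hash-linked log in which each record commits to the one before it, so an edit to any earlier stage is detectable. Hash chaining is an assumption made for illustration, since the source does not describe the record format. The field names and the four stages in the example are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_record(chain: list[dict], event: dict) -> dict:
    """Append an event that commits to the hash of the previous record."""
    prev = chain[-1]["hash"] if chain else GENESIS
    record = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    chain.append(record)
    return record


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["event"]):
            return False
        prev = record["hash"]
    return True


chain: list[dict] = []
append_record(chain, {"stage": "prompt", "text": "summarize the report"})
append_record(chain, {"stage": "action", "kind": "read", "target": "/workspace/report.md"})
append_record(chain, {"stage": "execution", "decision": "allow"})
append_record(chain, {"stage": "outcome", "status": "ok"})
print(verify_chain(chain))
```

Storing such records as objects in an S3-compatible bucket would make the experiment record and the audit record the same artifact, as the table row describes.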
Nine criteria. Three hardware tiers. The central argument is the composition problem: individually approved actions that collectively constitute scope creep. The cluster makes it observable.
Kyverno enforces structural rules at admission: does this resource conform to declared constraints? It does not evaluate intent. It does not track composition across a session. It does not issue escalations to human review.
ClawLaw addresses what admission control cannot. The Forge is where that claim is demonstrated under operational conditions, not stated as a thesis.
Admission control, resource validation, static structural rules. Deterministic decisions on deterministic inputs at the Kubernetes API boundary.
Reason about intent. Track composition across a session. Issue probabilistic risk scores. Route to human escalation as a policy outcome. Trace provenance from prompt to execution.
Propose → Evaluate → Allow / Deny / Escalate → Record. Intent-aware. Composition-tracking. Evidence-generating. Human-in-the-loop when the action warrants it.
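Composition tracking is the part admission control has no analogue for, so a small sketch may help. The tracker below counts the distinct scopes a session has touched and escalates once the total crosses a threshold, even though each action would pass on its own. The class, the notion of a "scope", and the threshold of three are hypothetical and are not ClawLaw's actual heuristics.

```python
from dataclasses import dataclass, field


@dataclass
class CompositionTracker:
    """Session-level state: which scopes has this session touched so far?"""
    max_scopes: int = 3  # hypothetical budget of distinct scopes per session
    scopes: set[str] = field(default_factory=set)

    def observe(self, scope: str) -> str:
        # Each action is acceptable on its own; the verdict depends on the sum.
        self.scopes.add(scope)
        return "escalate" if len(self.scopes) > self.max_scopes else "allow"


tracker = CompositionTracker()
session = ["src", "src", "tests", "docs", "deploy"]
verdicts = [tracker.observe(scope) for scope in session]
print(list(zip(session, verdicts)))
```

Nothing about the fifth action is unusual in isolation. The escalation comes from the session state, which is exactly the information a stateless admission check never sees.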
The Forge is hardware bring-up and inference benchmarks — the foundation. Darwin investigates recursive learning. OpenClaw builds governed automation end to end. Prometheus is where the lab becomes MLOps — drift detection, latency SLOs, inference reliability as continuous operational discipline.
Enter the lab →
Operational labs on the Apple Silicon home-lab itself: hardware bring-up, inference benchmarks, model installs, and the cluster substrate the other series depend on.
Recursive learning on local models. Can a governed agent improve by iterating on its own outputs without violating its constitutional boundaries? Behavioral consistency, self-critique loops, composition drift, recursive ceiling.
End-to-end governed automation on Apple Silicon. ClawLaw install, boundary enforcement, escalation flow, composition detection, audit trail integrity, multi-agent contention.
Model drift detection and inference reliability as continuous MLOps discipline. Token throughput benchmarks, latency SLO definition, drift classification, alert routing from model metrics to on-call.
You cannot govern what you cannot observe. You cannot observe what you don't record.
The governance layer adds measurable but acceptable latency to the Claude-to-filesystem path on M4 Pro.
Ran 200 sequential file-write actions through ClawLaw governance. Measured wall-clock time with and without the governance layer across reads, writes, and shell executions in a typical development session.
Median: 3.2ms. P95: 8.1ms. P99: 14.7ms. Zero false positives. Two correct denials on boundary-probe patterns.
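For readers who want to reproduce summary figures of this kind, the sketch below shows one way to derive a median, P95, and P99 from raw per-action overhead samples using the standard library. The samples are synthetic and seeded only to make the example runnable; they are not the measurements reported above, and `overhead_summary` is a hypothetical helper.

```python
import random
import statistics


def overhead_summary(samples_ms: list[float]) -> dict[str, float]:
    """Median, P95 and P99 of per-action governance overhead in milliseconds."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile, 98 the 99th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Synthetic, right-skewed latencies standing in for 200 measured actions.
rng = random.Random(0)
samples = [rng.lognormvariate(1.0, 0.5) for _ in range(200)]

summary = overhead_summary(samples)
print({name: round(value, 2) for name, value in summary.items()})
```

With only 200 samples, the P99 is determined by the two or three slowest actions, so it is worth reporting alongside the sample count.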
Governance overhead is invisible in practice. The two denials caught a path traversal and a hosts file modification — both genuine boundary violations. Composition tracer added 0.4ms average. Fail-closed default triggered once on a malformed action and blocked it correctly.
Extend to 8-hour session benchmark. Add governance latency to Prometheus MLOps metrics. Test governance proxy on worker-02 with enforcement on M4 Pro across the distribution channel.
The 70B model at Q4_K_M should fit in unified memory with acceptable generation speed.
llama3:70b-instruct-q4_K_M via Ollama. Monitored memory pressure via Activity Monitor, measured tok/s on a 500-token generation task.
Loaded. Peak memory: 22.1GB. Generation: 4.2 tok/s. Swap: 0. Memory pressure: yellow but stable. Governance latency unchanged at 3.2ms median.
4.2 tok/s is too slow for interactive work. The 8B model at 38 tok/s remains the daily driver. The 70B is the governance test case: governance latency is negligible relative to generation time. Tier III with 192GB should change this equation: the expectation is 70B at 38 tok/s with 150GB headroom, to be confirmed by benchmark.
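The "negligible relative to generation time" claim can be checked with simple arithmetic from the figures reported above. The sketch assumes one governed action per 500-token generation and reuses the 3.2 ms median as the per-action overhead. Both are simplifications, and the function names are hypothetical.

```python
def generation_seconds(tokens: int, tok_per_s: float) -> float:
    """Wall-clock time to generate `tokens` at a steady rate."""
    return tokens / tok_per_s


def overhead_fraction(gov_ms: float, tokens: int, tok_per_s: float) -> float:
    """Governance latency as a fraction of total generation time."""
    return (gov_ms / 1000.0) / generation_seconds(tokens, tok_per_s)


TOKENS = 500   # generation task size from the benchmark
GOV_MS = 3.2   # median governance latency from the benchmark

# 70B at 4.2 tok/s: about 119 s of generation, overhead roughly 0.003 percent.
slow = overhead_fraction(GOV_MS, TOKENS, 4.2)
# 8B at 38 tok/s: about 13 s of generation, overhead roughly 0.024 percent.
fast = overhead_fraction(GOV_MS, TOKENS, 38.0)

print(f"70B: {slow:.6%}  8B: {fast:.6%}")
```

Even at the faster rate, the gate costs a few hundredths of a percent of the generation time under these assumptions. The relative overhead grows as generation gets faster, which is why comparing across model sizes is a sensible next step.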
Run same benchmark on Tier III Mac Studio. Add tok/s to Prometheus MLOps metrics stack. Compare governance overhead across model sizes.
Six Intel Mac Minis running K3s can maintain 30 days of continuous uptime as a governance control plane.
Monitored cluster health over 30 consecutive days. Tracked node availability, pod restarts, etcd leader elections, certificate rotation. No manual intervention.
Uptime: 100% all 6 nodes. Pod restarts: 2 (OOMKilled, misconfigured monitoring container). Etcd leader elections: 0 unexpected. Certificate rotation: auto day 22.
Remarkably stable. Each node averages 8 W at idle, so the entire cluster draws less than a gaming PC. Main risk is thermal: the office exceeded 28 °C twice and fan speeds increased, with no throttling. Isolation of the dedicated telemetry node (worker-05) was validated: observability remained stable throughout.
Add thermal alerting on worker-05. Replace OOMKilled pod. Define thermal SLO. Validate worker-02 ClawLaw proxy deployment stability over same 30-day window.
Apple's position in AI didn't appear with the M-series chips. It's the result of a forty-year arc. NeXT built the server DNA. The Mac Mini became the accidental infrastructure node. Apple Silicon converged CPU, GPU, and Neural Engine onto one die. The Forge is where that arc lands.
CERN. Mach kernel. Unix foundations Apple still runs on.
NeXT powered Tim Berners-Lee's first web server at CERN in 1990. NeXTSTEP's Mach microkernel and BSD elements became macOS. Apple acquired NeXT for $427M in 1996 — bringing Unix server DNA that ships in every Mac today.
The Mach kernel lineage in the macOS machines governing The Forge traces directly to a NeXT workstation in Geneva in 1990.
1U rackmount. The enterprise play Apple abandoned.
Xserve ran in university clusters, government agencies, and production data centers. Discontinued in 2011, but the institutional memory of Apple hardware in server environments evolved into Mac Mini server configurations. The Forge carries that argument forward in a different form factor.
Xserve proved Apple hardware could run in production. The Mac Mini cluster carries that institutional argument forward.
Macminicolo, 2005. The pattern that predates The Forge by 20 years.
Macminicolo launched in 2005, racking Mac Minis in a data center. The community discovered what Apple didn't advertise: the thermal envelope made it viable as always-on infrastructure. The Forge is that pattern taken seriously — real multi-node topology, specialized node roles, operational discipline.
The Mac Mini's power envelope was never designed for infrastructure. The infrastructure community adopted it anyway.
Unix-on-Intel. The transition that made cross-platform tooling seamless.
The Intel transition made the Mac a genuine workstation for Linux-targeting work. Docker, Kubernetes tooling, and the full Linux ecosystem became first-class macOS citizens. The six Intel Mac Minis running K3s in The Forge are the direct product of this era: affordable, well-understood x86 Linux nodes with the Mac Mini's thermal advantages.
The Darwin heritage is in the practitioner, not the nodes. The nodes run Linux. The cluster is governed by a system built on Apple Silicon.
CPU · GPU · Neural Engine · unified memory. One die.
Apple Silicon eliminated the memory bandwidth bottleneck between CPU and GPU. For inference, this is architecturally significant: a 70B quantized model fits in the unified memory budget without memory copies between compute domains. The M4 Pro runs the ClawLaw enforcement gate alongside local inference — both on the same memory pool.
The Neural Engine is a purpose-built inference accelerator on the same die as the CPU. This is why governed edge inference is possible without cloud dependency.