Silicon Intelligence · The Forge

The Workbench.

Apple Silicon, Kubernetes, and governed inference — in one room.

A room-scale platform for DevOps, MLOps, and governed edge AI — six Intel Mac Minis running K3s, an M4 Pro as the enforcement boundary, and a Mac Studio as the inference substrate. The hardware is the argument. The governance architecture is the thesis.

The Discipline Arc

DevOps gave us the discipline.
MLOps extends it to models.

The Forge runs the full progression. GitOps, observability, and CI/CD govern the platform. The same tools — extended — govern the model lifecycle. A third layer governs inference: intent, composition, evidence. That third layer is where ClawLaw lives.

01 · DevOps

Ship reliably.

Declarative infrastructure, GitOps operators, continuous observability, CI/CD. The control-plane pattern: govern complex systems through declared intent and continuous reconciliation.

ArgoCD · Prometheus · Kyverno · GitHub Actions
02 · MLOps

Same tools.
Different artifact class.

Model registries, training job orchestration, inference serving, drift detection. The same GitOps operators that deploy services deploy models. Same observability stack, pointed at model health.

Model registry · Drift detection · Inference serving · Experiment tracking
03 · Governed Inference

What both leave
unaddressed.

DevOps and MLOps govern structure. Neither governs intent. ClawLaw addresses the gap: intent evaluation, composition tracking, evidence chain, human escalation as a first-class policy outcome.

ClawLaw · Evidence chain · Composition tracing · ALLOW · DENY · ESCALATE
The full pipeline — git commit to evidence record
Source: Git commit (GitHub)
CI: Build · lint · test (GitHub Actions)
Artifact: Image / model (Harbor · OCI)
GitOps: ArgoCD sync (drift → reconcile)
Platform: K3s deploy (Kyverno gate)
Governance: ClawLaw gate (Allow · Deny · Escalate)
Inference: Ollama · MLX (M4 Pro · ANE)
Observability: Metrics · Logs · Traces (Prometheus · Loki)
Evidence: Provenance record (Audit · MinIO)

The Kyverno gate and the ClawLaw gate are the governance gates · every inference action traverses the full chain

The Infrastructure Substrate

Three tiers. One governed platform.

Tier I is the MLOps control plane. Tier II is the governed execution boundary. Tier III is the high-capacity inference substrate. Each tier has a distinct operational role. The boundaries between them are deliberate.

Tier I · The Forge · Platform Layer · K3s v1.28.4 · MLOps control plane · 6 nodes · Intel Mac Mini
k3s-ctrl-01
Control plane
etcd · API server
k3s-worker-01
Workloads
Harbor · Vault
k3s-worker-02
Agent ingress
Nginx · ClawLaw proxy
k3s-worker-03
Delivery pipeline
ArgoCD · GH Actions
k3s-worker-04
Artifact store
PostgreSQL · MinIO
k3s-worker-05
Telemetry · audit
Prometheus · Loki
k3s-worker-06
Apple Silicon inference
Proposed: an older Mac Studio joining as a K3s worker. Role: bring inference inside the cluster as a schedulable workload, completing the MLOps pipeline end to end. Introduces an ARM64 node to an x86 cluster, which means real heterogeneous scheduling.
Control plane

k3s-ctrl-01

Hardware
Intel i5 · 16GB RAM
OS
Ubuntu 22.04 LTS
Storage
256GB SSD
Network
1GbE · .101
Services
K3s API server · etcd · kube-scheduler · controller-mgr

Single control node in the current configuration. Three-node HA control plane is a Phase 2 milestone. The single-node control plane is an accepted risk at Phase 1 — the cluster's primary role is platform development and operational learning, not production SLA enforcement.

Worker · Workloads

k3s-worker-01

Hardware
Intel i5 · 16GB RAM
OS
Ubuntu 22.04 LTS
Storage
512GB SSD
Network
1GbE · .111
Services
Harbor registry · Vault · Primary workloads

Harbor is the center of gravity for the MLOps lifecycle — every model artifact, container image, and versioned weight file is promoted through this node. Harbor stores OCI artifacts, which means model weights and container images share the same registry, promotion pipeline, and access control model.

Worker · Agent ingress

k3s-worker-02

Hardware
Intel i5 · 16GB RAM
OS
Ubuntu 22.04 LTS
Storage
256GB SSD
Network
1GbE · .112
Services
Nginx ingress · ClawLaw proxy

This node carries the policy management component of the governance architecture. The ClawLaw proxy manages policy rules, risk thresholds, escalation criteria, and routes enforcement decisions to the M4 Pro boundary. Policy management lives here — observable via GitOps, versioned in Harbor, monitored by the telemetry node. Policy enforcement lives on the M4 Pro.

Worker · Delivery pipeline

k3s-worker-03

Hardware
Intel i5 · 16GB RAM
OS
Ubuntu 22.04 LTS
Storage
256GB SSD
Network
1GbE · .113
Services
ArgoCD · GitHub Actions runner · Kyverno

The node that translates source control events into cluster state changes. GitHub Actions runs CI pipelines here. ArgoCD detects drift between the Git-declared state and live cluster state and reconciles. Kyverno validates manifests before admission. This node makes GitOps operational, not theoretical.
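The drift-then-reconcile behaviour described above can be sketched in a few lines. This is a minimal illustration of the pattern, not ArgoCD's implementation: the `diff` and `reconcile` helpers are hypothetical, and state is reduced to a name-to-version mapping with made-up image tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str  # "create", "update", or "delete"
    name: str


def diff(desired: dict, live: dict) -> list:
    """List the drift between Git-declared state and live cluster state."""
    changes = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            changes.append(Change("create", name))
        elif live[name] != spec:
            changes.append(Change("update", name))
    for name in sorted(live):
        if name not in desired:
            changes.append(Change("delete", name))
    return changes


def reconcile(desired: dict, live: dict) -> dict:
    """Apply the drift and return a new live state that matches desired."""
    new_live = dict(live)
    for change in diff(desired, live):
        if change.kind == "delete":
            del new_live[change.name]
        else:
            new_live[change.name] = desired[change.name]
    return new_live


# Hypothetical workloads and tags, for illustration only.
desired = {"harbor": "tag-b", "clawlaw-proxy": "v0.3.2"}
live = {"harbor": "tag-a", "debug-pod": "latest"}

for change in diff(desired, live):
    print(change.kind, change.name)
print("converged:", reconcile(desired, live) == desired)
```

Running the loop a second time against the reconciled state yields no changes, which is the property that makes continuous reconciliation safe to repeat.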

Worker · Artifact store

k3s-worker-04

Hardware
Intel i5 · 16GB RAM
OS
Ubuntu 22.04 LTS
Storage
1TB SSD
Network
1GbE · .114
Services
PostgreSQL · MinIO

Persistent storage for structured data and unstructured artifacts: training datasets, evaluation results, experiment records, and the governance evidence records that close the audit loop. MinIO provides S3-compatible object storage, which means Velero can use it as a cluster backup target alongside its role in the MLOps artifact pipeline.

Worker · Telemetry · audit

k3s-worker-05

Hardware
Intel i5 · 16GB RAM
OS
Ubuntu 22.04 LTS
Storage
512GB SSD (metrics)
Network
1GbE · .115
Services
Prometheus · Grafana · Loki · Alertmanager · Tempo

Dedicated observability node. Isolation is deliberate: when any other node is misbehaving, the telemetry node must remain functional and observable. Mixing observability with general workloads defeats its purpose under pressure. This node captures both infrastructure metrics and inference metrics — the ISR model applied to the same stack.

Worker · Proposed · Apple Silicon inference

k3s-worker-06

Hardware
Mac Studio (older) · ARM64
Unified memory
24GB+
Status
Proposed · not yet provisioned
Role
ARM64 inference endpoint
Planned services
Ollama · MLX · K3s agent

Without this node, inference is never a Kubernetes workload. The MLOps pipeline cannot close inside the cluster. With this node, training → registry → inference deployment → metrics → governance → evidence becomes a single operational loop managed by GitOps. Introduces heterogeneous scheduling — ARM64 inference workloads routed via node affinity and taints alongside x86 platform workloads.
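To make the affinity-and-taints routing concrete, here is a simplified sketch of the scheduling decision. It assumes a hypothetical taint key `forge.example/inference` on worker-06 and models only the `nodeSelector` and `Equal`-operator toleration fields; the real Kubernetes scheduler considers far more. The `kubernetes.io/arch` label is the standard architecture label.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified match: 'Equal' operator only; key, value, and effect must agree."""
    return (
        toleration.get("operator", "Equal") == "Equal"
        and toleration.get("key") == taint["key"]
        and toleration.get("value") == taint.get("value")
        and toleration.get("effect") == taint["effect"]
    )


def schedulable(pod_spec: dict, node: dict) -> bool:
    """True if the node's labels satisfy the selector and every NoSchedule taint is tolerated."""
    selector = pod_spec.get("nodeSelector", {})
    labels_ok = all(node["labels"].get(k) == v for k, v in selector.items())
    taints_ok = all(
        any(tolerates(t, taint) for t in pod_spec.get("tolerations", []))
        for taint in node.get("taints", [])
        if taint["effect"] == "NoSchedule"
    )
    return labels_ok and taints_ok


# Hypothetical taint key; the node names come from the page.
INFERENCE_TAINT = {"key": "forge.example/inference", "value": "true", "effect": "NoSchedule"}

worker_01 = {"labels": {"kubernetes.io/arch": "amd64"}, "taints": []}
worker_06 = {"labels": {"kubernetes.io/arch": "arm64"}, "taints": [INFERENCE_TAINT]}

inference_pod = {
    "nodeSelector": {"kubernetes.io/arch": "arm64"},
    "tolerations": [{**INFERENCE_TAINT, "operator": "Equal"}],
}
platform_pod = {}  # no selector, no tolerations

print(schedulable(inference_pod, worker_06))  # inference lands on the ARM64 node
print(schedulable(platform_pod, worker_06))   # the taint keeps platform pods off it
```

The selector pulls inference workloads toward the ARM64 node, and the taint pushes everything else away, so the two mechanisms are complementary rather than redundant.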

Tier II · Governed Execution Boundary · ClawLaw v0.3.2 · Pre-commit gate · fail-closed
Mac Mini M4 Pro · 64GB unified memory

Every proposed action passes through the ClawLaw pre-commit gate before execution. The agent cannot modify or bypass the layer governing it.

ClawLaw enforcement · Ollama · MLX · OpenClaw
ALLOW  ·  DENY  ·  ESCALATE
Median gate latency: 3.2ms · P99: 14.7ms
Why enforcement lives here

Physical separation preserves Agency Paradox criterion 01: the governance layer is structurally independent of the workloads it governs. An agent running inside K3s on Tier I operates on a different machine from the enforcement process evaluating its actions. The cluster cannot redeploy or modify the enforcement boundary.
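The fail-closed behaviour of the Tier II gate can be illustrated with a small wrapper. This is a sketch under stated assumptions, not ClawLaw's code: the `fail_closed` decorator, the `evaluate` policy, and the `/workspace/` mutable scope are all hypothetical.

```python
import posixpath
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


MUTABLE_SCOPE = "/workspace/"  # hypothetical writable root


def fail_closed(evaluate):
    """Wrap an evaluator so any error or non-verdict result resolves to DENY."""
    def gate(action):
        try:
            result = evaluate(action)
        except Exception:
            return Verdict.DENY  # the gate could not evaluate: restrict
        return result if isinstance(result, Verdict) else Verdict.DENY
    return gate


@fail_closed
def gate(action):
    # Raises KeyError or TypeError on a malformed action; the wrapper turns that into DENY.
    path = posixpath.normpath(action["path"])
    return Verdict.ALLOW if path.startswith(MUTABLE_SCOPE) else Verdict.DENY


print(gate({"path": "/workspace/notes.md"}))
print(gate({"path": "/workspace/../etc/hosts"}))
print(gate({"kind": "write"}))  # malformed: no path
```

The design choice worth noting is that the wrapper never inspects why evaluation failed. Any uncertainty, including a bug in the policy itself, resolves to restriction.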

Tier III · Inference Substrate · Not a K3s node · behind ClawLaw gate
Mac Studio · 192GB unified memory

High-capacity inference substrate. Not a cluster workload. Model serving runs behind the ClawLaw gate — inference requests arrive from Tier II on ALLOW, execute here, return through the evidence record.

Ollama · MLX · Neural Engine
70B model footprint
~40GB used
150GB headroom
Generation speed
~38 tok/s
70B interactive
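As a sanity check on the footprint and headroom figures, the arithmetic can be written out. The 4.5 bits per weight used here is an assumed average for a 4-bit quantization, not a number from the page, and the estimate covers weights only (no KV cache or runtime overhead).

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in decimal GB."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


TOTAL_GB = 192  # Tier III unified memory, from the page
USED_GB = 40    # approximate 70B footprint, from the page

weights_gb = weight_footprint_gb(70, 4.5)  # 4.5 bits/weight is an assumption
headroom_gb = TOTAL_GB - USED_GB

print(f"weights: about {weights_gb:.1f} GB")
print(f"headroom: about {headroom_gb} GB")
```

Under that assumption the weights come to roughly 39 GB, in line with the ~40GB figure, and the remaining memory is about 152 GB, which the page rounds to 150GB.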

The M5 Ultra is the natural next evolutionary step for this tier — same architecture, substantially increased capacity. The governance layer does not change when the hardware upgrades.

The Governance Split

The central architectural decision.

Moving inference inside K3s enables the full MLOps story but creates a governance question: if governance runs inside the same cluster as the agent it governs, separation of governance weakens. The resolution is a split — not a compromise.

Policy management · inside the cluster · worker-02
Governance as a platform service.

Rules, risk thresholds, and version history are cluster services — GitOps-managed, ArgoCD-deployed, observable through the telemetry node. The Git commit log is the audit trail.

policy push
Policy enforcement · on the M4 Pro · Tier II boundary
Enforcement at the physical boundary.

The pre-commit gate runs on hardware the cluster cannot reach as a workload. The cluster pushes policy; the boundary applies it. An agent inside K3s cannot modify what evaluates it.

Any governed AI system at scale needs this split: policy as a platform service, enforcement at a structural boundary, a verified channel between them. The full reasoning — options, trade-offs, and consequences — is the position paper.
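The page does not say how the channel between policy management and enforcement is verified. One common approach, shown here purely as an assumption, is for the cluster side to sign each policy bundle and for the boundary to reject anything whose signature does not check out. The sketch uses a shared-key HMAC from the standard library; a real deployment might prefer asymmetric signatures so the boundary never holds a signing key. The function names and policy fields are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_policy(policy: dict, key: bytes) -> dict:
    """Cluster side: serialize the policy canonically and attach an HMAC tag."""
    payload = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def accept_policy(bundle: dict, key: bytes) -> Optional[dict]:
    """Boundary side: return the policy only if the tag verifies, else None."""
    expected = hmac.new(key, bundle["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["tag"]):
        return None  # reject: keep the previously applied policy
    return json.loads(bundle["payload"])


KEY = b"demo-key-not-for-production"  # illustrative only
policy = {"version": 12, "risk_threshold": 0.7, "escalate_on": ["unknown_endpoint"]}

bundle = sign_policy(policy, KEY)
tampered = dict(bundle, payload=bundle["payload"].replace("0.7", "0.9"))

print(accept_policy(bundle, KEY) == policy)
print(accept_policy(tampered, KEY))
```

A rejected bundle returns `None` rather than raising, so the caller can keep enforcing the last policy it verified.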

Read ADR-0006 →
MLOps Architecture

The model lifecycle, end to end.

The ISR model (intelligence, surveillance, reconnaissance) applies to both infrastructure and inference. Reconnaissance maps to logs, surveillance to metrics, and intelligence to traces, and each signal answers a different operational question. Conflating them produces dashboards that look impressive and tell you nothing under pressure.

Reconnaissance
Logs
Loki
Open-ended investigation. Unknown-territory queries. What happened in that training run. What the agent did in session.
Surveillance
Metrics
Prometheus · Grafana
Continuous monitoring of known indicators. Token throughput, latency p95/p99, governance evaluation time, drift scores.
Intelligence
Traces
Tempo · OpenTelemetry
Reconstruct what happened. Distributed request traces from prompt to inference to governed action to output.
Phase | Tool | Role on The Forge
Model Registry | Harbor (OCI) | Versioned storage for quantized weights and containers. Same registry, same GitOps promotion pipeline, same access controls.
Training | K8s Jobs · ArgoCD | Training runs as GitOps-managed Kubernetes Jobs: declared, reproducible. Constitutional wrapper enforces parameter budget, detects loss anomalies before the backward pass.
Local Inference | Ollama · MLX (Edge) | On-device via M4 Pro. 32–70B quantized models in the unified memory budget. No cloud dependency. Zero-copy tensor ops via Neural Engine.
Model Metrics | Prometheus · Grafana (Surveillance) | Token throughput, latency p95/p99, error rates, governance evaluation time. Same stack as infrastructure metrics, on worker-05.
Drift Monitoring | Loki + rules (Reconnaissance) | Open-ended investigation of output degradation. Composition tracer detects session-level behavioral drift before it reaches the output boundary.
Governance Gate | ClawLaw (Constitutional) | Pre-commit evaluation of every proposed action. Policy managed on worker-02. Enforcement on M4 Pro. Allow · Deny · Escalate · Record.
Evidence Chain | MinIO · worker-04 (Audit) | Full provenance from prompt to inference action to execution to outcome. The MLOps experiment record and the governance audit record are the same object.
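One way to make an evidence chain tamper-evident is to hash-link each record to its predecessor. The page does not describe ClawLaw's storage format, so the sketch below is an assumed design with hypothetical field names, using only the standard library.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record's predecessor


def _digest(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_record(chain: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous record."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "event": event, "hash": _digest(prev, event)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["event"]):
            return False
        prev = record["hash"]
    return True


chain = []
for event in (
    {"stage": "prompt", "id": "p-1"},
    {"stage": "inference", "model": "example-model"},
    {"stage": "gate", "verdict": "ALLOW"},
    {"stage": "outcome", "status": "ok"},
):
    append_record(chain, event)

print(verify(chain))
chain[2]["event"]["verdict"] = "DENY"  # simulate after-the-fact tampering
print(verify(chain))
```

Because each hash covers the previous hash, changing one record invalidates it and every record after it, which is what lets a reviewer trust the sequence as well as the individual entries.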
The Governed Lab

The Agency Paradox, implemented.

Nine criteria. Three hardware tiers. The composition problem, in which individually approved actions collectively constitute scope creep, is the central argument. The cluster makes it observable.

01
Separation of governance
The governance layer is structurally independent. The enforcement boundary runs on hardware the governed agent cannot reach.
02
Determinism
Same action, same policy, same session state → same verdict. Governance cannot be probabilistic where the agent is probabilistic.
03
Fail-closed default
When the gate cannot evaluate, the default is denial. Uncertainty resolves to restriction, not permission.
04
Compositional evidence
Each action is recorded with full context. Evidence accumulates into a provenance chain that reconstructs the session.
05
Composition-aware evaluation
Actions evaluated in sequence context, not isolation. The composition problem — individually valid actions constituting collective scope creep — is detectable.
06
Substrate independence
Governance operates regardless of which model produces the action or which hardware executes inference.
07
Principal observability
The human principal can inspect the full governance record. The system does not obscure what it decided or why.
08
Knowledge substrate currency
Local inference means the knowledge substrate is updated and versioned without cloud dependency or third-party availability constraints.
09
Auditability across the model lifecycle
The evidence chain spans training artifact through inference call to governed action. MLOps experiment tracking and governance audit are the same record.
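Criteria 02 and 05 can be illustrated together: a verdict that depends on session history yet stays deterministic for a given history. The sketch assumes a threshold of three sequential denials, the figure used in lab L-029. The sticky `flagged` state and the rule that a flagged session escalates otherwise-allowed actions are assumptions made for illustration, not documented ClawLaw behaviour.

```python
from dataclasses import dataclass

PROBE_THRESHOLD = 3  # sequential denials treated as a boundary probe


@dataclass
class Session:
    consecutive_denials: int = 0
    flagged: bool = False  # assumed sticky once a probe pattern is seen


def evaluate_in_context(session: Session, isolated_verdict: str) -> str:
    """Combine the verdict an action earns in isolation with session history."""
    if isolated_verdict == "DENY":
        session.consecutive_denials += 1
        if session.consecutive_denials >= PROBE_THRESHOLD:
            session.flagged = True
        return "DENY"
    session.consecutive_denials = 0
    if session.flagged and isolated_verdict == "ALLOW":
        return "ESCALATE"  # valid in isolation, suspicious in sequence
    return isolated_verdict


session = Session()
history = ["ALLOW", "DENY", "DENY", "DENY", "ALLOW"]
print([evaluate_in_context(session, v) for v in history])
```

The final action would be allowed on its own merits, but after three denials in a row it is routed to the human principal instead. Replaying the same history against a fresh `Session` always produces the same list of verdicts.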
The governance distinction

Where Kyverno ends.

Kyverno enforces structural rules at admission: does this resource conform to declared constraints? It does not evaluate intent. It does not track composition across a session. It does not issue escalations to human review.

ClawLaw addresses what admission control cannot. The Forge is where that claim is demonstrated under operational conditions, not stated as a thesis.

What Kyverno does well

Admission control, resource validation, static structural rules. Deterministic decisions on deterministic inputs at the Kubernetes API boundary.

What Kyverno cannot do

Reason about intent. Track composition across a session. Issue probabilistic risk scores. Route to human escalation as a policy outcome. Trace provenance from prompt to execution.

Where ClawLaw begins

Propose → Evaluate → Allow / Deny / Escalate → Record. Intent-aware. Composition-tracking. Evidence-generating. Human-in-the-loop when the action warrants it.
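The Propose → Evaluate → Allow / Deny / Escalate → Record loop might look like the following in miniature. The rules are borrowed from the lab descriptions (writes outside the mutable scope are denied, unknown network endpoints escalate), while the class, the scope path, and the host allow-list are hypothetical.

```python
import posixpath
from dataclasses import dataclass, field

MUTABLE_SCOPE = "/workspace/"  # hypothetical writable root
KNOWN_HOSTS = {"github.com"}   # hypothetical endpoint allow-list


@dataclass
class Gate:
    records: list = field(default_factory=list)

    def evaluate(self, action: dict) -> str:
        if action["kind"] == "write":
            path = posixpath.normpath(action["target"])
            return "ALLOW" if path.startswith(MUTABLE_SCOPE) else "DENY"
        if action["kind"] == "network":
            return "ALLOW" if action["target"] in KNOWN_HOSTS else "ESCALATE"
        return "DENY"  # unrecognized action kinds are denied

    def propose(self, action: dict) -> str:
        """Evaluate a proposed action and record the verdict before returning it."""
        verdict = self.evaluate(action)
        self.records.append({"action": action, "verdict": verdict})
        return verdict


gate = Gate()
for action in (
    {"kind": "write", "target": "/workspace/notes.md"},
    {"kind": "write", "target": "/etc/hosts"},
    {"kind": "network", "target": "example.org"},
):
    print(action["target"], "->", gate.propose(action))
print(len(gate.records), "records")
```

Every proposal produces a record regardless of verdict, so denied and escalated actions are as visible in the audit trail as allowed ones.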

Lab Series · The Forge · Darwin · OpenClaw · Prometheus

Four series. One governed lab.

The Forge is hardware bring-up and inference benchmarks — the foundation. Darwin investigates recursive learning. OpenClaw builds governed automation end to end. Prometheus is where the lab becomes MLOps — drift detection, latency SLOs, inference reliability as continuous operational discipline.

Enter the lab →
F
Series F

The Forge

Operational labs on the Apple Silicon home-lab itself — hardware bring-up, inference benchmarks, model installs, and the cluster substrate the other series depend on.

2
Labs
001–002
Range
A
Series A

Darwin

Recursive learning on local models. Can a governed agent improve by iterating on its own outputs without violating its constitutional boundaries? Behavioral consistency, self-critique loops, composition drift, recursive ceiling.

4
Labs
022–025
Range
B
Series B

OpenClaw

End-to-end governed automation on Apple Silicon. ClawLaw install, boundary enforcement, escalation flow, composition detection, audit trail integrity, multi-agent contention.

8
Labs
003–032
Range
C
Series C · Planned

Prometheus

Model drift detection and inference reliability as continuous MLOps discipline. Token throughput benchmarks, latency SLO definition, drift classification, alert routing from model metrics to on-call.

6
Planned
Range
Forge Series · Hardware & Inference Labs
L-001
Inference Benchmark: Apple Silicon vs Discrete GPU
Mac Mini M4 Pro and an RTX 3060 produce different tok/s numbers because they are measuring different things — not because either machine is "better".
Done
L-002
Gemma 4 on Apple Silicon — Installation and Configuration
A Mac Mini M4 Pro with 64GB unified memory can run the full Gemma 4 model family locally, expose Ollama and MLX as comparable backends, and feed Langfuse with reproducible traces — in a single afternoon.
Active
Darwin Series · Labs 022–025
L-022
Baseline identity
A governed agent with a fixed identity prompt produces measurably consistent outputs across 100 runs.
Done
L-023
Self-critique loop
An agent evaluating its own prior output improves quality scores without violating governance boundaries.
Active
L-024
Composition drift
Session composition state detects quality degradation before it reaches the output boundary.
Planned
L-025
Recursive ceiling
Iterative self-improvement plateaus at a measurable point determined by model capacity and governance constraints.
Planned
OpenClaw Series · Labs 003, 026–032
L-003
OpenClaw Across the Provider Matrix
A single agentic CLI can drive Claude, GPT-4, Gemini, and a local model interchangeably without code changes — and the differences in tool-use behaviour are reproducible enough to govern.
Active
L-026
ClawLaw install
Constitutional governance can be installed on a clean Mac Mini M4 Pro in under 30 minutes.
Done
L-027
Boundary enforcement
The filesystem boundary constraint correctly denies all writes outside the mutable scope.
Done
L-028
Escalation flow
Unknown network endpoints trigger ESCALATE verdicts that pause execution until principal review.
Active
L-029
Composition detection
The composition constraint detects boundary probe patterns after 3 sequential denials.
Planned
L-030
Audit trail integrity
Replaying the audit log against the same initial state produces identical final state.
Planned
L-031
Multi-agent contention
Two concurrent governed agents sharing the same governance layer produce no race conditions.
Planned
L-032
Production benchmark
A governed 8-hour development session completes all tasks with zero governance-layer failures.
Planned
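The replay property that L-030 sets out to test can be stated as code. This is a toy model with hypothetical log fields, treating state as a path-to-content mapping. It only shows what "identical final state" means when every logged action is applied by a pure function.

```python
def apply_entry(state: dict, entry: dict) -> dict:
    """Apply one logged action; only ALLOW entries change state."""
    if entry["verdict"] != "ALLOW":
        return state
    new_state = dict(state)
    new_state[entry["path"]] = entry["content"]
    return new_state


def replay(initial: dict, log: list) -> dict:
    """Fold the audit log over the initial state without mutating either."""
    state = dict(initial)
    for entry in log:
        state = apply_entry(state, entry)
    return state


initial = {"/workspace/a.txt": "v1"}
log = [
    {"verdict": "ALLOW", "path": "/workspace/a.txt", "content": "v2"},
    {"verdict": "DENY", "path": "/etc/hosts", "content": "tampered"},
    {"verdict": "ALLOW", "path": "/workspace/b.txt", "content": "new"},
]

first = replay(initial, log)
second = replay(initial, log)
print(first == second)
print(sorted(first))
```

The property holds here because `apply_entry` depends only on its arguments. Anything that reads a clock, the network, or random state during replay would break it.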
Field Reports

Structured lab notes. Dateable. Reproducible.

You cannot govern what you cannot observe. You cannot observe what you don't record.

2026-03-08
Mac Mini M4 Pro · 64GB Unified Memory
ClawLaw pre-commit gate: latency under real workload
Hypothesis

The governance layer adds measurable but acceptable latency to the Claude-to-filesystem path on M4 Pro.

Method

200 sequential file-write actions through ClawLaw governance. Measured wall-clock time with and without governance layer across reads, writes, and shell executions in a typical development session.

Results

Median: 3.2ms. P95: 8.1ms. P99: 14.7ms. Zero false positives. Two correct denials on boundary-probe patterns.
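For readers reproducing this kind of measurement, the median, P95, and P99 can be computed from raw samples with the standard library. The samples below are synthetic placeholders, not the measurements from this report, and the choice of the `inclusive` quantile method is an assumption.

```python
import statistics


def latency_summary(samples_ms: list) -> dict:
    """Return median, P95, and P99 from per-action latency samples in milliseconds."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile, index 98 the 99th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Synthetic data: 200 evenly spaced values standing in for 200 gated actions.
samples = [float(x) for x in range(1, 201)]
summary = latency_summary(samples)
print(summary)
```

With only 200 samples, the P99 rests on the two or three slowest actions, so it is worth reporting the sample count alongside the percentile.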

Observations

Governance overhead is invisible in practice. The two denials caught a path traversal and a hosts file modification — both genuine boundary violations. Composition tracer added 0.4ms average. Fail-closed default triggered once on a malformed action and blocked it correctly.

Next steps

Extend to 8-hour session benchmark. Add governance latency to Prometheus MLOps metrics. Test governance proxy on worker-02 with enforcement on M4 Pro across the distribution channel.

2026-02-22
Mac Mini M4 Pro · 64GB Unified Memory
Llama 3 70B Q4_K_M on M4 Pro: does it fit?
Hypothesis

The 70B model at Q4_K_M should fit in unified memory with acceptable generation speed.

Method

llama3:70b-instruct-q4_K_M via Ollama. Monitored memory pressure via Activity Monitor, measured tok/s on a 500-token generation task.

Results

Loaded. Peak memory: 22.1GB. Generation: 4.2 tok/s. Swap: 0. Memory pressure: yellow but stable. Governance latency unchanged at 3.2ms median.

Observations

4.2 tok/s is too slow for interactive work. The 8B model at 38 tok/s remains the daily driver. The 70B is the governance test case: governance latency is negligible relative to generation time. Tier III with 192GB changes this equation — 70B at 38 tok/s with 150GB headroom.

Next steps

Run same benchmark on Tier III Mac Studio. Add tok/s to Prometheus MLOps metrics stack. Compare governance overhead across model sizes.

2026-02-15
6× Mac Mini Intel i5 · K3s v1.28.4
K3s cluster: 30-day stability report
Hypothesis

Six Intel Mac Minis running K3s can maintain 30 days of continuous uptime as a governance control plane.

Method

Monitored cluster health over 30 consecutive days. Tracked node availability, pod restarts, etcd leader elections, certificate rotation. No manual intervention.

Results

Uptime: 100% all 6 nodes. Pod restarts: 2 (OOMKilled, misconfigured monitoring container). Etcd leader elections: 0 unexpected. Certificate rotation: auto day 22.

Observations

Remarkably stable. Averages 8W idle per node, so the entire cluster draws less than a gaming PC. Main risk is thermal: the office exceeded 28°C twice and fan speeds increased, with no throttling. Dedicated telemetry node (worker-05) isolation validated: observability remained stable throughout.

Next steps

Add thermal alerting on worker-05. Replace OOMKilled pod. Define thermal SLO. Validate worker-02 ClawLaw proxy deployment stability over same 30-day window.

The Silicon Arc · 1984 → 2026

From NeXT to Neural Engine.

Apple's position in AI didn't appear with the M-series chips. It's the result of a forty-year arc. NeXT built the server DNA. The Mac Mini became the accidental infrastructure node. Apple Silicon converged CPU, GPU, and Neural Engine onto one die. The Forge is where that arc lands.

1985–1997
NeXT
2002–2011
Xserve
2005–2011
Mac Mini
2006–2020
Intel
2020 →
M-series

The enterprise DNA

CERN. Mach kernel. Unix foundations Apple still runs on.

NeXT powered Tim Berners-Lee's first web server at CERN in 1990. NeXTSTEP's Mach microkernel and BSD elements became macOS. Apple acquired NeXT for $427M in 1996 — bringing Unix server DNA that ships in every Mac today.

Processor
Motorola 68030/68040
Key customers
CERN · NSA · Swiss Bank
Acquired
Dec 1996 · $427M
Relevance
Mach lineage in the Apple Silicon tiers
The Intel nodes run Ubuntu, so the Mach lineage lives in the Apple Silicon tiers: the kernel under macOS on the M4 Pro and the Mac Studio traces directly to a NeXT workstation in Geneva in 1990.

The server ambition

1U rackmount. The enterprise play Apple abandoned.

Xserve ran in university clusters, government agencies, and production data centers. Discontinued in 2011, but the institutional memory of Apple hardware in server environments evolved into Mac Mini server configurations. The Forge carries that argument forward in a different form factor.

Form factor
1U rackmount
Processor
G4 / G5 / Intel Xeon
Discontinued
January 31, 2011
Legacy
Mac Mini server configs followed
Xserve proved Apple hardware could run in production. The Mac Mini cluster carries that institutional argument forward.

The accidental server

Macminicolo, 2005. The pattern that predates The Forge by 20 years.

Macminicolo launched in 2005, racking Mac Minis in a data center. The community discovered what Apple didn't advertise: the thermal envelope made it viable as always-on infrastructure. The Forge is that pattern taken seriously — real multi-node topology, specialized node roles, operational discipline.

Idle power
~8–11W per node
Precedent
Macminicolo, 2005
Forge cluster
6 nodes · ~$250 each used
Total cluster draw
Less than a gaming PC
The Mac Mini's power envelope was never designed for infrastructure. The infrastructure community adopted it anyway.

The platform years

Unix-on-Intel. The transition that made cross-platform tooling seamless.

The Intel transition made the Mac a genuine workstation for Linux-targeting work. Docker, Kubernetes tooling, the full Linux ecosystem became first-class macOS citizens. The six Intel Mac Minis running K3s in The Forge are the direct product of this era — affordable, well-understood x86 Linux nodes with Mac Mini's thermal advantages.

Forge nodes
Core i5 · 16GB
OS
Ubuntu 22.04 LTS
Cluster
K3s · 1 control · 5 workers
Note
Darwin heritage in the practitioner
The Darwin heritage is in the practitioner, not the nodes. The nodes run Linux. The cluster is governed by a system built on Apple Silicon.

The convergence point

CPU · GPU · Neural Engine · unified memory. One die.

Apple Silicon eliminated the memory bandwidth bottleneck between CPU and GPU. For inference, this is architecturally significant: a 70B quantized model fits in the unified memory budget without memory copies between compute domains. The M4 Pro runs the ClawLaw enforcement gate alongside local inference — both on the same memory pool.

Tier II node
M4 Pro · 64GB
Inference
Ollama · MLX · ANE
Governance
ClawLaw enforcement
Tier III node
Mac Studio · 192GB
The Neural Engine is a purpose-built inference accelerator on the same die as the CPU. This is why governed edge inference is possible without cloud dependency.
11W
Idle power per node — 24/7 for less than a data center PDU
~$250
Per node used — six-node cluster under $1,500
quiet
Low fan noise: viable in a living room, no server room required