Silicon — Apple Silicon AI Lab

The Discipline Arc

DevOps gave us the discipline.
MLOps extends it to models.

The Forge runs the full progression. GitOps, observability, and CI/CD govern the platform. The same tools — extended — govern the model lifecycle. A third layer governs inference: intent, composition, evidence. That third layer is where ClawLaw lives.

01 · DevOps

Ship reliably.

Declarative infrastructure, GitOps operators, continuous observability, CI/CD. The control-plane pattern: govern complex systems through declared intent and continuous reconciliation.

ArgoCD · Prometheus · Kyverno · GitHub Actions

→

02 · MLOps

Same tools.
Different artifact class.

Model registries, training job orchestration, inference serving, drift detection. The same GitOps operators that deploy services deploy models. Same observability stack, pointed at model health.

Model registry · Drift detection · Inference serving · Experiment tracking

→

03 · Governed Inference

What both leave
unaddressed.

DevOps and MLOps govern structure. Neither governs intent. ClawLaw addresses the gap: intent evaluation, composition tracking, evidence chain, human escalation as a first-class policy outcome.

ClawLaw · Evidence chain · Composition tracing · ALLOW · DENY · ESCALATE

The full pipeline — git commit to evidence record

Stage	Step	Tool or outcome
Source	Git commit	GitHub
CI	Build · lint · test	GitHub Actions
Artifact	Image / model	Harbor · OCI
GitOps	ArgoCD sync	Drift → reconcile
Platform	K3s deploy	Kyverno gate
Governance	ClawLaw gate	Allow · Deny · Escalate
Inference	Ollama · MLX	M4 Pro · ANE
Observability	Metrics · Logs · Traces	Prometheus · Loki
Evidence	Provenance record	Audit · MinIO

Governance gates highlighted amber · every inference action traverses the full chain

The Infrastructure

Three tiers. One governed platform.

Tier I is the MLOps control plane. Tier II is the governed execution boundary. Tier III is high-capacity inference. Each tier has a distinct operational role. The boundaries between them are deliberate.

Tier I · The Forge · Platform Layer · K3s v1.28.4 · MLOps control plane · 6 nodes · Intel Mac Mini

k3s-ctrl-01

Control plane

etcd · API server

k3s-worker-01

Workloads

Harbor · Vault

k3s-worker-02

Agent ingress

Nginx · ClawLaw proxy

k3s-worker-03

Delivery pipeline

ArgoCD · GH Actions

k3s-worker-04

Artifact store

PostgreSQL · MinIO

k3s-worker-05

Telemetry · audit

Prometheus · Loki

k3s-worker-06

Apple Silicon inference

Proposed

PROPOSED · Older Mac Studio joining as a K3s worker. Role: bring inference inside the cluster as a schedulable workload, completing the MLOps pipeline end to end. Introduces an ARM64 node to an x86 cluster: real heterogeneous scheduling.

Control plane

k3s-ctrl-01

Hardware

Intel i5 · 16GB RAM

OS

Ubuntu 22.04 LTS

Storage

256GB SSD

Network

1GbE · .101

Services

K3s API server · etcd · kube-scheduler · controller-mgr

Single control node in the current configuration. Three-node HA control plane is a Phase 2 milestone. The single-node control plane is an accepted risk at Phase 1 — the cluster's primary role is platform development and operational learning, not production SLA enforcement.
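The jump from one control node to three is driven by quorum arithmetic. etcd uses Raft, which needs a majority of members to agree, so the sketch below shows why a single node tolerates no failures and three nodes tolerate one. The function names are illustrative and not part of K3s or etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


for n in (1, 3, 5):
    print(f"{n} control node(s): quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Two control nodes would be no better than one on this measure, which is why the usual step is straight to three.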

Worker · Workloads

k3s-worker-01

Hardware

Intel i5 · 16GB RAM

OS

Ubuntu 22.04 LTS

Storage

512GB SSD

Network

1GbE · .111

Services

Harbor registry · Vault · Primary workloads

Harbor is the center of gravity for the MLOps lifecycle — every model artifact, container image, and versioned weight file is promoted through this node. Harbor stores OCI artifacts, which means model weights and container images share the same registry, promotion pipeline, and access control model.
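One reason weights and images can share a promotion pipeline is that OCI artifacts are addressed by content digest. The following is a minimal sketch of a digest check before promotion, assuming the bytes are already in hand; `verify_before_promotion` is a hypothetical helper, not a Harbor API.

```python
import hashlib


def oci_digest(blob: bytes) -> str:
    """Content address in the `sha256:<hex>` form used for OCI digests."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def verify_before_promotion(blob: bytes, expected_digest: str) -> bool:
    """Promotion check: the bytes being promoted must match the pinned digest."""
    return oci_digest(blob) == expected_digest


# Placeholder bytes standing in for a quantized weight file.
weights = b"placeholder bytes standing in for a quantized weight file"
pinned = oci_digest(weights)

print(verify_before_promotion(weights, pinned))            # unchanged blob
print(verify_before_promotion(weights + b"\x00", pinned))  # altered blob
```

Because the digest depends only on the bytes, the same check works whether the blob is a container layer or a weight file.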

Worker · Agent ingress

k3s-worker-02

Hardware

Intel i5 · 16GB RAM

OS

Ubuntu 22.04 LTS

Storage

256GB SSD

Network

1GbE · .112

Services

Nginx ingress · ClawLaw proxy

This node carries the policy management component of the governance architecture. The ClawLaw proxy manages policy rules, risk thresholds, and escalation criteria, and routes enforcement decisions to the M4 Pro boundary. Policy management lives here: observable via GitOps, versioned in Harbor, monitored by the telemetry node. Policy enforcement lives on the M4 Pro.

Worker · Delivery pipeline

k3s-worker-03

Hardware

Intel i5 · 16GB RAM

OS

Ubuntu 22.04 LTS

Storage

256GB SSD

Network

1GbE · .113

Services

ArgoCD · GitHub Actions runner · Kyverno

The node that translates source control events into cluster state changes. GitHub Actions runs CI pipelines here. ArgoCD detects drift between the Git-declared state and live cluster state and reconciles. Kyverno validates manifests before admission. This node makes GitOps operational.
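Drift detection and reconciliation reduce to a diff between declared and live state followed by applying that diff. The sketch below models state as a name-to-version mapping, which is a deliberate simplification; it illustrates the pattern and is not ArgoCD's implementation. Pruning is applied unconditionally here for brevity, whereas in ArgoCD automated pruning is opt-in.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]  # resource name -> version (hypothetical simplification)


def diff(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return (operation, resource) pairs needed to move live toward desired."""
    ops: List[Tuple[str, str]] = []
    for name, version in sorted(desired.items()):
        if name not in live:
            ops.append(("create", name))
        elif live[name] != version:
            ops.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            ops.append(("prune", name))
    return ops


def reconcile(desired: State, live: State) -> State:
    """Apply the diff in place; afterwards live equals the declared state."""
    for op, name in diff(desired, live):
        if op == "prune":
            del live[name]
        else:
            live[name] = desired[name]
    return live


declared = {"clawlaw-proxy": "v2", "nginx": "v1"}
running = {"clawlaw-proxy": "v1", "debug-pod": "v0"}
print(diff(declared, running))
print(reconcile(declared, running) == declared)
```

An empty diff after reconciliation is the steady state the operator keeps returning to.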

Worker · Artifact store

k3s-worker-04

Hardware

Intel i5 · 16GB RAM

OS

Ubuntu 22.04 LTS

Storage

1TB SSD

Network

1GbE · .114

Services

PostgreSQL · MinIO

Persistent storage for structured data and unstructured artifacts: training datasets, evaluation results, experiment records, and the governance evidence records that close the audit loop. MinIO provides S3-compatible object storage, which means Velero can use it as a cluster backup target alongside its role in the MLOps artifact pipeline.

Worker · Telemetry · audit

k3s-worker-05

Hardware

Intel i5 · 16GB RAM

OS

Ubuntu 22.04 LTS

Storage

512GB SSD (metrics)

Network

1GbE · .115

Services

Prometheus · Grafana · Loki · Alertmanager · Tempo

Dedicated observability node. Isolation is deliberate: when any other node is misbehaving, the telemetry node must remain functional and observable. Mixing observability with general workloads defeats its purpose under pressure. This node captures both infrastructure metrics and inference metrics — the ISR model applied to the same stack.

Worker · Proposed · Apple Silicon inference

k3s-worker-06

Hardware

Mac Studio (older) · ARM64

Unified memory

24GB+

Status

Proposed · not yet provisioned

Role

ARM64 inference endpoint

Planned services

Ollama · MLX · K3s agent

Without this node, inference is never a Kubernetes workload. The MLOps pipeline cannot close inside the cluster. With this node, training → registry → inference deployment → metrics → governance → evidence becomes a single operational loop managed by GitOps. Introduces heterogeneous scheduling — ARM64 inference workloads routed via node affinity and taints alongside x86 platform workloads.
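A rough sketch of how an inference pod could be steered onto the ARM64 node. It uses the well-known `kubernetes.io/arch` node label and a toleration for a taint on the inference node. The taint key and value, the pod name, and the image path are all hypothetical, and `nodeSelector` stands in here for a fuller node-affinity rule.

```python
import json


def arm64_inference_pod(name: str, image: str) -> dict:
    """Build a Pod manifest that only schedules onto tainted ARM64 nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"tier": "inference"}},
        "spec": {
            # Well-known label set by the kubelet on every node.
            "nodeSelector": {"kubernetes.io/arch": "arm64"},
            # Hypothetical taint that keeps x86 platform workloads off this node.
            "tolerations": [{
                "key": "forge/inference",
                "operator": "Equal",
                "value": "apple-silicon",
                "effect": "NoSchedule",
            }],
            "containers": [{"name": "server", "image": image}],
        },
    }


# Image path is a placeholder, not a real registry location.
pod = arm64_inference_pod("inference-01", "harbor.example.internal/models/server:arm64")
print(json.dumps(pod, indent=2))
```

The selector pulls the pod toward ARM64; the taint and toleration pair keeps everything else away from that node.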

Tier II · Governed Execution Boundary · ClawLaw v0.3.2 · Pre-commit gate · fail-closed

Mac Mini M4 Pro · 64GB unified memory

Every proposed action passes through the ClawLaw pre-commit gate before execution. The agent cannot modify or bypass the layer governing it.

ClawLaw enforcement · Ollama · MLX · OpenClaw

ALLOW · DENY · ESCALATE
Median gate latency: 3.2ms · P99: 14.7ms

Why enforcement lives here

Physical separation preserves Agency Paradox criterion 01: the governance layer is structurally independent of the workloads it governs. An agent running inside K3s on Tier I operates on a different machine from the enforcement process evaluating its actions. The cluster cannot redeploy or modify the enforcement boundary.

Tier III · Inference Substrate · Not a K3s node · behind ClawLaw gate

Mac Studio · 192GB unified memory

High-capacity inference, deliberately outside the cluster. Model serving runs behind the ClawLaw gate — inference requests arrive from Tier II on ALLOW, execute here, return through the evidence record.

Ollama · MLX · Neural Engine

70B model footprint

~40GB used

150GB headroom

Generation speed

~38 tok/s

70B interactive
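The footprint and headroom figures above can be sanity-checked with back-of-envelope arithmetic. The sketch assumes an average of about 4.5 bits per weight for a 4-bit quantization including scale metadata; that value is an assumption, not a measurement from the lab, and the estimate excludes KV cache and runtime overhead.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


ASSUMED_BITS_PER_WEIGHT = 4.5   # assumed average for a 4-bit quantization
UNIFIED_MEMORY_GB = 192         # Tier III Mac Studio, from the spec above

footprint = weight_footprint_gb(70, ASSUMED_BITS_PER_WEIGHT)
headroom = UNIFIED_MEMORY_GB - footprint
print(f"weights ~{footprint:.1f} GB, headroom ~{headroom:.1f} GB of {UNIFIED_MEMORY_GB} GB")
```

Under that assumption the weights land near 40GB, which is consistent with the figures quoted for this tier.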

The M5 Ultra is the natural next evolutionary step for this tier — same architecture, substantially increased capacity. The governance layer does not change when the hardware upgrades.

The Governance Split

The central architectural decision.

Moving inference inside K3s enables the full MLOps story but creates a governance question: if governance runs inside the same cluster as the agent it governs, separation of governance weakens. The resolution is a split.

Policy management · inside the cluster · worker-02

Governance as a platform service.

Rules, risk thresholds, and version history are cluster services — GitOps-managed, ArgoCD-deployed, observable through the telemetry node. The Git commit log is the audit trail.

↓ policy push

Policy enforcement · on the M4 Pro · Tier II boundary

Enforcement at the physical boundary.

The pre-commit gate runs on hardware the cluster cannot reach as a workload. The cluster pushes policy; the boundary applies it. An agent inside K3s cannot modify what evaluates it.

Any governed AI system at scale needs this split: policy as a platform service, enforcement at a structural boundary, a verified channel between them. The full reasoning, including options, trade-offs, and consequences, is laid out in the position paper.
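The text does not say how the channel between cluster and boundary is verified. As one illustration, the sketch below authenticates a pushed policy bundle with a shared-key HMAC and has the boundary reject anything that fails the check. The function names, bundle layout, and key handling are assumptions. With a shared key the cluster can still author any policy it likes, so an asymmetric signature held by a human principal may be the better fit in practice.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_policy(policy: dict, key: bytes) -> dict:
    """Cluster side: serialize the policy canonically and attach an HMAC tag."""
    payload = json.dumps(policy, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "hmac_sha256": tag}


def accept_policy(bundle: dict, key: bytes) -> Optional[dict]:
    """Boundary side: return the policy only if the tag verifies, else None."""
    payload = bundle["payload"].encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["hmac_sha256"]):
        return None  # reject; the boundary keeps its current policy
    return json.loads(payload)


key = b"demo-key-not-for-real-use"
bundle = sign_policy({"version": 1, "deny": ["write:/etc/hosts"]}, key)
print(accept_policy(bundle, key) is not None)

tampered = dict(bundle, payload=bundle["payload"].replace("1", "2"))
print(accept_policy(tampered, key) is None)
```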

Read ADR-0006 →

MLOps Architecture

The model lifecycle, end to end.

The ISR intelligence model applies to both infrastructure and inference. Reconnaissance/Logs, Surveillance/Metrics, Intelligence/Traces — each signal answers a different operational question. Conflating them produces dashboards that look impressive and tell you nothing under pressure.

Reconnaissance

Logs

Loki

Open-ended investigation. Unknown-territory queries. What happened in that training run. What the agent did in session.

Surveillance

Metrics

Prometheus · Grafana

Continuous monitoring of known indicators. Token throughput, latency p95/p99, governance evaluation time, drift scores.

Intelligence

Traces

Tempo · OpenTelemetry

Reconstruct what happened. Distributed request traces from prompt to inference to governed action to output.

Phase	Tool	Role on The Forge
Model Registry	Harbor (OCI)	Versioned storage for quantized weights and containers. Same registry, same GitOps promotion pipeline, same access controls.
Training	K8s Jobs · ArgoCD	Training runs as GitOps-managed Kubernetes Jobs: declared, reproducible. Constitutional wrapper enforces parameter budget, detects loss anomalies before the backward pass.
Local Inference	Ollama · MLX (Edge)	On-device via M4 Pro. 32–70B quantized models in the unified memory budget. No cloud dependency. Zero-copy tensor ops between CPU and GPU via unified memory.
Model Metrics	Prometheus · Grafana (Surveillance)	Token throughput, latency p95/p99, error rates, governance evaluation time. Same stack as infrastructure metrics (worker-05).
Drift Monitoring	Loki + rules (Reconnaissance)	Open-ended investigation of output degradation. Composition tracer detects session-level behavioral drift before it reaches the output boundary.
Governance Gate	ClawLaw (Constitutional)	Pre-commit evaluation of every proposed action. Policy managed on worker-02. Enforcement on M4 Pro. Allow · Deny · Escalate · Record.
Evidence Chain	MinIO · worker-04 (Audit)	Full provenance from prompt to inference action to execution to outcome. The MLOps experiment record and the governance audit record are the same object.
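The Training row mentions a wrapper that enforces a parameter budget and catches loss anomalies before the backward pass. The page does not describe how that wrapper works, so the following is only a guess at its shape: every name, the spike threshold, and the idea of comparing against a recent-loss average are assumptions made for illustration.

```python
import math
from typing import Callable, List


class GuardViolation(Exception):
    """Raised when a training step must not proceed."""


def check_parameter_budget(param_count: int, budget: int) -> None:
    """Refuse models larger than the declared budget."""
    if param_count > budget:
        raise GuardViolation(f"{param_count} parameters exceeds budget {budget}")


def check_loss(loss: float, recent: List[float], spike_factor: float = 3.0) -> None:
    """Reject non-finite losses and losses far above the recent average."""
    if not math.isfinite(loss):
        raise GuardViolation("loss is NaN or infinite")
    if recent and loss > spike_factor * (sum(recent) / len(recent)):
        raise GuardViolation("loss spiked relative to recent steps")


def guarded_step(loss: float, recent: List[float], backward: Callable[[], None]) -> None:
    """Run `backward` only if the loss passes the checks, then record the loss."""
    check_loss(loss, recent)
    backward()
    recent.append(loss)
```

The ordering is the point: the checks run first, so a rejected step never reaches the gradient computation.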

The Governed Lab

The Agency Paradox, implemented.

Nine criteria. Three hardware tiers. The composition problem, individually approved actions that collectively constitute scope creep, is the central argument. The cluster makes it observable.

01

Separation of governance

The governance layer is structurally independent. The enforcement boundary runs on hardware the governed agent cannot reach.

02

Determinism

Same action, same policy, same session state → same verdict. Governance cannot be probabilistic where the agent is probabilistic.
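Determinism in this sense means the verdict is a pure function of its three inputs. The sketch below is an illustration under assumed rule shapes; the policy layout, the action strings, and the session-length threshold are hypothetical and are not ClawLaw's format.

```python
import hashlib
import json

POLICY = {  # hypothetical policy document
    "version": 7,
    "deny": ["write:/etc/hosts"],
    "escalate": ["shell:rm"],
}


def evaluate(action: str, policy: dict, session_state: tuple) -> str:
    """Pure function: no clock, randomness, or I/O influences the verdict."""
    if action in policy["deny"]:
        return "DENY"
    if action in policy["escalate"] or len(session_state) >= 100:
        return "ESCALATE"
    return "ALLOW"


def decision_key(action: str, policy: dict, session_state: tuple) -> str:
    """Stable fingerprint of the inputs; equal keys must yield equal verdicts."""
    canonical = json.dumps([action, policy, list(session_state)], sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


verdicts = {evaluate("shell:rm", POLICY, ("read:/tmp/a",)) for _ in range(1000)}
print(verdicts)
```

Recording the decision key next to each verdict would make the property checkable after the fact: two records with the same key and different verdicts indicate a defect.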

03

Fail-closed default

When the gate cannot evaluate, the default is denial. Uncertainty resolves to restriction, not permission.
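A minimal sketch of the fail-closed pattern: any exception or unrecognized result from the evaluator collapses to denial. The `Verdict` enum, the wrapper, and the toy evaluator are illustrative names, not ClawLaw's API.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


def fail_closed(evaluator: Callable[[dict], object]) -> Callable[[dict], Verdict]:
    """Wrap an evaluator so that every failure mode resolves to DENY."""
    def gate(action: dict) -> Verdict:
        try:
            verdict = evaluator(action)
        except Exception:
            return Verdict.DENY          # evaluation failed: restrict
        if not isinstance(verdict, Verdict):
            return Verdict.DENY          # malformed result: restrict
        return verdict
    return gate


def toy_evaluator(action: dict) -> Verdict:
    """Hypothetical rule: reads are allowed, everything else escalates."""
    return Verdict.ALLOW if action["kind"] == "read" else Verdict.ESCALATE


gate = fail_closed(toy_evaluator)
print(gate({"kind": "read"}))
print(gate({}))  # malformed action: the evaluator raises KeyError
```

The malformed action never reaches a permissive branch, because the only path to ALLOW is a successful evaluation that returns a valid verdict.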

04

Compositional evidence

Each action is recorded with full context. Evidence accumulates into a provenance chain that reconstructs the session.
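One common way to make accumulated evidence tamper-evident is a hash chain, in which each record commits to the one before it. The source does not state that ClawLaw uses this structure, so treat the sketch as an assumption about how such a provenance chain could be built; the field names are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder predecessor for the first record


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(chain: List[dict], action: dict, verdict: str) -> dict:
    """Append a record that commits to its predecessor's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"seq": len(chain), "action": action, "verdict": verdict, "prev": prev}
    record = dict(body, hash=_digest(body))
    chain.append(record)
    return record


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every hash and link; any edit to history breaks the chain."""
    prev = GENESIS
    for i, record in enumerate(chain):
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["seq"] != i or record["prev"] != prev:
            return False
        if _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


session: List[dict] = []
append_record(session, {"kind": "read", "path": "src/main.py"}, "ALLOW")
append_record(session, {"kind": "write", "path": "/etc/hosts"}, "DENY")
print(verify_chain(session))
```

Replaying the chain in order reconstructs the session, and a failed verification pinpoints where the record stopped being trustworthy.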

05

Composition-aware evaluation

Actions evaluated in sequence context, not isolation. The composition problem — individually valid actions constituting collective scope creep — is detectable.
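To make the composition problem concrete, the sketch below tracks how many distinct directory roots a session has written to. Each write passes a per-action allowlist, yet the sequence escalates once the session's footprint exceeds a budget. The allowlist, the budget, and the class name are hypothetical and stand in for whatever ClawLaw's composition tracer actually measures.

```python
from typing import Set

ALLOWED_ROOTS = {"src", "tests", "docs", "config"}  # hypothetical per-action allowlist


class CompositionTracker:
    """Evaluates each write against both the action rule and the session scope."""

    def __init__(self, max_roots_per_session: int = 2) -> None:
        self.max_roots = max_roots_per_session
        self.roots_touched: Set[str] = set()

    def evaluate(self, path: str) -> str:
        root = path.split("/", 1)[0]
        if root not in ALLOWED_ROOTS:
            return "DENY"        # fails the per-action rule
        if len(self.roots_touched | {root}) > self.max_roots:
            return "ESCALATE"    # valid alone, scope creep in sequence
        self.roots_touched.add(root)
        return "ALLOW"


tracker = CompositionTracker(max_roots_per_session=2)
for p in ["src/a.py", "tests/test_a.py", "src/b.py", "config/app.yml", "etc/hosts"]:
    print(p, tracker.evaluate(p))
```

The write to `config/app.yml` would be allowed in isolation. It escalates only because of what the session has already done, which is exactly what an evaluator without session state cannot see.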

06

Substrate independence

Governance operates regardless of which model produces the action or which hardware executes inference.

07

Principal observability

The human principal can inspect the full governance record. The system does not obscure what it decided or why.

08

Knowledge substrate currency

Local inference means the knowledge substrate is updated and versioned without cloud dependency or third-party availability constraints.

09

Auditability across the model lifecycle

The evidence chain spans training artifact through inference call to governed action. MLOps experiment tracking and governance audit are the same record.

The governance distinction

Where Kyverno ends.

Kyverno enforces structural rules at admission: does this resource conform to declared constraints? It does not evaluate intent. It does not track composition across a session. It does not issue escalations to human review.

ClawLaw addresses what admission control cannot. The Forge is where that claim is demonstrated under operational conditions.

Read ADR-0006 → The Forge architecture

What Kyverno does well

Admission control, resource validation, static structural rules. Deterministic decisions on deterministic inputs at the Kubernetes API boundary.

What Kyverno cannot do

Reason about intent. Track composition across a session. Issue probabilistic risk scores. Route to human escalation as a policy outcome. Trace provenance from prompt to execution.

Where ClawLaw begins

Propose → Evaluate → Allow / Deny / Escalate → Record. Intent-aware. Composition-tracking. Evidence-generating. Human-in-the-loop when the action warrants it.

Lab Series · The Forge · Darwin · OpenClaw · Prometheus

Four series. One governed lab.

The Forge is hardware bring-up and inference benchmarks — the foundation. Darwin investigates recursive learning. The OpenClaw series puts an agent runtime under a constitutional gate, end to end. Prometheus is where the lab becomes MLOps — drift detection, latency SLOs, inference reliability as continuous operational discipline.

Enter the lab →

F

Series F

The Forge

Operational labs on the Apple Silicon home-lab itself — hardware bring-up, inference benchmarks, model installs, and the cluster the other series depend on.

2

Labs

001–002

Range

A

Series A

Darwin

Recursive learning on local models. Can a governed agent improve by iterating on its own outputs without violating its constitutional boundaries? Behavioral consistency, self-critique loops, composition drift, recursive ceiling.

4

Labs

022–025

Range

B

Series B

OpenClaw

End-to-end governed automation on Apple Silicon. ClawLaw install, boundary enforcement, escalation flow, composition detection, audit trail integrity, multi-agent contention.

8

Labs

003–032

Range

C

Series C · Planned

Prometheus

Model drift detection and inference reliability as continuous MLOps discipline. Token throughput benchmarks, latency SLO definition, drift classification, alert routing from model metrics to on-call.

6

Planned

—

Range

Field Reports

Structured lab notes — dated, reproducible.

You cannot govern what you cannot observe. You cannot observe what you don't record.

2026-03-08

Mac Mini M4 Pro · 64GB Unified Memory

ClawLaw pre-commit gate: latency under real workload

Hypothesis

The governance layer adds measurable but acceptable latency to the Claude-to-filesystem path on M4 Pro.

Method

200 sequential file-write actions through ClawLaw governance. Measured wall-clock time with and without governance layer across reads, writes, and shell executions in a typical development session.

Results

Median: 3.2ms. P95: 8.1ms. P99: 14.7ms. Zero false positives. Two correct denials on boundary-probe patterns.

Observations

Governance overhead is invisible in practice. The two denials caught a path traversal and a hosts file modification — both genuine boundary violations. Composition tracer added 0.4ms average. Fail-closed default triggered once on a malformed action and blocked it correctly.

Next steps

Extend to 8-hour session benchmark. Add governance latency to Prometheus MLOps metrics. Test governance proxy on worker-02 with enforcement on M4 Pro across the distribution channel.
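For reproducibility, the median, P95, and P99 figures in a report like this one can be derived from raw per-action samples with the standard library. The sketch uses `statistics.quantiles` with the inclusive method; the sample values are synthetic placeholders, not the lab's measurements, and the choice of interpolation method is an assumption.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Median, P95, and P99 of gate latency samples, in milliseconds."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile, 98 the 99th.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(samples_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Synthetic samples for illustration only.
synthetic = [2.0 + 0.05 * i for i in range(200)]
print(latency_summary(synthetic))
```

Publishing the raw samples alongside the summary lets a reader recompute the percentiles with a different method and see how sensitive the tail figures are.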

2026-02-22

Mac Mini M4 Pro · 64GB Unified Memory

Llama 3 70B Q4_K_M on M4 Pro: does it fit?

Hypothesis

The 70B model at Q4_K_M should fit in unified memory with acceptable generation speed.

Method

llama3:70b-instruct-q4_K_M via Ollama. Monitored memory pressure via Activity Monitor, measured tok/s on a 500-token generation task.

Results

Loaded. Peak memory: 22.1GB. Generation: 4.2 tok/s. Swap: 0. Memory pressure: yellow but stable. Governance latency unchanged at 3.2ms median.

Observations

4.2 tok/s is too slow for interactive work. The 8B model at 38 tok/s remains the daily driver. The 70B is the governance test case: governance latency is negligible relative to generation time. Tier III with 192GB changes this equation — 70B at 38 tok/s with 150GB headroom.

Next steps

Run same benchmark on Tier III Mac Studio. Add tok/s to Prometheus MLOps metrics stack. Compare governance overhead across model sizes.

2026-02-15

6× Mac Mini Intel i5 · K3s v1.28.4

K3s cluster: 30-day stability report

Hypothesis

Six Intel Mac Minis running K3s can maintain 30 days of continuous uptime as a governance control plane.

Method

Monitored cluster health over 30 consecutive days. Tracked node availability, pod restarts, etcd leader elections, certificate rotation. No manual intervention.

Results

Uptime: 100% all 6 nodes. Pod restarts: 2 (OOMKilled, misconfigured monitoring container). Etcd leader elections: 0 unexpected. Certificate rotation: auto day 22.

Observations

Remarkably stable. Idle draw averages 8W per node, so the entire cluster draws less than a gaming PC. The main risk is thermal: the office exceeded 28 °C twice and fan speeds increased, but no throttling occurred. Isolating telemetry on a dedicated node (worker-05) was validated: observability remained stable throughout.

Next steps

Add thermal alerting on worker-05. Fix the memory configuration of the OOMKilled monitoring container. Define thermal SLO. Validate worker-02 ClawLaw proxy deployment stability over same 30-day window.

The Silicon Arc · 1985 → 2026

From NeXT to Neural Engine.

Apple's position in AI didn't appear with the M-series chips. It's the result of a forty-year arc. NeXT built the server DNA. The Mac Mini became the accidental infrastructure node. Apple Silicon converged CPU, GPU, and Neural Engine onto one die. The Forge is where that arc lands.

1985–1997

NeXT

Xserve

2005–2011

Mac Mini

2006–2020

Intel

2020 →

M-series

The enterprise DNA

CERN. Mach kernel. Unix foundations Apple still runs on.

NeXT powered Tim Berners-Lee's first web server at CERN in 1990. NeXTSTEP's Mach microkernel and BSD elements became macOS. Apple acquired NeXT for $427M in 1996 — bringing Unix server DNA that ships in every Mac today.

Processor

Motorola 68030/68040

Key customers

CERN · NSA · Swiss Bank

Acquired

Dec 1996 · $427M

Relevance

Mach lineage in macOS on the Apple Silicon tiers

The Intel Mac Minis in The Forge run Ubuntu, so the Mach lineage lives in macOS on the Apple Silicon machines, and it traces directly to a NeXT workstation in Geneva in 1990.

The server ambition

1U rackmount. The enterprise play Apple abandoned.

Xserve ran in university clusters, government agencies, and production data centers. Discontinued in 2011, but the institutional memory of Apple hardware in server environments evolved into Mac Mini server configurations. The Forge carries that argument forward in a different form factor.

Form factor

1U rackmount

Processor

G4 / G5 / Intel Xeon

Discontinued

January 31, 2011

Legacy

Mac Mini server configs followed

Xserve proved Apple hardware could run in production. The Mac Mini cluster carries that institutional argument forward.

The accidental server

Macminicolo, 2005. The pattern that predates The Forge by 20 years.

Macminicolo launched in 2005, racking Mac Minis in a data center. The community discovered what Apple didn't advertise: the thermal envelope made it viable as always-on infrastructure. The Forge is that pattern taken seriously — real multi-node topology, specialized node roles, operational discipline.

Idle power

~8–11W per node

Precedent

Macminicolo, 2005

Forge cluster

6 nodes · ~$250 each used

Total cluster draw

Less than a gaming PC

The Mac Mini's power envelope was never designed for infrastructure. The infrastructure community adopted it anyway.

The platform years

Unix-on-Intel. The transition that made cross-platform tooling seamless.

The Intel transition made the Mac a genuine workstation for Linux-targeting work. Docker, Kubernetes tooling, the full Linux ecosystem became first-class macOS citizens. The six Intel Mac Minis running K3s in The Forge are the direct product of this era — affordable, well-understood x86 Linux nodes with Mac Mini's thermal advantages.

Forge nodes

Core i5 · 16GB

OS

Ubuntu 22.04 LTS

Cluster

K3s · 1 control · 5 workers

Note

Darwin heritage in the practitioner

The Darwin heritage is in the practitioner, not the nodes. The nodes run Linux. The cluster is governed by a system built on Apple Silicon.

The convergence point

CPU · GPU · Neural Engine · unified memory. One die.

Apple Silicon eliminated the memory bandwidth bottleneck between CPU and GPU. For inference, this is architecturally significant: a 70B quantized model fits in the unified memory budget without memory copies between compute domains. The M4 Pro runs the ClawLaw enforcement gate alongside local inference — both on the same memory pool.

Tier II node

M4 Pro · 64GB

Inference

Ollama · MLX · ANE

Governance

ClawLaw enforcement

Tier III node

Mac Studio · 192GB

The Neural Engine is a purpose-built inference accelerator on the same die as the CPU. This is why governed edge inference is possible without cloud dependency.

11W

Idle power per node — 24/7 for less than a data center PDU

~$250

Per node, used: six-node cluster for about $1,500

quiet

Fan-cooled but near-silent at idle: viable in a living room, no server room required

Deeper into the platform

The Forge →

The full architecture — three tiers, six nodes, one platform.

Node specialization, the governance split, the MLOps lifecycle end to end. The credibility document.

MLOps architecture →

The model lifecycle, governed end to end.