Silicon · MLOps

MLOps, end to end.

The model lifecycle is a pipeline. The audit record is the experiment record. Once the governance layer is part of the platform, those two sentences are the same sentence.

01 · Why MLOps without governance is half a story

DevOps and MLOps govern structure. Neither governs intent.

DevOps gave us a discipline for shipping software reliably. MLOps applies the same discipline to a different artefact class — models instead of binaries, training jobs instead of builds, drift monitoring instead of error rates. What neither addresses is the moment a model produces an output that proposes an action: "I think we should rm -rf this directory," or "I think we should send this email." Something has to decide whether that proposal becomes execution.

MLOps gets the model to the boundary. Governance decides what crosses it.

The Forge runs the MLOps pipeline as a closed loop because the governance gate is part of it. The MLOps experiment record and the governance audit record are the same object at the data layer. That is the central claim of this page; everything below makes it concrete.

The pipeline in five phases
1 · Preparation

Datasets, prompt templates, and evaluation corpora version in Harbor alongside container images — same GitOps promotion as application config. Experiment tracking happens at the governance layer: every training job is an auditable event with an intent, authority, execution, and outcome record. Not just an MLflow entry; a constitutional record.

2 · Training & fine-tuning

Training runs are Kubernetes Jobs managed by ArgoCD. The governed training loop wraps the process in a state machine: parameter and compute budgets are structural gates, not aspirational limits; loss anomalies trigger a governed ANOMALY state before the backward pass; checkpoints are declared state transitions. State, not prompts, is the authority over training behaviour.
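
A minimal sketch of what such a governed loop could look like, assuming a step budget, a fixed loss ceiling, and a checkpoint interval. The class `GovernedLoop`, its field names, and the `forward`/`backward` callables are hypothetical stand-ins for illustration, not the Forge's implementation.

```python
import math
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Tuple


class TrainState(Enum):
    RUNNING = auto()
    ANOMALY = auto()
    BUDGET_EXHAUSTED = auto()


@dataclass
class GovernedLoop:
    """Hypothetical state machine around a training loop (illustrative only)."""
    max_steps: int          # compute budget, enforced by the loop itself
    loss_ceiling: float     # illustrative anomaly threshold
    checkpoint_every: int   # steps between declared checkpoints
    state: TrainState = TrainState.RUNNING
    transitions: List[Tuple[int, str, str, str]] = field(default_factory=list)

    def _transition(self, new_state: TrainState, step: int, reason: str) -> None:
        # Every state change, including a checkpoint, is recorded.
        self.transitions.append((step, self.state.name, new_state.name, reason))
        self.state = new_state

    def run(self, forward: Callable[[int], float],
            backward: Callable[[int, float], None]) -> TrainState:
        for step in range(self.max_steps):
            loss = forward(step)
            # Gate: inspect the loss before the backward pass runs.
            if math.isnan(loss) or loss > self.loss_ceiling:
                self._transition(TrainState.ANOMALY, step, f"loss={loss}")
                return self.state
            backward(step, loss)
            if (step + 1) % self.checkpoint_every == 0:
                self._transition(TrainState.RUNNING, step, "checkpoint")
        # The budget is the loop bound: there is no code path past it.
        self._transition(TrainState.BUDGET_EXHAUSTED, self.max_steps, "step budget spent")
        return self.state
```

In this sketch the budget is the loop bound rather than a value the loop is asked to respect, which is one way to read "structural gate": exceeding it is not a policy violation to detect but a path that does not exist.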

3 · Registry & promotion

Trained artefacts — quantised weights, LoRA adapters, eval results — go to Harbor beside the images that serve them. A model is not deployed without three things: a passing evaluation artefact, a governance-reviewed promotion verdict, and a provenance record linking training to evaluation to promotion. The model registry is the application registry.
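
The three-part promotion rule can be sketched as a single check. This is an illustration under assumptions: `PromotionRequest`, the `"APPROVED"` verdict string, and the provenance link names in `REQUIRED_LINKS` are hypothetical and do not come from the platform.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical link names for the training -> evaluation -> promotion chain.
REQUIRED_LINKS = ("training_run", "evaluation_run", "promotion_request")


@dataclass(frozen=True)
class PromotionRequest:
    model_ref: str                          # reference to the artefact in the registry
    eval_passed: Optional[bool]             # None means no evaluation artefact exists
    governance_verdict: Optional[str]       # None means the promotion was never reviewed
    provenance: Optional[Dict[str, str]]    # identifiers linking the three stages


def missing_requirements(req: PromotionRequest) -> List[str]:
    """Return the unmet requirements; an empty list means the model may deploy."""
    missing = []
    if req.eval_passed is not True:
        missing.append("passing evaluation artefact")
    if req.governance_verdict != "APPROVED":
        missing.append("governance-reviewed promotion verdict")
    links = req.provenance or {}
    if not all(links.get(name) for name in REQUIRED_LINKS):
        missing.append("provenance record")
    return missing


def can_promote(req: PromotionRequest) -> bool:
    return not missing_requirements(req)
```

Returning the list of missing items, rather than a bare boolean, keeps a refused promotion explainable in the same record that would have authorised it.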

4 · Inference serving

Inference runs on the M4 Pro and Mac Studio via Ollama and MLX, both of which accelerate on the Apple silicon GPU through Metal rather than on the Neural Engine. Every call that produces an action enters the governance lifecycle and is evaluated before execution, not after. Denial is a first-class outcome; escalation to human review is a declared policy outcome above the risk threshold. The audit trail is generated whether the action executes or not.
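
A minimal sketch of evaluate-before-execute, assuming a prefix denylist and a scalar risk score. The names `Proposal`, `evaluate`, and `govern`, the default threshold of 0.7, and the denylist are all hypothetical; the real gate's policy model is not described on this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Sequence


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


@dataclass(frozen=True)
class Proposal:
    action: str   # what the model proposes to do
    risk: float   # hypothetical risk score in [0, 1]


def evaluate(p: Proposal, denied_prefixes: Sequence[str], risk_threshold: float) -> Verdict:
    # Denial is checked first, so a denied action is never escalated instead.
    if any(p.action.startswith(prefix) for prefix in denied_prefixes):
        return Verdict.DENY
    if p.risk > risk_threshold:
        return Verdict.ESCALATE
    return Verdict.ALLOW


def govern(p: Proposal,
           execute: Callable[[str], Any],
           audit_log: List[Dict[str, Any]],
           denied_prefixes: Sequence[str] = ("rm -rf",),
           risk_threshold: float = 0.7) -> Verdict:
    verdict = evaluate(p, denied_prefixes, risk_threshold)
    # Only ALLOW reaches execution; DENY and ESCALATE leave result as None.
    result = execute(p.action) if verdict is Verdict.ALLOW else None
    # The audit entry is appended on every path, executed or not.
    audit_log.append({"action": p.action, "verdict": verdict.value, "result": result})
    return verdict
```

The sketch stops at the verdict: what happens to an escalated proposal after a reviewer responds is left out, since the page does not specify that flow.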

5 · Observability, drift, evidence

Three observability signals, each answering a different question (see the Intelligence, Surveillance, Reconnaissance mapping below, abbreviated ISR). Drift monitoring lives in the Reconnaissance channel because drift is degradation that has not yet become a defined metric deviation. The composition tracer adds the cross-session dimension. Every governed action writes a full provenance record that follows the evidence schema.

03 · Observability and the evidence record

Three signals in. One provenance record out.

The ISR mapping is structural, not metaphorical — it determines which tool you reach for under which kind of pressure. Every governed action then writes a single record that is both the audit artefact and the experiment record.

Intelligence · Traces

What happened in this session, end to end? Reconstruct it.

Surveillance · Metrics

Are the known indicators inside their tolerances? Page if not.

Reconnaissance · Logs

What's happening that hasn't been turned into a metric yet? Investigate.

evidence record · one per governed action
{
  session_id        — which session
  action_id         — which action inside it
  timestamp         — when
  prompt_hash       — what was asked, content-addressable
  proposed_action   — what the agent wanted to do
  evaluation        — ALLOW | DENY | ESCALATE
  policy_matched    — which rules were consulted
  composition_state — the session at evaluation time
  execution_result  — what happened (null on DENY)
  principal_review  — who approved (null unless ESCALATE)
  latency_ms        — how long the gate took
}

This record is the audit artefact and the experiment record. One object, both roles — and the architecture exists to keep it observable.
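
The schema above can be written as a typed record that enforces its own null rules. This is a sketch under assumptions: the field types are guesses, SHA-256 is one possible choice for the content-addressable `prompt_hash`, and the helper `hash_prompt` is hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

VERDICTS = ("ALLOW", "DENY", "ESCALATE")


def hash_prompt(prompt: str) -> str:
    # Assumption: SHA-256 over the UTF-8 prompt; the source only says "content-addressable".
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EvidenceRecord:
    session_id: str
    action_id: str
    timestamp: float                     # seconds since the epoch (assumed unit)
    prompt_hash: str
    proposed_action: str
    evaluation: str                      # one of VERDICTS
    policy_matched: Tuple[str, ...]      # identifiers of the rules consulted
    composition_state: Dict[str, Any]    # session snapshot at evaluation time
    execution_result: Optional[Any]      # must be None on DENY
    principal_review: Optional[str]      # must be None unless ESCALATE
    latency_ms: float

    def __post_init__(self) -> None:
        if self.evaluation not in VERDICTS:
            raise ValueError(f"unknown evaluation: {self.evaluation!r}")
        if self.evaluation == "DENY" and self.execution_result is not None:
            raise ValueError("a denied action cannot carry an execution result")
        if self.evaluation != "ESCALATE" and self.principal_review is not None:
            raise ValueError("principal_review is only set on ESCALATE")
```

Checking the two null rules at construction time means a record that contradicts its own verdict cannot be written in the first place, which is the property an audit artefact needs.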

04 · Where Kyverno ends

Two layers, in series. Both necessary.

Kubernetes admission control (Kyverno, OPA Gatekeeper) enforces structural rules at the API boundary. What it does not do is reason about intent, track composition across a session, or route anything to human review. That is not a criticism; it is a description of the layer it operates at. Kyverno governs the control plane. The ClawLaw gate governs the inference boundary.

Capability · Kyverno · ClawLaw
Structural rule enforcement at admission
Resource validation against declared constraints
Intent evaluation at action time
Composition tracking across a session
Risk-scored decisions on probabilistic inputs
Human escalation as a first-class outcome
Provenance chain from prompt to execution

The full position paper

Why policy management lives in the cluster and enforcement at the boundary — ADR-0006, the governance split →

The decisions that shaped this

The architecture emerged from a series of decisions made under real constraints. The full reasoning lives in the decisions index.

ADR-0001 · Why a Mac Mini cluster · Planned
ADR-0002 · ArgoCD over Flux · Planned
ADR-0003 · Kyverno before custom governance · Planned
ADR-0004 · Observability before automation · Planned
ADR-0005 · Why governance belongs in the control plane · Planned
ADR-0006 · Governance split: policy management inside the cluster, enforcement at the boundary · Published
ADR-0007 · MLOps on Kubernetes · Planned
The decisions index →