The model lifecycle is a pipeline. The audit record is the experiment record. Once the governance layer is part of the platform, those two sentences are the same sentence.
DevOps gave us a discipline for shipping software reliably. MLOps applies the same discipline to a different artefact class — models instead of binaries, training jobs instead of builds, drift monitoring instead of error rates. What neither addresses is the moment a model produces an output that proposes an action: "I think we should rm -rf this directory," or "I think we should send this email." Something has to decide whether that proposal becomes execution.
MLOps gets the model to the boundary. Governance decides what crosses it.
The Forge runs the MLOps pipeline as a closed loop because the governance gate is part of it. The MLOps experiment record and the governance audit record are the same object at the data layer. That is the central claim of this page; everything below makes it concrete.
Datasets, prompt templates, and evaluation corpora version in Harbor alongside container images — same GitOps promotion as application config. Experiment tracking happens at the governance layer: every training job is an auditable event with an intent, authority, execution, and outcome record. Not just an MLflow entry; a constitutional record.
Training runs are Kubernetes Jobs managed by ArgoCD. The governed training loop wraps the process in a state machine: parameter and compute budgets are structural gates, not aspirational limits; loss anomalies trigger a governed ANOMALY state before the backward pass; checkpoints are declared state transitions. State, not prompts, is the authority over training behaviour.
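A minimal sketch of the governed training loop as a state machine, assuming hypothetical names throughout (`GovernedLoop`, `admit`, `loss_spike_factor`); none of these come from the Forge codebase. It shows the compute budget as a hard gate, the anomaly check running before the backward pass, and checkpoints recorded as declared transitions.

```python
import math
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class State(Enum):
    RUNNING = auto()
    ANOMALY = auto()
    BUDGET_EXHAUSTED = auto()


@dataclass
class GovernedLoop:
    max_steps: int                    # compute budget, enforced as a gate
    loss_spike_factor: float = 3.0    # hypothetical anomaly threshold
    state: State = State.RUNNING
    step: int = 0
    last_loss: Optional[float] = None
    transitions: list = field(default_factory=list)

    def _move(self, new_state: State, reason: str) -> None:
        # Every state change is recorded, so the run can be audited later.
        self.transitions.append((self.step, self.state.name, new_state.name, reason))
        self.state = new_state

    def admit(self, loss: float) -> bool:
        """Return True only if the backward pass for this step may run."""
        if self.state is not State.RUNNING:
            return False
        if self.step >= self.max_steps:
            self._move(State.BUDGET_EXHAUSTED, "compute budget reached")
            return False
        spiked = (self.last_loss is not None
                  and loss > self.loss_spike_factor * self.last_loss)
        if not math.isfinite(loss) or spiked:
            self._move(State.ANOMALY, f"loss anomaly: {loss}")
            return False
        self.last_loss = loss
        self.step += 1
        return True

    def checkpoint(self, label: str) -> None:
        # A checkpoint is a declared transition, not a silent side effect.
        self.transitions.append(
            (self.step, self.state.name, self.state.name, f"checkpoint:{label}"))


loop = GovernedLoop(max_steps=100)
for loss in [2.0, 1.8, 1.7, 9.5, 1.6]:
    if not loop.admit(loss):
        break
    # The backward pass and optimiser step would run here (omitted).
```

In this sketch the fourth loss value trips the spike check, so the loop halts in `ANOMALY` with three completed steps and the final value is never processed.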
Trained artefacts — quantised weights, LoRA adapters, eval results — go to Harbor beside the images that serve them. A model is not deployed without three things: a passing evaluation artefact, a governance-reviewed promotion verdict, and a provenance record linking training to evaluation to promotion. The model registry is the application registry.
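The three-part promotion rule can be sketched as a pure check. The names here (`PromotionRequest`, `missing_requirements`, the `"APPROVED"` verdict label) are illustrative assumptions, not the Forge's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering of stages a provenance record must link together.
REQUIRED_CHAIN = ("training", "evaluation", "promotion")


@dataclass(frozen=True)
class PromotionRequest:
    artefact_digest: str                # registry digest of the artefact
    evaluation_passed: Optional[bool]   # None means no evaluation artefact exists
    governance_verdict: Optional[str]   # "APPROVED" is an assumed label
    provenance: tuple = ()              # stages linked to this artefact, in order


def missing_requirements(req: PromotionRequest) -> list:
    """List every unmet requirement; an empty list means deployable."""
    missing = []
    if req.evaluation_passed is not True:
        missing.append("passing evaluation artefact")
    if req.governance_verdict != "APPROVED":
        missing.append("governance-reviewed promotion verdict")
    if tuple(req.provenance) != REQUIRED_CHAIN:
        missing.append("provenance record linking training to evaluation to promotion")
    return missing


def can_deploy(req: PromotionRequest) -> bool:
    return not missing_requirements(req)


ready = PromotionRequest("sha256:example", True, "APPROVED", REQUIRED_CHAIN)
unreviewed = PromotionRequest("sha256:example", True, None, REQUIRED_CHAIN)
```

Returning the list of missing items, rather than a bare boolean, keeps a refusal explainable in the audit trail.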
Inference runs on the M4 Pro and Mac Studio via Ollama and MLX, accelerated by the Apple silicon GPU through Metal. Every call that produces an action enters the governance lifecycle and is evaluated before execution, not after. Denial is a first-class outcome; escalation to human review is a declared policy outcome above the risk threshold. The audit trail is generated whether the action executes or not.
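A rough sketch of that control flow, under stated assumptions: the deny rule, the `RISK_THRESHOLD` value, and the `govern` function are all hypothetical stand-ins, and the risk score is supplied by the caller rather than computed. The point it illustrates is ordering: evaluation happens first, and an audit entry is appended on every path.

```python
import time
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


RISK_THRESHOLD = 0.7          # hypothetical escalation threshold
DENY_PREFIXES = ("rm -rf",)   # hypothetical deny rule, echoing the example above

audit_log = []


def evaluate(action: str, risk: float) -> Verdict:
    if action.startswith(DENY_PREFIXES):
        return Verdict.DENY
    if risk > RISK_THRESHOLD:
        return Verdict.ESCALATE
    return Verdict.ALLOW


def govern(action, risk, execute, reviewer=None):
    """Evaluate first, execute only if permitted, and always write a record."""
    started = time.perf_counter()
    verdict = evaluate(action, risk)
    latency_ms = (time.perf_counter() - started) * 1000.0

    principal = None
    if verdict is Verdict.ESCALATE and reviewer is not None:
        principal = reviewer(action)   # returns an approver name, or None to refuse

    result = None
    if verdict is Verdict.ALLOW or principal is not None:
        result = execute(action)

    record = {
        "proposed_action": action,
        "evaluation": verdict.value,
        "execution_result": result,      # stays None when nothing ran
        "principal_review": principal,   # stays None unless escalated and approved
        "latency_ms": latency_ms,
    }
    audit_log.append(record)             # written whether or not the action ran
    return record


allowed = govern("send email", 0.2, lambda a: "sent")
denied = govern("rm -rf /data", 0.2, lambda a: "deleted")
escalated = govern("deploy model", 0.9, lambda a: "deployed", reviewer=lambda a: "alice")
```

The denied call still produces a record, with `execution_result` left as `None`.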
Three observability signals, each answering a different question (see ISR below). Drift monitoring lives in the Reconnaissance channel because drift is degradation that has not yet become a defined metric deviation. The composition tracer adds the cross-session dimension. Every governed action writes a full provenance record — the evidence schema.
The ISR mapping is structural, not metaphorical — it determines which tool you reach for under which kind of pressure. Every governed action then writes a single record that is both the audit artefact and the experiment record.
Intelligence: What happened in this session, end to end? Reconstruct it.
Surveillance: Are the known indicators inside their tolerances? Page if not.
Reconnaissance: What's happening that hasn't been turned into a metric yet? Investigate.
{
session_id — which session
action_id — which action inside it
timestamp — when
prompt_hash — what was asked, content-addressable
proposed_action — what the agent wanted to do
evaluation — ALLOW | DENY | ESCALATE
policy_matched — which rules were consulted
composition_state — the session at evaluation time
execution_result — what happened (null on DENY)
principal_review — who approved (null unless ESCALATE)
latency_ms — how long the gate took
}
This record is the audit artefact and the experiment record. One object, both roles, and the architecture exists to keep it observable.
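The schema above could be expressed as a typed, immutable record. This is a sketch only: the field names follow the schema, but the Python types, the choice of SHA-256 for `prompt_hash`, and the validation rules in `__post_init__` are assumptions drawn from the field descriptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


def hash_prompt(prompt: str) -> str:
    # Content-addressable: the same prompt text always yields the same hash.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EvidenceRecord:
    session_id: str
    action_id: str
    timestamp: str
    prompt_hash: str
    proposed_action: str
    evaluation: str                    # "ALLOW", "DENY" or "ESCALATE"
    policy_matched: tuple
    composition_state: dict
    execution_result: Optional[str]    # None on DENY
    principal_review: Optional[str]    # None unless ESCALATE
    latency_ms: float

    def __post_init__(self):
        if self.evaluation not in ("ALLOW", "DENY", "ESCALATE"):
            raise ValueError(f"unknown evaluation: {self.evaluation}")
        if self.evaluation == "DENY" and self.execution_result is not None:
            raise ValueError("a denied action cannot have an execution result")
        if self.evaluation != "ESCALATE" and self.principal_review is not None:
            raise ValueError("principal_review is only set on ESCALATE")


record = EvidenceRecord(
    session_id="s-1",
    action_id="a-1",
    timestamp=datetime.now(timezone.utc).isoformat(),
    prompt_hash=hash_prompt("delete the temp directory"),
    proposed_action="rm -rf /tmp/scratch",
    evaluation="DENY",
    policy_matched=("no-recursive-delete",),   # hypothetical rule name
    composition_state={"actions_so_far": 0},
    execution_result=None,
    principal_review=None,
    latency_ms=1.2,
)
serialised = json.dumps(asdict(record), sort_keys=True)
```

Freezing the dataclass and validating on construction means a record that contradicts itself, such as a denied action with an execution result, cannot be created in the first place.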
Kubernetes admission control — Kyverno, OPA, Gatekeeper — enforces structural rules at the API boundary. What it does not do is reason about intent, track composition across a session, or route anything to human review. That is not a criticism; it is a description of the layer it operates at. Kyverno governs the control plane. The ClawLaw gate governs the inference boundary.
Why policy management lives in the cluster and enforcement at the boundary: see ADR-0006, the governance split.
The architecture emerged from a series of decisions made under real constraints. The full reasoning lives in the decisions index.