Any AI governance system worth taking seriously has to answer two structural questions before it answers anything else:
Inside the platform that runs the agents, or outside it on independent infrastructure?
The same operators who run the agent workloads, or someone structurally separate from them?
The instinct is to answer both the same way: governance lives outside, on dedicated hardware, changeable only by a separate authority. That instinct is right about where the boundary belongs and wrong about where the policy belongs.
One requirement wants inference inside the cluster. The other wants the governor structurally out of the governed's reach. The cheap resolutions sacrifice one to satisfy the other.
If inference never enters Kubernetes, the model lifecycle never closes. Training, registry, and deployment are all platform-managed — and then the actual inference step runs on hardware outside the platform. The pipeline is half-managed, half-ad-hoc. The MLOps claim can't be made honestly.
The first requirement of any governance system is that the thing being governed can't change the thing governing it. Physical separation on dedicated hardware is the strongest form of that guarantee. Anything weaker is a promise rather than a structure.
"Governance" turns out to be two functions — policy management (the rules, thresholds, and version history) and policy enforcement (the gate that turns a proposed action into a verdict). The winning option is the one that stops treating them as a single thing.
Governance stays on dedicated hardware. Inference inside the cluster routes out to it for every action.
Governance, one stage of the model lifecycle, stays outside the platform, so the MLOps pipeline still never closes. A network round-trip sits in the hot path of every action, and the cluster is permanently calling an external service to evaluate its own workloads.
Governance becomes a cluster service. GitOps manages it, ArgoCD deploys it, Prometheus observes it.
Structural separation collapses. Anyone with cluster-admin rights can redeploy the governance layer — so a cluster compromise is a governance compromise.
Recognise that "governance" is two jobs — managing policy and enforcing it — and give each a different home.
Policy management runs inside the cluster, with every operational benefit. Enforcement runs at a boundary the cluster cannot reach. Both stories hold.
Split the governance architecture into two layers with different infrastructure homes. The cluster pushes policy; the boundary applies it. The cluster can deliver policy to the boundary, but no cluster workload can redeploy, reconfigure, or otherwise modify it.
Policy management: rules, risk thresholds, escalation criteria, and version history. GitOps-managed, ArgoCD-deployed, Harbor-versioned, and observable through the telemetry stack. The Git commit log is the audit trail.
Policy enforcement: the gate that intercepts each proposed action and issues a verdict. It runs on a separate machine, under a different OS, behind a different network boundary. An agent inside K3s cannot modify it.
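A minimal Python sketch of the split, assuming a bundle shape that this decision does not specify: `PolicyBundle`, `ProposedAction`, `enforce`, the `deny_actions` set, and the `escalate_above` risk threshold are all hypothetical names. What it illustrates is that policy is plain versioned data authored in the cluster, while enforcement is a function that only reads the bundle it was handed and has no way to edit it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


@dataclass(frozen=True)
class PolicyBundle:
    """Policy-management half: data authored and versioned inside the cluster (hypothetical shape)."""
    version: str                    # e.g. the Git commit the bundle was built from
    deny_actions: FrozenSet[str]    # action kinds that are never allowed
    escalate_above: float           # risk score at or above which a human decides


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    risk_score: float               # assumption: a risk score is already attached upstream


def enforce(policy: PolicyBundle, action: ProposedAction) -> Verdict:
    """Enforcement half: a pure function of (policy, action) running at the boundary.

    It only reads the frozen bundle; nothing here can alter the policy.
    """
    if action.kind in policy.deny_actions:
        return Verdict.DENY
    if action.risk_score >= policy.escalate_above:
        return Verdict.ESCALATE
    return Verdict.ALLOW
```

Because `enforce` has no side effects, any committed bundle can be exercised against recorded actions before it is promoted, without touching the boundary itself.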
The argument against putting governance inside the cluster turns on one thing: the governed must not be able to modify the governor. That is a claim about enforcement, not about where the rules are written down.
The Kyverno precedent: Kyverno's policies live as ordinary resources inside the very cluster they govern. Nobody calls that a separation failure, because the API server sends every matching request through Kyverno's admission webhook before the object is persisted, and ordinary workloads have no path around that admission chain. The rules live inside; the gate sits at a point the workloads cannot bypass. This decision applies the same pattern, with the boundary at an unusually concrete location: a separate physical machine.
Putting policy management on the platform makes governance harder to change ad hoc, not easier. Every policy change becomes a Git commit — reviewed, dry-run against live traffic, promoted through the same pipeline as application config. The audit trail is the commit log. The alerting fires on policy drift. Governance that is unobservable is governance that cannot be improved, and external dedicated hardware is exactly where governance goes unobserved.
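Drift detection can be as simple as comparing content hashes. The sketch below assumes, hypothetically, that the boundary reports a SHA-256 digest of the policy it has applied and that policy documents are JSON-serialisable; `policy_digest` and `detect_drift` are illustrative names, not part of any tool named above.

```python
import hashlib
import json


def policy_digest(policy: dict) -> str:
    """Content hash of a policy document; key order does not affect the result."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(desired: dict, applied_digest: str) -> bool:
    """True when the boundary reports a policy other than the one committed to Git."""
    return policy_digest(desired) != applied_digest
```

An alert rule could then fire whenever `detect_drift` stays `True` for longer than a normal rollout takes.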
Beyond a single deployment, governance cannot run on dedicated external hardware indefinitely: the cost and operational overhead compound until something gives. The pattern that scales is the one this decision implements: policy as a platform service, enforcement at a structural boundary, and a verified channel between them. It is the same split found in hardware security modules, Kubernetes admission webhooks, and network access control. Most production deployments will eventually move the boundary from a physical machine to a logical equivalent, such as an attested execution environment or a separately administered account. The split is the architecture; how the boundary is implemented is a deployment detail.
With the split accepted, inference becomes a fully governed, fully schedulable Kubernetes workload. Training, evaluation, registry promotion, inference, observability, governance, and evidence capture all live in one operational environment.
A new service, the cluster proxy, runs on worker-02. It owns policy management, versioning, and the routing that connects the inside of the cluster to the enforcement layer outside it. Without it, the split has no inside half.
A defined protocol for pushing policy from cluster to boundary, authenticated and cryptographically signed. The boundary refuses any policy it cannot verify as coming from an authorised source. Anything weaker turns the split back into a trust relationship.
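A sketch of the verify-before-install rule, with one loud substitution: it uses an HMAC from the Python standard library as a stand-in for a signature. With a shared key, anyone holding the key can produce valid tags, so a real deployment would more plausibly use an asymmetric signature (for example Ed25519), where the boundary holds only a verification key. The envelope format, the `seq` field, `sign_bundle`, and `Boundary` are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_bundle(key: bytes, bundle: dict) -> bytes:
    """Sender side: serialise the bundle canonically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return json.dumps({"payload": payload.decode("utf-8"), "tag": tag}).encode("utf-8")


class Boundary:
    """Boundary side: installs policy only if the tag verifies and the sequence moves forward."""

    def __init__(self, key: bytes):
        self._key = key
        self.active: Optional[dict] = None

    def receive(self, message: bytes) -> bool:
        envelope = json.loads(message)
        payload = envelope["payload"].encode("utf-8")
        expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        # compare_digest avoids leaking, through timing, how much of the tag matched
        if not hmac.compare_digest(expected, envelope["tag"]):
            return False
        bundle = json.loads(payload)
        # Refuse replays and rollbacks: sequence numbers must strictly increase
        if self.active is not None and bundle["seq"] <= self.active["seq"]:
            return False
        self.active = bundle
        return True
```

The sequence check matters as much as the tag: without it, an attacker could replay an older, validly signed but weaker policy.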
If policy management is unavailable (node down, network partition, upgrade in progress), the boundary needs a safe default. There are three candidates: deny all, apply the last-known-good policy, or escalate everything to a human. Fail-closed, meaning deny all, is the committed default.
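One way to encode the three candidates, as a sketch: the boundary treats policy management as unavailable when it holds no policy or its last successful sync is older than a freshness window. Using staleness as the unavailability signal is an assumption, and `decide`, `Fallback`, and the 300-second `max_staleness_s` default are hypothetical; only the fail-closed default reflects the decision above.

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


class Fallback(Enum):
    DENY_ALL = "deny_all"              # the committed fail-closed default
    LAST_KNOWN_GOOD = "last_known_good"
    ESCALATE_ALL = "escalate_all"


def decide(
    action: dict,
    cached_policy: Optional[dict],
    seconds_since_sync: float,
    evaluate: Callable[[dict, dict], Verdict],
    max_staleness_s: float = 300.0,    # hypothetical freshness window
    fallback: Fallback = Fallback.DENY_ALL,
) -> Verdict:
    """Return a verdict, degrading to the configured fallback when policy is missing or stale."""
    healthy = cached_policy is not None and seconds_since_sync <= max_staleness_s
    if healthy:
        return evaluate(cached_policy, action)
    if fallback is Fallback.LAST_KNOWN_GOOD and cached_policy is not None:
        return evaluate(cached_policy, action)
    if fallback is Fallback.ESCALATE_ALL:
        return Verdict.ESCALATE
    # Fail closed: without fresh, verified policy, nothing is allowed.
    return Verdict.DENY
```

Note that last-known-good still denies when the boundary has never received any policy at all.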
The enforcement kernel on the M4 Pro is unchanged. It still evaluates proposed actions and issues verdicts — it now receives policy from the cluster proxy rather than managing it locally. The evaluation logic, the verdict format (ALLOW · DENY · ESCALATE), and the evidence schema are all the same. The audit record still captures the verdict, the policy version that produced it, and the full action context.
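A sketch of an audit record carrying the three things named above: the verdict, the policy version that produced it, and the full action context. The field names, the UTC timestamp, and the JSON encoding are assumptions; the actual evidence schema is not given here.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

VERDICTS = frozenset({"ALLOW", "DENY", "ESCALATE"})


@dataclass(frozen=True)
class AuditRecord:
    verdict: str            # one of ALLOW, DENY, ESCALATE
    policy_version: str     # ties the verdict back to the policy commit that produced it
    action: dict            # full context of the proposed action, as received
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject anything outside the fixed verdict format
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")

    def to_json(self) -> str:
        """Serialise with stable key order so records diff and hash consistently."""
        return json.dumps(asdict(self), sort_keys=True)
```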