Governed AI on constrained Apple hardware — iPhone, iPad, Apple Watch, and beyond. Different substrate. Different memory envelope. Identical constitutional architecture.
The mainstream framing gets causality backwards. The edge is not where intelligence runs because it can — it is where intelligence runs because the device is the principal environment. The user’s hardware is the authority boundary. Everything else is a governed extension.
Apple understood this before the industry did. On-device inference is not a cost optimisation or a latency hack. It is a governance architecture — the device holds the keys, controls the model lifecycle, and decides when (and whether) computation leaves the trust boundary.
‘Where inference runs’ is now a product decision, an architectural decision, and a governance question — all three at once.
Apple’s ML stack is not one framework. It is three deployment tiers — each with different trade-offs, different audiences, and different governance hooks. Understanding the tiers is prerequisite to governing them.
Core ML. Apple’s production deployment framework. Convert, optimise, and ship ML models as .mlpackage bundles (or legacy .mlmodel files). Optimised for Apple Silicon across every device class.
Foundation Models. Apple’s on-device LLM exposed through a Swift API. Tool calling, guided generation, and structured output: the native intelligence layer announced at WWDC25.
MLX. Apple’s ML research framework for Apple Silicon. Fine-tuning, LoRA adapters, research-grade inference. Open-source and rapidly evolving through the MLX ecosystem.
The governance question is substrate-independent. Whether the model runs through Core ML, Foundation Models, or MLX, the constitutional architecture must apply. The deployment tier changes the performance envelope — not the authority model.
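One way to picture substrate independence is a policy gate that sits in front of every backend and knows nothing about the tier behind it. The sketch below is a minimal illustration under that assumption. All of its types (`Policy`, `GovernedRunner`, `StubBackend`, and the rest) are hypothetical names invented here; none of them is an API from Core ML, Foundation Models, or MLX, and the stub backend performs no real inference.

```swift
// Illustrative sketch only. Every type here is hypothetical; none of it is
// an API from Core ML, Foundation Models, or MLX.

/// The three deployment tiers named in the text.
enum DeploymentTier: String, CaseIterable {
    case coreML = "Core ML"
    case foundationModels = "Foundation Models"
    case mlx = "MLX"
}

struct InferenceRequest {
    let prompt: String
}

enum Verdict: Equatable {
    case allow
    case deny(reason: String)
}

/// Anything that can execute a request. A real backend would wrap one tier.
protocol InferenceBackend {
    var tier: DeploymentTier { get }
    func run(_ request: InferenceRequest) -> String
}

/// Stand-in backend that performs no real inference.
struct StubBackend: InferenceBackend {
    let tier: DeploymentTier
    func run(_ request: InferenceRequest) -> String {
        return "[\(tier.rawValue)] handled \(request.prompt.count) characters"
    }
}

/// The authority model: one policy, with no knowledge of the tier.
struct Policy {
    let maxPromptLength: Int
    func evaluate(_ request: InferenceRequest) -> Verdict {
        if request.prompt.isEmpty {
            return .deny(reason: "empty prompt")
        }
        if request.prompt.count > maxPromptLength {
            return .deny(reason: "prompt exceeds policy limit")
        }
        return .allow
    }
}

struct GovernedResult {
    let verdict: Verdict
    let output: String?
}

/// Applies the policy before any backend is allowed to run.
struct GovernedRunner {
    let policy: Policy
    func run<B: InferenceBackend>(_ request: InferenceRequest, on backend: B) -> GovernedResult {
        let verdict = policy.evaluate(request)
        guard verdict == .allow else {
            return GovernedResult(verdict: verdict, output: nil)
        }
        return GovernedResult(verdict: verdict, output: backend.run(request))
    }
}

// Usage: the same runner governs every tier.
let runner = GovernedRunner(policy: Policy(maxPromptLength: 64))
let request = InferenceRequest(prompt: "Classify thermal frame 12")
for tier in DeploymentTier.allCases {
    let result = runner.run(request, on: StubBackend(tier: tier))
    print(tier.rawValue, result.verdict, result.output ?? "denied")
}
```

Swapping the backend changes what executes, but the verdict comes from the same `Policy` value each time, which is the property the paragraph above describes.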
Apple’s edge AI story only works because of vertical integration. The Neural Engine, GPU compute, and unified memory architecture are not marketing features — they are the substrate that makes governed on-device inference practical at scale.
In the cloud, inference runs on shared GPU clusters. Latency depends on queue depth, network conditions, and provider load. Compute scales elastically, but the user has no control over the execution environment.
At the edge, inference runs on the user’s silicon. Latency is bounded by the device’s compute class. No network dependency for core inference. The execution environment is under the principal’s physical control.
Private Cloud Compute is not the opposite of edge intelligence — it is part of it. PCC extends the device’s trust boundary into Apple’s infrastructure under cryptographic attestation. The device decides what leaves. The cloud proves what it ran.
The device remains the principal environment. The cloud is a governed extension, used selectively and auditably under policy. PCC does not change the authority model; it extends the execution surface while preserving it.
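The routing rule implied here (prefer the device, extend to the cloud only when policy permits, and record every decision) can be sketched as a small decision function. This is a hedged illustration, not Apple's mechanism: `WorkItem`, `DevicePolicy`, `Router`, and the memory-budget heuristic are all assumptions made for the example, and cryptographic attestation of the cloud side is deliberately not modelled.

```swift
// Illustrative sketch only. All names are hypothetical, and attestation
// of the cloud environment is out of scope for this example.

enum ExecutionSurface: String {
    case onDevice
    case privateCloud
}

struct WorkItem {
    let name: String
    let estimatedMemoryMB: Int
    let mayLeaveDevice: Bool   // decided by the principal's policy
}

struct DevicePolicy {
    let memoryBudgetMB: Int          // assumed stand-in for the device's envelope
    let cloudExtensionEnabled: Bool
}

enum RoutingDecision: Equatable {
    case run(ExecutionSurface)
    case refuse(String)
}

struct AuditEntry {
    let item: String
    let decision: RoutingDecision
}

final class Router {
    let policy: DevicePolicy
    private(set) var auditLog: [AuditEntry] = []

    init(policy: DevicePolicy) {
        self.policy = policy
    }

    /// The device decides; the cloud is only ever an opt-in extension.
    func route(_ item: WorkItem) -> RoutingDecision {
        let decision: RoutingDecision
        if item.estimatedMemoryMB <= policy.memoryBudgetMB {
            decision = .run(.onDevice)
        } else if policy.cloudExtensionEnabled && item.mayLeaveDevice {
            decision = .run(.privateCloud)
        } else {
            decision = .refuse("exceeds device budget and may not leave the device")
        }
        // Every decision is logged, including refusals.
        auditLog.append(AuditEntry(item: item.name, decision: decision))
        return decision
    }
}

// Usage
let router = Router(policy: DevicePolicy(memoryBudgetMB: 512, cloudExtensionEnabled: true))
let summary = WorkItem(name: "summarise-report", estimatedMemoryMB: 4096, mayLeaveDevice: true)
print(router.route(summary), router.auditLog.count)
```

Keeping the audit log inside the router means no execution path, local or remote, can bypass the record. A production design would also need to verify the cloud's attestation before dispatch, which this sketch leaves out.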
Edge intelligence is not theoretical. The governed architecture runs in the field today — on iPads in utility corridors, on iPhones at inspection sites, on Apple Watch for situational awareness. Same constitutional architecture. Different memory envelope.
The governed thermal inspection workflow running offline on iPad. Core ML inference for thermal anomaly detection, governed by the same constitutional architecture that runs on the desktop. The reference jurisdiction for edge deployment — proving that governance does not require connectivity.
The governance architecture is identical whether the agent runs on a Mac Pro or an iPad in a utility corridor. No governance gap between desktop and field. The constitutional architecture is substrate-independent by design.