Edge Intelligence.

Governed AI on constrained Apple hardware — iPhone, iPad, Apple Watch, and beyond. Different substrate. Different memory envelope. Identical constitutional architecture.

01 · The Architecture Thesis

Not ‘AI on devices.’ AI governed by devices.

The mainstream framing gets causality backwards. The edge is not where intelligence runs because it can — it is where intelligence runs because the device is the principal environment. The user’s hardware is the authority boundary. Everything else is a governed extension.

Apple understood this before the industry did. On-device inference is not a cost optimisation or a latency hack. It is a governance architecture — the device holds the keys, controls the model lifecycle, and decides when (and whether) computation leaves the trust boundary.

‘Where inference runs’ is now a product decision, an architectural decision, and a governance question — all three at once.

Authority axis: Device (principal) · PCC (governed extension) · Cloud (subordinate)

Cloud-First Copilot · The default pattern
Inference: Cloud
Control: Provider
Governance: Terms of service

Apple Intelligence · The platform pattern
Inference: Device + PCC
Control: User + Apple
Governance: Platform policy

Governed Edge · The constitutional pattern
Inference: Device-first, PCC governed
Control: Principal (user/org)
Governance: Constitutional architecture

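The authority axis can be read as a routing rule: prefer the device, fall back to PCC only when policy allows data to leave the device, and reach third-party cloud only when policy allows data to leave the trust boundary. The sketch below is one possible reading of that rule, not an Apple API; `Tier`, `Request`, `route`, and every field name are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    DEVICE = "device"  # principal environment
    PCC = "pcc"        # governed extension
    CLOUD = "cloud"    # subordinate, outside the trust boundary


@dataclass(frozen=True)
class Request:
    # All three fields are assumptions made for this sketch.
    fits_on_device: bool      # model and context fit the device's memory envelope
    may_leave_device: bool    # principal's policy permits off-device compute
    may_leave_boundary: bool  # principal's policy permits third-party compute


def route(req: Request, pcc_available: bool) -> Optional[Tier]:
    """Return the most authoritative tier the policy permits, or None to refuse."""
    if req.fits_on_device:
        return Tier.DEVICE
    if req.may_leave_device and pcc_available:
        return Tier.PCC
    if req.may_leave_device and req.may_leave_boundary:
        return Tier.CLOUD
    return None
```

Refusal (`None`) is a first-class outcome here: when no permitted tier can serve the request, nothing runs, rather than the request silently falling through to the cloud.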
02 · Apple’s AI Stack

Three deployment paths. One governance requirement.

Apple’s ML stack is not one framework. It is three deployment tiers — each with different trade-offs, different audiences, and different governance hooks. Understanding the tiers is prerequisite to governing them.

Core ML
Ship

Apple’s production deployment framework. Convert, optimise, and ship ML models as .mlmodel files or .mlpackage bundles. Optimised for Apple Silicon across every device class.

Format: .mlmodel / .mlpackage
Model size: Any (quantised)
Use case: Production apps
Governance hook: Model manifest

Foundation Models
Access

Apple’s on-device LLM exposed through a Swift API. Tool calling, guided generation, and structured output — the native intelligence layer announced at WWDC25.

Format: Swift API
Model size: ~3B params
Use case: App intelligence
Governance hook: Tool schema + guardrails

MLX
Lab

Apple’s ML research framework for Apple Silicon. Fine-tuning, LoRA adapters, research-grade inference. Open-source and rapidly evolving through the MLX ecosystem.

Format: Safetensors / GGUF
Model size: 1B–70B+
Use case: Research + fine-tuning
Governance hook: Adapter provenance

The governance question is substrate-independent. Whether the model runs through Core ML, Foundation Models, or MLX, the constitutional architecture must apply. The deployment tier changes the performance envelope — not the authority model.
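One way to picture substrate independence is a single policy check that sits in front of every backend, so that swapping the deployment tier never changes which actions are authorised. The following is a minimal sketch under that assumption; `Policy`, `GovernedRunner`, and the callable backends are hypothetical stand-ins and do not correspond to Core ML, Foundation Models, or MLX APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for a deployment tier: maps a prompt to an output string.
Backend = Callable[[str], str]


@dataclass(frozen=True)
class Policy:
    allowed_actions: frozenset  # actions the principal has authorised


class GovernedRunner:
    """Applies the same policy check whichever backend serves the request."""

    def __init__(self, policy: Policy, backends: Dict[str, Backend]):
        self.policy = policy
        self.backends = backends

    def run(self, tier: str, action: str, prompt: str) -> str:
        # The authority check happens before, and independently of, tier selection.
        if action not in self.policy.allowed_actions:
            raise PermissionError(f"action {action!r} is not authorised")
        return self.backends[tier](prompt)
```

In this shape the tier argument only selects the performance envelope; an unauthorised action is rejected identically for every tier.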

03 · Apple Silicon as AI Platform

The hardware thesis.

Apple’s edge AI story only works because of vertical integration. The Neural Engine, GPU compute, and unified memory architecture are not marketing features — they are the substrate that makes governed on-device inference practical at scale.

Neural Engine
Dedicated inference

Hardware accelerator purpose-built for the matrix operations that dominate neural-network inference. Core ML schedules supported model layers onto it, giving high throughput at low power draw. The primary inference path for production workloads.

GPU Compute
Parallel workloads

Metal, Metal Performance Shaders, and MPSGraph. Handles large-batch inference, MLX execution, and workloads that exceed Neural Engine capacity. The flexible compute tier.

CPU Fallback
Sequential ops

Pre/post-processing, tokenisation, custom operators. Handles operations that don’t parallelise well. Always available, lowest throughput for matrix work.

Unified Memory
Zero-copy substrate

CPU, GPU, and Neural Engine share one pool of physical memory, so tensors need not be copied across a bus when work moves between compute units. The architectural advantage that makes edge inference competitive.
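The four tiers above imply a preference order: Neural Engine first, GPU when the workload does not fit it, CPU as the fallback that is always available. In practice Core ML makes this placement itself, within whatever compute-unit constraint the app configures; the sketch below only illustrates the fallback logic. The capability table and the op names in it are invented for illustration and are not a statement of what each unit actually supports.

```python
from enum import Enum


class Unit(Enum):
    NEURAL_ENGINE = "ane"
    GPU = "gpu"
    CPU = "cpu"


# Hypothetical capability table for this sketch only.
SUPPORTED = {
    Unit.NEURAL_ENGINE: {"matmul", "conv"},
    Unit.GPU: {"matmul", "conv", "large_batch"},
    Unit.CPU: None,  # None means "accepts any op": the always-available fallback
}

# Most efficient unit first, most general unit last.
PREFERENCE = (Unit.NEURAL_ENGINE, Unit.GPU, Unit.CPU)


def place(op_kind: str) -> Unit:
    """Return the first unit in preference order that accepts the op."""
    for unit in PREFERENCE:
        ops = SUPPORTED[unit]
        if ops is None or op_kind in ops:
            return unit
    raise AssertionError("unreachable: the CPU accepts every op")
```

Because the units share memory, moving an op from one unit to the next in this chain changes where it executes, not where its tensors live.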

Cloud Inference
Elastic compute, variable latency

Inference runs on shared GPU clusters. Latency depends on queue depth, network conditions, and provider load. Compute scales elastically but the user has no control over the execution environment.

Latency: 200 ms–2 s+
Connectivity: Required

Edge Inference
Fixed compute, bounded latency

Inference runs on the user’s silicon. Latency is bounded by the device’s compute class. No network dependency for core inference. The execution environment is under the principal’s physical control.

Latency: 10–100 ms
Connectivity: Optional
04 · Private Cloud Compute

The governed cloud extension.

Private Cloud Compute is not the opposite of edge intelligence — it is part of it. PCC extends the device’s trust boundary into Apple’s infrastructure under cryptographic attestation. The device decides what leaves. The cloud proves what it ran.

Trust boundary topology

Device · Principal environment
On-device inference. Full data control. The trust anchor for the entire system.

PCC · Governed extension
Attested Apple Silicon in the cloud. Stateless execution. Cryptographic proof of code identity.

Cloud · Outside boundary
Third-party APIs and services. Not governed. Data leaves the trust boundary entirely.

The device remains the principal environment. The cloud is a governed extension, used selectively and auditably under policy. PCC does not change the authority model; it extends the execution surface while preserving it.
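"The device decides what leaves. The cloud proves what it ran" can be sketched as a two-part gate: policy must permit offload, and the remote node's attested software measurement must match a digest the device already trusts. This is a simplified illustration, not Apple's PCC protocol. Real attestation also verifies signatures and a hardware root of trust, which the sketch omits. `TRUSTED_MEASUREMENTS`, `may_send`, and the placeholder image bytes are hypothetical.

```python
import hashlib
import hmac

# Hypothetical allowlist: SHA-256 digests of software images the device trusts.
TRUSTED_MEASUREMENTS = {
    hashlib.sha256(b"pcc-release-1").hexdigest(),
}


def may_send(attested_measurement: str, policy_allows_offload: bool) -> bool:
    """Release data only if policy permits offload and the node's attested
    measurement (a hex digest) matches a trusted digest."""
    if not policy_allows_offload:
        return False
    # compare_digest avoids leaking match position through timing.
    return any(
        hmac.compare_digest(attested_measurement, trusted)
        for trusted in TRUSTED_MEASUREMENTS
    )
```

The policy check comes first, so a valid attestation can never override the principal's decision to keep data on the device.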

05 · Field Deployment

Where the constitutional architecture meets the field.

Edge intelligence is not theoretical. The governed architecture runs in the field today — on iPads in utility corridors, on iPhones at inspection sites, on Apple Watch for situational awareness. Same constitutional architecture. Different memory envelope.

Reference jurisdiction
ThermalLaw
Governed thermal inspection · iPad + iPhone · Offline-capable

The governed thermal inspection workflow running offline on iPad. Core ML inference for thermal anomaly detection, governed by the same constitutional architecture that runs on the desktop. The reference jurisdiction for edge deployment — proving that governance does not require connectivity.

Inference: Core ML on Neural Engine
Governance: AgentVector · deterministic
Connectivity: Full offline operation
Status: In development · 2026

Developing
Core ML in production

Converting, optimising, and deploying models through Core ML. The production deployment path for governed edge inference.

Developing
MLX for practitioners

Fine-tuning, LoRA adapters, and research-grade inference with MLX on Apple Silicon. The practitioner’s laboratory path.

Planned
Governed field deployment

Deploying governed agents to field hardware — iPads, iPhones, Apple Watch. Offline-first operational patterns.

Planned
Edge governance patterns

Patterns for constitutional governance under constrained memory, intermittent connectivity, and thermal limits.
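One pattern that fits intermittent connectivity is an append-only, hash-chained decision log: the device records every governance decision locally while offline, and a later sync can detect any entry that was altered in between. The sketch below illustrates that general technique and is not a description of AgentVector's implementation; `AuditLog` and its fields are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the one before it."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def append(self, decision: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = json.dumps(decision, sort_keys=True)  # canonical serialisation
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self.entries.append({"prev": prev, "body": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = self.GENESIS
        for entry in self.entries:
            expected = hashlib.sha256((prev + entry["body"]).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

The log needs no network to stay verifiable. Only the latest hash has to be compared with a previously synced value to confirm that earlier entries are unchanged.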

The governance architecture is identical whether the agent runs on a Mac Pro or an iPad in a utility corridor. No governance gap between desktop and field. The constitutional architecture is substrate-independent by design.