Edge Intelligence — Governed AI on constrained Apple hardware

01 · The Architecture Thesis

Not ‘AI on devices.’ AI governed by devices.

The mainstream framing gets causality backwards. The edge is not where intelligence runs because it can — it is where intelligence runs because the device is the principal environment. The user’s hardware is the authority boundary. Everything else is a governed extension.

Apple understood this before the industry did. On-device inference is not a cost optimisation or a latency hack. It is a governance architecture — the device holds the keys, controls the model lifecycle, and decides when (and whether) computation leaves the trust boundary.

‘Where inference runs’ is now a product decision, an architectural decision, and a governance question — all three at once.

Authority axis: Device (principal) → PCC (governed extension) → Cloud (subordinate)

Cloud-First Copilot · the default pattern
• Inference: Cloud
• Control: Provider
• Governance: Terms of service

Apple Intelligence · the platform pattern
• Inference: Device + PCC
• Control: User + Apple
• Governance: Platform policy

Governed Edge · the constitutional pattern
• Inference: Device-first, PCC governed
• Control: Principal (user/org)
• Governance: Constitutional architecture
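
Read as a routing rule, the Governed Edge pattern means the device decides, per request, whether inference stays local, extends to PCC, or does not happen at all. The following is a minimal Python sketch under stated assumptions: `Request`, `Policy`, the 0–3 sensitivity scale and `route` are hypothetical illustrations, not an Apple API.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DEVICE = "device"   # principal environment
    PCC = "pcc"         # governed extension
    REFUSE = "refuse"   # computation does not leave the trust boundary


@dataclass(frozen=True)
class Request:
    # Hypothetical request descriptor, for illustration only.
    fits_on_device: bool   # can the local model handle this task?
    sensitivity: int       # 0 = public ... 3 = restricted (hypothetical scale)


@dataclass(frozen=True)
class Policy:
    # Hypothetical principal-owned policy: whether PCC is enabled, and the
    # highest sensitivity level that may ever leave the device.
    allow_pcc: bool
    max_sensitivity_off_device: int


def route(req: Request, policy: Policy) -> Tier:
    """Device-first routing: stay local when possible, extend to PCC only under policy."""
    if req.fits_on_device:
        return Tier.DEVICE
    if policy.allow_pcc and req.sensitivity <= policy.max_sensitivity_off_device:
        return Tier.PCC
    # A third-party cloud is never a routing target in this sketch.
    return Tier.REFUSE
```

The design choice worth noticing is the missing branch: in this sketch no request is routed to a third-party cloud, because that tier sits outside the trust boundary rather than lower in the routing order.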

02 · The Desktop Is an Edge Device

When AI stops answering and starts operating.

There is no separate ‘desktop intelligence’ sitting apart from this. A Mac is an edge device — the keys held locally, the model able to run on the same silicon, the trust boundary drawn at the machine in front of the person. What makes the desktop distinctive is not a different authority model; it is proximity to state. The model gains the same surface a human developer uses: the file system, the shell, the application layer, the screen.

That proximity is exactly why the edge framing matters most here. The model can read your project, see your errors, invoke your tools, and apply changes where you work — not in a separate window you copy-paste from. The question is no longer whether AI will act on the desktop. It is who governs it when it does — and the answer is the device.

The desktop is the new action plane for AI — and an action plane is an edge, governed by the hand that holds it.

Chat surface AI

AI that answers

• Receives a prompt, returns text
• No access to local files
• No tool invocation
• No persistent session state
• Governance: terms of service only

Edge · the desktop case

AI that operates

• Reads project context, proposes diffs
• Full file-system access
• Shell, git, build tools, APIs
• Session state across interactions
• Governance: the device holds the keys
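
‘The device holds the keys’ becomes concrete at the moment of tool invocation: each action the model proposes passes a local gate before it touches the file system or the shell. A minimal sketch, assuming a hypothetical allowlist policy and workspace root; `ToolCall`, the tool names, `WORKSPACE` and `gate` are illustrative only and not part of any Apple or agent-framework API. It needs Python 3.9+ for `PurePosixPath.is_relative_to`.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ToolCall:
    tool: str       # e.g. "read_file", "apply_diff", "shell" (hypothetical names)
    path: str = ""  # target path, for file-system tools


# Hypothetical device-local policy: tools that may run without asking,
# tools that need explicit human approval, and the root they are confined to.
AUTO_ALLOWED = {"read_file"}
NEEDS_APPROVAL = {"apply_diff", "shell", "git_push"}
WORKSPACE = PurePosixPath("/Users/dev/project")


def gate(call: ToolCall, approved_by_user: bool = False) -> bool:
    """Return True only if the device-local policy permits this action."""
    if call.path:
        p = PurePosixPath(call.path)
        # Pure paths do not resolve "..", so reject it explicitly,
        # along with relative paths and anything outside the workspace.
        if not p.is_absolute() or ".." in p.parts or not p.is_relative_to(WORKSPACE):
            return False
    if call.tool in AUTO_ALLOWED:
        return True
    if call.tool in NEEDS_APPROVAL:
        return approved_by_user
    return False  # unknown tools are denied by default
```

Denying unknown tools by default keeps the policy, not the model, in charge of what ‘operating’ is allowed to mean.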

03 · Apple’s AI Stack

Three deployment paths. One governance requirement.

Apple’s ML stack is not one framework. It is three deployment tiers — each with different trade-offs, different audiences, and different governance hooks. Understanding the tiers is prerequisite to governing them.

Core ML · Ship
Apple’s production deployment framework. Convert, optimise, and ship ML models as .mlmodel files or .mlpackage bundles, optimised for Apple Silicon across every device class.
• Format: .mlmodel / .mlpackage
• Model size: Any that fits device memory (quantisation supported)
• Use case: Production apps
• Governance hook: Model manifest

Foundation Models · Access
Apple’s on-device LLM exposed through a Swift API: tool calling, guided generation, and structured output. The native intelligence layer announced at WWDC25.
• Format: Swift API
• Model size: ~3B params
• Use case: App intelligence
• Governance hook: Tool schema + guardrails

MLX · Lab
Apple’s ML research framework for Apple Silicon: fine-tuning, LoRA adapters, research-grade inference. Open-source and rapidly evolving through the MLX ecosystem.
• Format: Safetensors / GGUF
• Model size: 1B–70B+
• Use case: Research + fine-tuning
• Governance hook: Adapter provenance

The governance question is substrate-independent. Whether the model runs through Core ML, Foundation Models, or MLX, the constitutional architecture must apply. The deployment tier changes the performance envelope — not the authority model.
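
One way to make a governance hook substrate-independent is to check every artifact (a Core ML package, an MLX weight file, a LoRA adapter) against a principal-approved manifest of SHA-256 digests before it is loaded. A minimal sketch using only Python’s standard library; the manifest format, `artifact_digest` and `may_load` are hypothetical, and verifying a signature on the manifest itself is left out.

```python
import hashlib
from pathlib import Path


def artifact_digest(path: Path) -> str:
    """SHA-256 of a single file, or of every file in a bundle directory
    (such as an .mlpackage), visited in sorted order so the digest is stable."""
    h = hashlib.sha256()
    if path.is_file():
        h.update(path.read_bytes())
    else:
        for f in sorted(p for p in path.rglob("*") if p.is_file()):
            data = f.read_bytes()
            # Hash the relative name and length too, so renaming or
            # re-splitting files inside the bundle changes the digest.
            h.update(f.relative_to(path).as_posix().encode() + b"\0")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


def may_load(path: Path, manifest: dict[str, str]) -> bool:
    """Hypothetical manifest: artifact name -> approved SHA-256 hex digest."""
    expected = manifest.get(path.name)
    return expected is not None and expected == artifact_digest(path)
```

The same check runs unchanged whichever tier produced the artifact, which is the point of the paragraph above: the tier changes the file format, not the rule.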

04 · Apple Silicon as AI Platform

The hardware thesis.

Apple’s edge AI story only works because of vertical integration. The Neural Engine, GPU compute, and unified memory architecture are not marketing features — they are the substrate that makes governed on-device inference practical at scale.

Neural Engine · Dedicated inference
Hardware accelerator purpose-built for matrix operations. Runs supported Core ML operations at high efficiency with low power draw. The primary inference path for production workloads.

GPU Compute · Parallel workloads
Metal Performance Shaders and MPSGraph. Handles large-batch inference, MLX execution, and workloads that exceed Neural Engine capacity. The flexible compute tier.

CPU Fallback · Sequential ops
Pre/post-processing, tokenisation, custom operators. Handles operations that don’t parallelise well. Always available, lowest throughput for matrix work.

Unified Memory · Zero-copy substrate
Shared memory between CPU, GPU, and Neural Engine. No transfer overhead between compute units. The architectural advantage that makes edge inference competitive.

Cloud Inference · Elastic compute, variable latency
Inference runs on shared GPU clusters. Latency depends on queue depth, network conditions, and provider load. Compute scales elastically, but the user has no control over the execution environment.
• Latency: 200ms–2s+
• Connectivity: Required

Edge Inference · Fixed compute, bounded latency
Inference runs on the user’s silicon. Latency is bounded by the device’s compute class. No network dependency for core inference. The execution environment is under the principal’s physical control.
• Latency: 10–100ms
• Connectivity: Optional
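
The latency gap compounds once the model operates rather than answers, because an agent loop makes many sequential inference calls. A worked sketch using the ranges above, assuming (hypothetically) a 20-step loop where each step costs exactly one inference call and nothing else:

```python
# Per-call latency ranges from the comparison above, in milliseconds.
CLOUD_MS = (200, 2000)  # "200ms–2s+": the upper figure is open-ended
EDGE_MS = (10, 100)


def loop_latency_ms(per_call: tuple[int, int], steps: int) -> tuple[int, int]:
    """Total latency of `steps` sequential calls, as a (best, worst) range."""
    lo, hi = per_call
    return lo * steps, hi * steps


STEPS = 20  # hypothetical number of tool-calling steps in one task
print("edge :", loop_latency_ms(EDGE_MS, STEPS))   # (200, 2000)
print("cloud:", loop_latency_ms(CLOUD_MS, STEPS))  # (4000, 40000)
```

Under these assumptions the whole 20-step edge loop costs at most what a single slow cloud call does (2 s), while the cloud loop takes 4 to 40 s, and more, since the source gives 2s+ as an open upper bound.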

05 · Private Cloud Compute

The governed cloud extension.

Private Cloud Compute is not the opposite of edge intelligence — it is part of it. PCC extends the device’s trust boundary into Apple’s infrastructure under cryptographic attestation. The device decides what leaves. The cloud proves what it ran.

Trust boundary topology
• Device · Principal environment: On-device inference. Full data control. The trust anchor for the entire system.
• PCC · Governed extension: Attested Apple Silicon in the cloud. Stateless execution. Cryptographic proof of code identity.
• Cloud · Outside boundary: Third-party APIs and services. Not governed by the device. Data leaves the trust boundary entirely.

The device remains the principal environment. Cloud compute, through PCC, is a governed extension, used selectively and auditably under policy. PCC does not change the authority model; it extends the execution surface while preserving it.
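
In Apple’s published PCC design, a device sends a request only to nodes whose attested software measurement appears in a public transparency log. The sketch below models both halves of ‘the device decides what leaves, the cloud proves what it ran’ as one gate. It is a simplification under stated assumptions: `NodeAttestation`, the measurement strings, the field names, and the plain set standing in for the transparency log are all hypothetical, and no real attestation or encryption is performed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAttestation:
    # Stand-in for a real attestation: here just a software measurement string.
    measurement: str


def prepare_pcc_request(
    payload: dict[str, str],
    node: NodeAttestation,
    published_measurements: set[str],
    fields_allowed_off_device: set[str],
) -> dict[str, str] | None:
    """Return the minimised payload to send, or None if the node is not trusted."""
    # 1. The cloud proves what it ran: only publicly listed builds qualify.
    if node.measurement not in published_measurements:
        return None
    # 2. The device decides what leaves: drop every field policy keeps local.
    return {k: v for k, v in payload.items() if k in fields_allowed_off_device}
```

Ordering the checks this way means an untrusted node never even receives a minimised payload.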

06 · Field Deployment

Where the constitutional architecture meets the field.

Edge intelligence is not theoretical. The governed architecture is built for the field: iPads in utility corridors, iPhones at inspection sites, Apple Watch for situational awareness. Same constitutional architecture. Different memory envelope.
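
‘Same architecture, different memory envelope’ can be expressed as a split between what varies per device and what does not: the model variant is chosen from the device’s memory budget, while the governance policy object passes through unchanged. A minimal sketch; the variant names, footprints and the `plan` function are hypothetical placeholders, not measured figures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    name: str
    memory_mb: int  # hypothetical resident footprint after quantisation


@dataclass(frozen=True)
class GovernancePolicy:
    # Shared across every device class: nothing here depends on hardware.
    require_manifest_check: bool = True
    allow_pcc: bool = False


# Hypothetical catalogue, ordered largest first.
VARIANTS = [
    ModelVariant("detector-large", 6000),
    ModelVariant("detector-medium", 1500),
    ModelVariant("detector-small", 300),
]


def plan(memory_budget_mb: int, policy: GovernancePolicy) -> tuple[ModelVariant | None, GovernancePolicy]:
    """Pick the largest variant that fits the budget; the policy is returned untouched."""
    for v in VARIANTS:
        if v.memory_mb <= memory_budget_mb:
            return v, policy
    return None, policy  # nothing fits: the task is refused, still under the same policy
```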

ThermalLaw · reference jurisdiction
Governed thermal inspection · iPad + iPhone · Offline-capable

A governed thermal inspection workflow designed to run offline on iPad. Core ML performs thermal anomaly detection under the same constitutional architecture that runs on the desktop. ThermalLaw is the reference jurisdiction for edge deployment, built to demonstrate that governance does not require connectivity.
• Inference: Core ML on Neural Engine
• Governance: AgentVector · deterministic
• Connectivity: Full offline operation
• Status: In development · 2026
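
Governance without connectivity implies that the record of governed decisions is kept, and made tamper-evident, on the device itself. One common technique is a hash-chained, append-only log that can be verified whenever the device reconnects. A minimal sketch; the event fields are hypothetical, and it illustrates the technique, not AgentVector’s implementation.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _link(prev: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + body).encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"event": event, "prev": prev, "hash": _link(prev, event)})

    def verify(self) -> bool:
        """Recompute the chain; any edited, dropped or reordered entry breaks it."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != _link(prev, e["event"]):
                return False
            prev = e["hash"]
        return True
```

A chain like this detects tampering but does not prevent it: anyone able to rewrite the whole log could recompute every hash, so a real deployment would also sign entries with a device-held key.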

Core ML in production · Developing
Converting, optimising, and deploying models through Core ML. The production deployment path for governed edge inference.

MLX for practitioners · Developing
Fine-tuning, LoRA adapters, and research-grade inference with MLX on Apple Silicon. The practitioner’s laboratory path.

Governed field deployment · Planned
Deploying governed agents to field hardware: iPads, iPhones, Apple Watch. Offline-first operational patterns.

Edge governance patterns · Planned
Patterns for constitutional governance under constrained memory, intermittent connectivity, and thermal limits.

The governance architecture is identical whether the agent runs on a Mac Pro or an iPad in a utility corridor. No governance gap between desktop and field. The constitutional architecture is substrate-independent by design.
