Silicon · The Lab

The Lab.

Structured experiments and field reports from the governed AI lab. Each lab states a hypothesis, runs a controlled procedure, and publishes the data — whether the hypothesis holds or not. Reproducibility is the discipline.

4 complete 4 active 6 planned

Series F

The Forge

Operational labs on the Apple Silicon home-lab itself — hardware bring-up, inference benchmarks, model installs, and the cluster the other series depend on.

Labs

Complete

Active

Series A

Darwin

Recursive learning on local models. Can a governed agent improve by iterating on its own outputs without violating its constitutional boundaries? Behavioral consistency, self-critique loops, composition drift, recursive ceiling.

Labs

Complete

Active

Series B

OpenClaw

End-to-end governed automation on Apple Silicon. ClawLaw install, boundary enforcement, escalation flow, composition detection, audit trail integrity, multi-agent contention.

Labs

Complete

Active

Series C · Planned

Prometheus

Model drift detection and inference reliability as continuous MLOps discipline. Token throughput benchmarks, latency SLO definition, drift classification, alert routing from model metrics to on-call.

Planned

Complete

Active

All labs

L-001 Done

The Forge

Inference Benchmark: Apple Silicon vs Discrete GPU

What tokens-per-second actually measures, why most published comparisons are methodologically broken, and how to run a controlled experiment that produces data you can trust.

inferencebenchmarkmethodologyapple-silicon

L-002 Active

The Forge

Gemma 4 on Apple Silicon — Installation and Configuration

Bring up a single-node Apple Silicon home-lab. Ollama and MLX side by side, Open WebUI as the lab interface, Langfuse capturing every call. Six experiments from baseline tok/s to governed inference.

inferenceinstallollamamlx

L-003 Active

OpenClaw

OpenClaw Across the Provider Matrix

Install the OpenClaw agentic CLI, wire it to four backends — Anthropic, OpenAI, Gemini, and a local Ollama model — and run six experiments comparing tool-use behaviour across providers.

agentsopenclawproviderstool-use

L-022 Done

Darwin

Baseline identity

Establish the ceiling for behavioral consistency on a fixed-prompt local agent.

darwinbehavioral-consistency

Test whether self-evaluation produces measurable quality gains under governance.

darwinrecursivegovernance