Silicon · The Lab

The Lab.

Structured experiments and field reports from the governed AI lab. Each lab states a hypothesis, runs a controlled procedure, and publishes the data — whether the hypothesis holds or not. Reproducibility is the discipline.

4 complete · 4 active · 6 planned
Series F

The Forge

Operational labs on the Apple Silicon home-lab itself — hardware bring-up, inference benchmarks, model installs, and the cluster substrate the other series depend on.

2 labs · 1 complete · 1 active
Series A

Darwin

Recursive learning on local models. Can a governed agent improve by iterating on its own outputs without violating its constitutional boundaries? Behavioral consistency, self-critique loops, composition drift, recursive ceiling.

4 labs · 1 complete · 1 active
Series B

OpenClaw

End-to-end governed automation on Apple Silicon. ClawLaw install, boundary enforcement, escalation flow, composition detection, audit trail integrity, multi-agent contention.

8 labs · 2 complete · 2 active
Series C · Planned

Prometheus

Model drift detection and inference reliability as continuous MLOps discipline. Token throughput benchmarks, latency SLO definition, drift classification, alert routing from model metrics to on-call.

6 planned · 0 complete · 0 active
All labs
L-001 Done
The Forge
Inference Benchmark: Apple Silicon vs Discrete GPU

What tokens-per-second actually measures, why most published comparisons are methodologically broken, and how to run a controlled experiment that produces data you can trust.

inference · benchmark · methodology · apple-silicon
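As a rough illustration of what a tokens-per-second figure measures, the sketch below divides generated tokens by wall-clock time for each run, discards warm-up runs, and reports the median. The function names, the warm-up count, and the choice of median are assumptions for illustration and are not taken from the lab's own procedure.

```python
from statistics import median
from typing import List, Tuple


def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Throughput for one run: generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def summarize_runs(runs: List[Tuple[int, float]], warmup: int = 1) -> float:
    """Median tok/s over (token_count, elapsed_seconds) runs.

    The first `warmup` runs are dropped (an assumed convention) so that
    model loading and cache warm-up do not skew the figure.
    """
    measured = runs[warmup:]
    if not measured:
        raise ValueError("need at least one run after warm-up")
    return median(tokens_per_second(n, t) for n, t in measured)
```

Reporting a median over several runs rather than a single number is one way to keep an outlier run from dominating the comparison.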
L-002 Active
The Forge
Gemma 4 on Apple Silicon — Installation and Configuration

Bring up a single-node Apple Silicon home-lab. Ollama and MLX side by side, Open WebUI as the lab interface, Langfuse capturing every call. Six experiments from baseline tok/s to governed inference.

inference · install · ollama · mlx
L-003 Active
OpenClaw
OpenClaw Across the Provider Matrix

Install the OpenClaw agentic CLI, wire it to four backends — Anthropic, OpenAI, Gemini, and a local Ollama model — and run six experiments comparing tool-use behaviour across providers.

agents · openclaw · providers · tool-use
L-022 Done
Darwin
Baseline identity

Establish the ceiling for behavioral consistency on a fixed-prompt local agent.

darwin · behavioral-consistency
Not yet documented
L-023 Active
Darwin
Self-critique loop

Test whether self-evaluation produces measurable quality gains under governance.

darwin · recursive · governance
Not yet documented
L-024 Planned
Darwin
Composition drift

Detect quality drift inside a session before it reaches the output boundary.

darwin · composition · drift
Not yet documented
L-025 Planned
Darwin
Recursive ceiling

Find the plateau point where iterative self-improvement stops yielding gains.

darwin · recursive · capacity
Not yet documented
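One way to make "plateau point" concrete is sketched below: given a quality score per iteration, find the first iteration after which several consecutive steps each gain less than a threshold. The scoring scale, the `min_gain` threshold, and the `patience` window are hypothetical parameters, not values from this lab.

```python
from typing import List, Optional


def plateau_index(scores: List[float], min_gain: float = 0.01,
                  patience: int = 2) -> Optional[int]:
    """Index of the iteration where improvement stalls, or None.

    A plateau is declared once `patience` consecutive steps each improve
    by less than `min_gain`; the returned index is the last iteration
    before that stalled streak began.
    """
    stalled = 0
    for i in range(1, len(scores)):
        if scores[i] - scores[i - 1] < min_gain:
            stalled += 1
            if stalled == patience:
                return i - patience
        else:
            stalled = 0  # a real gain resets the streak
    return None
```

Requiring a streak rather than a single flat step avoids calling a plateau on one noisy iteration.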
L-026 Done
OpenClaw
ClawLaw install

Constitutional governance bootstrap on a clean Apple Silicon node.

openclaw · install · governance
Not yet documented
L-027 Done
OpenClaw
Boundary enforcement

Verify the filesystem boundary holds against sequential probe patterns.

openclaw · governance · boundary
Not yet documented
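A filesystem boundary check of the kind this lab probes can be sketched as path containment after resolution. This is an illustrative assumption about how such a check might look, not ClawLaw's actual mechanism; `is_inside_boundary` and the sandbox root are hypothetical names.

```python
from pathlib import Path


def is_inside_boundary(candidate: str, root: str) -> bool:
    """True if `candidate`, interpreted relative to `root`, stays under `root`.

    Both paths are resolved first, so `..` segments and symlinks cannot be
    used to step outside the boundary. An absolute `candidate` replaces the
    root when joined, so it is only accepted if it already lies under `root`.
    """
    resolved_root = Path(root).resolve()
    resolved = (resolved_root / candidate).resolve()
    return resolved == resolved_root or resolved_root in resolved.parents
```

A sequential probe test would call this on each requested path in turn, for example `notes/a.txt`, then `notes/../../etc/passwd`, and expect the second to be rejected.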
L-028 Active
OpenClaw
Escalation flow

Confirm ESCALATE verdicts pause execution until principal review completes.

openclaw · governance · escalation
Not yet documented
L-029 Planned
OpenClaw
Composition detection

Catch boundary-probe sequences using session-aware composition rules.

openclaw · governance · composition
Not yet documented
L-030 Planned
OpenClaw
Audit trail integrity

Replay the audit log against the same initial state and prove determinism.

openclaw · audit · determinism
Not yet documented
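The idea of replaying an audit log against the same initial state can be sketched as follows: apply each logged event in order, hash a canonical serialization of the final state, and check that two replays agree. The event shape (`key`/`value`) and the transition function are hypothetical stand-ins; the real audit log format is not documented here.

```python
import hashlib
import json
from typing import Dict, List


def apply_event(state: Dict, event: Dict) -> Dict:
    """Hypothetical transition: set one key, returning a new state dict."""
    new_state = dict(state)
    new_state[event["key"]] = event["value"]
    return new_state


def state_digest(state: Dict) -> str:
    """SHA-256 of a canonical JSON form (sorted keys, fixed separators)."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay(initial_state: Dict, log: List[Dict]) -> str:
    """Apply every event in order and return the digest of the final state."""
    state = initial_state
    for event in log:
        state = apply_event(state, event)
    return state_digest(state)
```

Matching digests across replays are evidence of determinism for that log and initial state; a mismatch points to hidden inputs such as clocks, randomness, or iteration order.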
L-031 Planned
OpenClaw
Multi-agent contention

Two governed agents, one governance layer — verify the state store under contention.

openclaw · concurrency
Not yet documented
L-032 Planned
OpenClaw
Production benchmark

A full 8-hour governed development session, end to end, with zero governance failures.

openclaw · governance · production
Not yet documented