Companion Lab

Linear Algebra as Transformation

Matrices are not tables of numbers. They are transformations of space. Eigenvectors are the directions that survive transformation. The dot product is the attention mechanism. This is where the transformer architecture lives.

01

Matrices as transformations of space

A 2×2 matrix doesn't store data. It defines a function that takes every point in the plane and moves it somewhere else. The identity matrix leaves everything in place. A rotation matrix spins the plane. A scaling matrix stretches or compresses. Every matrix is a verb, not a noun.

T(v) = Av — The matrix A transforms the vector v
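To make the "matrix as verb" idea concrete, here is a minimal sketch in plain Python (the `apply` helper and the example matrices are illustrative, not part of any library): the same function `T(v) = Av`, specialized to the identity, a rotation, and a scaling.

```python
import math

def apply(A, v):
    """Apply a 2x2 matrix A (row-major nested lists) to a 2D vector v."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

theta = math.pi / 2  # rotate a quarter turn counterclockwise
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
scaling  = [[2, 0],
            [0, 2]]
identity = [[1, 0],
            [0, 1]]

v = [1.0, 0.0]
print(apply(identity, v))  # unchanged
print(apply(rotation, v))  # spun onto the y-axis (up to floating-point noise)
print(apply(scaling,  v))  # stretched to twice its length
```

Each matrix is used only as a function from points to points; no data is ever "stored" in it.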
02

Eigenvectors: the invariant directions

Most vectors change direction when you apply a matrix transformation. But some vectors only get stretched or compressed — they keep pointing the same way. These are eigenvectors. The amount they get scaled is the eigenvalue.

Av = λv — The eigenvector v scaled by eigenvalue λ

In machine learning, eigenvectors reveal the principal directions of variation in data. Principal Component Analysis (PCA) is eigendecomposition applied to covariance matrices. The most important features are the eigenvectors with the largest eigenvalues.
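The definition Av = λv can be checked by hand for a small symmetric matrix. A sketch in plain Python (the matrix and the `apply` helper are illustrative choices): [1, 1] and [1, −1] are eigenvectors of this A, while a generic vector gets turned.

```python
def apply(A, v):
    """Apply a 2x2 matrix A (row-major nested lists) to a 2D vector v."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[2, 1],
     [1, 2]]

# [1, 1] is an eigenvector: A stretches it by λ = 3 without turning it.
print(apply(A, [1, 1]))    # [3, 3] = 3 · [1, 1]

# [1, -1] is also an eigenvector, with λ = 1: it passes through unchanged.
print(apply(A, [1, -1]))   # [1, -1] = 1 · [1, -1]

# A generic vector changes direction: [2, 1] is not a multiple of [1, 0].
print(apply(A, [1, 0]))
```

The two surviving directions are exactly the axes a PCA of data shaped like this matrix would report, ordered by eigenvalue.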

03

Interactive transformer lab

The canvas below lets you define a 2×2 transformation matrix and watch it act on the plane. Preset matrices demonstrate rotation, reflection, shearing, and projection. Eigenvectors are shown in real time.

Interactive canvas coming soon. This lab is under active development.

Transformer Lab — 2×2 matrix transformation visualizer. Presets: Identity, Rotation 45°, Reflection, Shear, Projection.
04

The dot product as attention

The transformer architecture that powers GPT, Claude, and Apple's Foundation Models computes attention using scaled dot products. The dot product of two vectors measures alignment — how much one vector points in the direction of the other. A high dot product means high attention.

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) · V

Q is the query. K is the key. V is the value. The dot product QKᵀ produces the attention scores. Softmax normalizes them into a probability distribution. This is linear algebra at the heart of every large language model.
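The whole mechanism fits in a few lines. A minimal sketch in plain Python (the `attention` and `softmax` helpers and the toy Q, K, V values are illustrative, not a production implementation): each query is dotted with every key, the scores are scaled by √dₖ, softmaxed into weights, and used to blend the values.

```python
import math

def softmax(xs):
    """Normalize a list of scores into a probability distribution."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors Q, K, V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# The query is aligned with the first key, so most of the attention weight —
# and therefore most of the first value vector — flows into the output.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))
```

Real transformers do exactly this with batched matrix multiplies instead of loops, but the linear algebra — dot products measuring alignment, softmax turning alignment into weights — is unchanged.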

05

Attention heatmap

Below is where the attention heatmap visualization will live — showing how tokens in a sequence attend to each other. Each cell represents the dot-product similarity between a query token and a key token.

Interactive visualization coming soon.

The same linear algebra that powers the attention mechanism also governs the state space of SwiftVector. Constraint evaluation is a linear map. The codex composes constraints the way matrices compose transformations. Understanding linear algebra is understanding the machinery underneath.

— From the curriculum thesis
Continue the sequence
Previous Module

← The Calculus of Learning

Gradient descent as terrain. The loss landscape. Where mathematics meets machine learning.

Next Module

Statistics & Probability →

Distributions, Bayesian inference, the mathematical language of uncertainty.