The Quadrivium placed Music between Arithmetic and Astronomy deliberately. Music is what happens when number moves through time. Ratio, proportion, and the detection of recurring structure in a continuous stream form the cognitive substrate of every sequence model, every attention mechanism, and every practitioner who can tell signal from noise.
“Music is number in time — the discipline of ratio and proportion extended into duration.”
— the medieval Quadrivium definition
The medieval definition was precise: Arithmetic is number considered at rest. Geometry is number considered in space. Music is number considered in time, specifically the ratios that exist between durations, between frequencies, and between recurring events in a stream. It is not about pleasure. It is about the formal structure of temporal patterns.
This is exactly what a sequence model processes. A transformer attending over a sequence of tokens is performing Music in the medieval sense — detecting ratio and proportion relationships between elements distributed across time.
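As a rough illustration of the idea, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The toy vectors and the names `softmax` and `attend` are assumptions made for this example and are not taken from any particular model. The point is only that attention weights are proportions: positive numbers that sum to one and express how strongly each position relates to the query.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Hypothetical toy sequence: three positions with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 1.0]

weights, output = attend(query, keys, values)
```

In this toy case the third position aligns best with the query, so it receives the largest share of the weight, while the first two positions receive equal shares.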
In 1822, in his Analytical Theory of Heat, Fourier argued that a periodic function can be decomposed into a sum of simple sinusoids, a claim later made rigorous for broad classes of functions. A complex sound is a superposition of a fundamental frequency and its overtones. The Fourier transform is Music theory made into a universal mathematical tool. Spectrograms are music notation for arbitrary signals.
Speech recognition, audio generation, EEG analysis, image compression via DCT — all Fourier applications. The engineer who understands why a sine wave is the fundamental unit of signal representation understands why convolution and filtering work at all.
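The decomposition can be seen on a small scale with a naive discrete Fourier transform. The sketch below assumes a made-up signal (a fundamental plus one quieter overtone, both completing a whole number of cycles in the window) and uses only the standard library. A real application would use an FFT routine rather than this quadratic-time loop.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform, O(n^2); fine for a short demo."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# Hypothetical signal: a fundamental at 3 cycles per window plus a quieter
# overtone at 6 cycles per window (amplitudes 1.0 and 0.5).
N = 64
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 6 * t / N)
    for t in range(N)
]

spectrum = dft(signal)
# For a real sinusoid of amplitude A, the bin magnitude is A * N / 2, so rescale.
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]
# Keep only the frequency bins that carry noticeable energy.
peaks = [k for k, a in enumerate(amplitudes) if a > 0.1]
```

Here `peaks` picks out exactly the two frequencies that were mixed into the signal, and `amplitudes` recovers how loud each one was.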
Music trains a specific cognitive capacity: the detection of pattern recurrence in a stream, at varying intervals, in varying contexts. A theme recurs in different keys, at different tempos, in different registers — and the trained listener recognises it despite all surface variation. This requires extracting invariant structure from variant surface.
This is precisely what a language model must do. The same semantic motif recurs in different surface forms, and a model that understands language must recognise the invariant structure beneath the variant expression.
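One simple way to make "invariant structure beneath variant surface" concrete is to describe a melody by its intervals rather than its pitches, since transposing a theme to another key leaves the intervals unchanged. The sketch below is illustrative only: the MIDI note numbers and the helper names `intervals` and `find_theme` are assumptions for this example, and it handles transposition but not changes of tempo or register.

```python
def intervals(pitches):
    """Successive pitch differences in semitones; unchanged by transposition."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def find_theme(stream, theme):
    """Return start indices where the theme's interval pattern occurs in the stream."""
    pattern = intervals(theme)
    stream_iv = intervals(stream)
    span = len(pattern)
    return [
        i for i in range(len(stream_iv) - span + 1)
        if stream_iv[i:i + span] == pattern
    ]

# Hypothetical data: MIDI note numbers. The theme C-D-E-C appears at index 0
# and again transposed up a fifth (G-A-B-G) at index 6.
theme = [60, 62, 64, 60]
stream = [60, 62, 64, 60, 65, 59, 67, 69, 71, 67, 72]

matches = find_theme(stream, theme)
```

The second occurrence shares no pitches with the first, yet it is found because the comparison is made on the interval pattern alone.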
The Pythagorean discovery: consonant intervals correspond to simple integer ratios. The octave is 2:1. The perfect fifth is 3:2. Dissonant intervals have more complex ratios, such as 16:15 for the minor second in just intonation. Consonance is arithmetic simplicity made audible. The ear is a ratio detector.
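The arithmetic is easy to check with exact fractions. The sketch below assumes one common just-intonation ratio for each interval, and its `complexity` measure (numerator plus denominator) is a crude proxy invented for this example, not an established psychoacoustic model.

```python
from fractions import Fraction

# Just-intonation ratios for a few intervals (one common tuning choice each).
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "major third": Fraction(5, 4),
    "minor second": Fraction(16, 15),
    "tritone": Fraction(45, 32),
}

def complexity(ratio):
    """Crude, illustrative proxy: numerator plus denominator in lowest terms."""
    return ratio.numerator + ratio.denominator

# Order the interval names from the simplest ratio to the most complex.
ranked = sorted(INTERVALS, key=lambda name: complexity(INTERVALS[name]))

# Stacking a fifth and a fourth multiplies their ratios and yields an octave.
stacked = INTERVALS["perfect fifth"] * INTERVALS["perfect fourth"]
```

Under this proxy the octave and fifth rank as simplest and the minor second and tritone as most complex, which matches the traditional ordering from consonant to dissonant.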
In an embedding space, semantic similarity is represented as proximity between vectors. Tokens that frequently appear in similar contexts cluster together. Harmonically, consonant notes are nearby in the ratio lattice. Semantically, related words are nearby in the embedding space. The formal structure is analogous: in both cases, relatedness is measured as distance in a structured space.
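Proximity in an embedding space is often measured with cosine similarity. The sketch below uses three hand-written 3-dimensional vectors as stand-ins for learned embeddings. The words and numbers are invented for illustration, and real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; the values are made up, not learned.
embeddings = {
    "violin": [0.9, 0.8, 0.1],
    "cello": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.2, 0.9],
}

sim_related = cosine(embeddings["violin"], embeddings["cello"])
sim_unrelated = cosine(embeddings["violin"], embeddings["invoice"])
```

With these made-up vectors, `sim_related` comes out much higher than `sim_unrelated`, which is the sense in which related words are "nearby".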
Music of Pattern explores the formal relationship between musical structure and sequence processing. Interactive Fourier decomposition, harmonic series visualisation, and attention-as-counterpoint demonstrations are being designed.