This video provides a lot of background: https://www.youtube.com/watch?v=Eq3bUFgEcb4 and is also very funny, but it's not necessary to understand this post. If you've watched it you can probably skip to the "Why is Piano Roll Notation Bad?" section.
TL;DR
Music notation is very janky and weird, but attempts to reform it to be "logical" are usually even worse. The specific ways in which they are worse can tell us something about how information can be compressed, particularly when world-histories contain a set of sparse events.
We can extend this logic to sketch out how verbs work in the case of a bouncing spring. Most of the time it follows an uneventful freefall trajectory, punctuated by occasional "bounce" events, so we can communicate its trajectory as a list of bounces rather than as a function of position + extension over time.
Music and Frequencies
Let's use the piano as an example. A piano has 88 keys, with each key corresponding to a different note. A note corresponds to sound which has a particular frequency, for example middle C is 261.63 Hz. The ratio between the frequencies of two notes is called an interval and the simplest interval (1:2) is called an octave. The sentence "These two notes are one octave apart" means that one note has twice the frequency of the other.
Notes sit on a logarithmic scale. Notes with higher (faster) frequencies are said to have higher pitch, and we often talk about going "up" or "down" by, say, one octave, which means doubling or halving the frequency of a note, respectively.
In a piano, the set of notes that can be produced is discrete, because there is a discrete set of keys. Two adjacent keys have a frequency ratio of $2^{1/12}$ (about 1.0595), which means that moving along by twelve keys takes you up exactly one octave.
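To make the arithmetic concrete, here is a minimal sketch of the key-to-frequency mapping. It assumes the common convention that keys are numbered 1 to 88 from the bottom, that key 49 is the A above middle C, and that this A is tuned to 440 Hz; none of those specifics come from this post, and `key_frequency` is just a name I made up.

```python
A4_KEY = 49               # assumed numbering: lowest key is 1, A above middle C is 49
A4_HZ = 440.0             # assumed concert-pitch tuning
SEMITONE = 2 ** (1 / 12)  # frequency ratio between adjacent keys


def key_frequency(key: int) -> float:
    """Frequency in Hz of a piano key numbered 1..88 from the bottom."""
    if not 1 <= key <= 88:
        raise ValueError("a standard piano has keys 1 to 88")
    return A4_HZ * SEMITONE ** (key - A4_KEY)


middle_c = key_frequency(40)          # middle C is key 40 under this numbering
print(round(middle_c, 2))             # 261.63
print(key_frequency(52) / middle_c)   # twelve keys up: the ratio is 2, up to float error
```

Under those assumptions, middle C comes out at the 261.63 Hz quoted earlier, and stepping twelve keys along doubles the frequency.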
The keys are arranged like this:
When a key is pressed, a little mallet hits two or three strings (made of steel of course) which vibrate at the frequency corresponding to that note. When the key is released, a fuzzy pad touches the strings and stops them vibrating. This allows the piano player to exactly control the start and stop times of the notes.
As well as a specific frequency, the word note also refers to the specific instance of a frequency being played. So middle C the frequency is a note, but a specific occasion of a piano player producing that frequency by playing the middle C key is also a note. For clarity I will sometimes use the phrases note-frequency and note-instance to communicate this where relevant.
Note-instances exist in a kind of 2D space, where one axis is frequency space (logarithmic) and the other is time. Each note-instance has a frequency, a start time, and an end time. If you're a person with a particular penchant for logical and rational information transmission, just hearing this framing may kick off a chain of thoughts which we'll circle back to later.
Existing Notation
If I write a piece of music and I want someone else to be able to reproduce it, I have to send them some information telling them how to play it. Written sheet music is notated like this:
What does this all mean?
Duration is the easiest to explain: each symbol represents a note-instance of different duration. In the diagram below, each row has the same total duration; the diagram represents the halving of duration from one semibreve into two minims, four crotchets, then eight quavers.
How is the start time of a note-instance shown? Usually, it starts when the previous note-instance ends. If multiple notes need to be played at the same time then they get stuck together looking like a cluster of oats in a bowl of granola, which tells the player to play them all at once. This is usually called a "chord".
What if we want to have some peace and quiet? There are special "rest" symbols which tell you not to play anything. Like the note duration symbols they're arbitrary. I won't show them here but they look like little squiggles.
Next we'll talk about frequency. Frequency is partially transmitted by vertical position on those lines, which are called the staff. If you take a look at the piano above, you'll notice that some keys are black and some are white. The white keys (seven per octave) correspond to note-frequencies which are communicated by position on the staff:
The names of notes repeat every octave, so doubling a note-frequency (i.e. going up an octave) gives you a note with the same name.
What about the five black keys per octave? These are represented with extra symbols! The sharp symbol ♯ before a note tells you to actually play the black key directly above that white key, and the flat symbol ♭ tells you to actually play the black key directly below that white key, like so:
The number of notes spanned by the lines of one staff is just over an octave, but notes are often written above or below the staff on little extra line segments. This extends the total useful range of one staff to about two octaves. The specific set of notes that it refers to is determined by a symbol at the left hand side of the staff. More symbols shift the staff up or down by one or two octaves.
Learning to read music therefore means learning all these rules and symbols, plus some more symbols on top of those. I didn't even get into the ones for playing loudly or quietly!
Notation Reform to the Rescue!
So, can we improve upon it? Surely we can!
Firstly, why not have one vertical position for every single note-frequency, instead of just 7/12 of them? That would make things simpler, no more silly flat and sharp signs, and it's pleasingly symmetrical with respect to the notes' position in frequency space.
Next, why have special symbols for how long a note should last, when we could just have a timeline on the x-axis? That way the wider the note-instance is written, the longer it should be played for. Simple!
What you get is called a chromatic staff ("chromatic" in music means "using all twelve note frequencies instead of just a subset") with relative duration. It's often referred to as piano-roll notation (though I sometimes think of it as terminal-physicist-brain notation). It's the most common type of notation reform. It ends up looking something like this:

Or like this:
And it's totally godawful for almost all practical purposes. Nobody actually plays music by reading this stuff.
Why is Piano-Roll Notation Bad?
The point of music notation is to transmit information. In order to do this efficiently it needs to compress information. Piano-roll does this really badly, because it fails to reflect the underlying rules that music normally obeys. I'm going to rephrase that in the positive:
Good notation reflects the underlying structure of the thing it is trying to notate.
Piano-roll notation can express any pattern of keys being pressed. It's easy to express music where 57 of the 88 keys on a keyboard are pressed for 0.5 seconds; then a set of 63 keys for 0.64 seconds; then half of those are released and a new set of 23 keys are pressed for the next 2.21 seconds; before a set of seven keys are pressed repeatedly for durations of 0.23, 0.25, 0.26, 0.27 seconds.
But that's not what music sounds like!
Notes are not randomly distributed!
Humans only have two hands, and these hands are not infinitely wide, so the set of notes which can be played in a short period of time usually sits within two chunks of key space where it's comfortable for the player's hands to go. So it actually makes sense to represent piano music with two staves, each of which covers a couple of octaves.
Plus, the number of notes being played at once is usually a lot less than the full 88. In fact it's also pretty much always less than ten, due to finger-number constraints.
It's also the case that most music does not include all twelve note-names equally. Most music uses a diatonic scale, which is a "menu" of seven out of the twelve. Handily, there are seven positions on the staff per octave.
Durations are not uniformly distributed either!
When it comes to duration, we should remember that music has rhythm. Notes come in fairly common durations: one beat, two beats, half a beat, which is what traditional notation succeeds at capturing. So the symbols are (usually) all we need. Most durations you will encounter are binary fractions of a bar, which the symbols make clear.
Even when we do see unusual note durations, piano roll still sucks! It's not uncommon, for example, to play five notes in the space of four, which traditional notation makes clear with a little "5", which communicates exactly what's going on in these somewhat rare occasions. Piano roll just puts those five notes in the same bit of space, and it's much harder at a glance to count exactly how many notes are dividing how much space.
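The "five in the space of four" case is easy to work through with exact fractions. This is only a sketch: `tuplet_duration` is a hypothetical helper, and I'm measuring durations as fractions of a semibreve, which is my choice of unit rather than anything the notation forces.

```python
from fractions import Fraction


def tuplet_duration(written: Fraction, notes: int, in_space_of: int) -> Fraction:
    """Duration of each note when `notes` notes share the time of `in_space_of` written ones."""
    return written * Fraction(in_space_of, notes)


quaver = Fraction(1, 8)  # as a fraction of a semibreve
each = tuplet_duration(quaver, notes=5, in_space_of=4)

print(each)                     # 1/10
print(each * 5 == quaver * 4)   # True: the five notes fill exactly four quavers
```

The little "5" carries the whole ratio in one symbol, whereas on a piano roll the reader has to notice that each block is a tenth of a semibreve wide rather than an eighth.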
There is also a horizontal space information-density problem with piano roll. If we're just playing one note for four beats, that doesn't carry much information; but if there are sixteen notes in the next four beats then there's much more information to be transmitted. Using relative duration, we have to expend the same amount of page-space for both of these, but with symbolic duration we can give more page-space to more information-dense periods of playing.
Sparsity
Some of these problems have similar shapes. Piano roll can express any combination of pressed-or-not-pressed values across the 88 keys, a total of $2^{88}$ different options! But we only ever really press 1-10 keys at a single time! The set of sets of keys we actually press is sparse in the set of possible sets of keys.
Likewise, piano roll can express any duration as a real number, but we actually mostly just want to represent a few simple fractions.
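The key-pressing half of this can be checked with a quick count. The sketch below assumes "playable" just means between one and ten keys held at once; it ignores hand span entirely, so it overestimates the number of patterns a real pianist could reach.

```python
from math import comb, log2

KEYS = 88
MAX_FINGERS = 10  # crude assumption: at most one key per finger

# Every on/off pattern across the keyboard.
all_patterns = 2 ** KEYS

# Patterns with between 1 and MAX_FINGERS keys pressed.
playable = sum(comb(KEYS, k) for k in range(1, MAX_FINGERS + 1))

print(f"bits for any pattern:       {log2(all_patterns):.0f}")
print(f"bits for playable patterns: {log2(playable):.1f}")
print(f"fraction playable:          {playable / all_patterns:.1e}")
```

Even with this generous definition of playable, the patterns we care about need less than half as many bits to index as the full set, and they make up a vanishingly small fraction of it.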
So this is the root of the problem. Piano roll is uncompressed music notation. It's slightly easier for an alien to figure out, but this isn't really relevant. Yes, learning a bunch of arbitrary symbols and rules takes effort, but that effort is amortized: once the symbols are learned they can be used to transmit information much more quickly. When "sight-reading" (i.e. playing music for the first time, having not seen it before), the efficient transmission of information from page to keyboard is the most important factor.
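One way to get a feel for "uncompressed" is to model a piano roll as a full key-by-time grid and compare it with a plain list of note-instances. Treating the roll as a grid of ticks is my own modelling assumption, and the three-note fragment is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteInstance:
    key: int    # 1..88
    start: int  # in ticks
    end: int    # in ticks, exclusive


def to_piano_roll(notes, n_keys=88):
    """Expand note-instances into a dense grid: one row per tick, one cell per key."""
    n_ticks = max(n.end for n in notes)
    roll = [[False] * n_keys for _ in range(n_ticks)]
    for n in notes:
        for t in range(n.start, n.end):
            roll[t][n.key - 1] = True
    return roll


# Hypothetical fragment: two notes held together, then a third, 16 ticks in total.
notes = [NoteInstance(40, 0, 8), NoteInstance(44, 0, 8), NoteInstance(47, 8, 16)]
roll = to_piano_roll(notes)

dense_cells = len(roll) * len(roll[0])  # every (tick, key) cell
sparse_numbers = 3 * len(notes)         # key, start, end per note-instance
print(dense_cells, sparse_numbers)      # 1408 9
```

Almost every cell in the grid is empty, which is the same sparsity showing up again. Traditional notation goes further than the bare list by also exploiting the regularities in which keys and durations actually occur.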
Semantics
This post is actually about semantics. I did not come up with the idea for this post by thinking about music: I was thinking about a bouncing spring. I am now going to pull a full 180 and draw heavily on the work of John Wentworth and David Lorell, specifically what can be found here.
The Spring
Suppose we have a bunch of metal atoms which are shaped into a spring toy: it consists of a metal weight on top, a foot on the bottom, and a spring in the middle. When dropped, it falls, bounces, enters freefall again, and repeats. We'll assume it doesn't rotate, just bounces up and down. What are the natural-ish latent-ish things over this system?
As a reminder, the two conditions for natural latency are the following:
- Mediation: our observed variables are independent given the latent
- Redundancy: we can calculate our latent using only some of the observed variables
In this case the observed variables are the positions of the metal atoms at different timesteps. Our latents are a trajectory of position + extension plus a generalized geometry. Generalized geometry is a slight extension of rigid-body geometry to simple flexible bodies.
Sidebar: Generalized Geometry
A geometry is something which rigid objects (like teapots) have. Mathematically, it takes in the position (an element of $\mathbb{R}^3$) and orientation (an element of $SO(3)$) of an object and tells us the positions of all of the object's atoms in 3D space. So a standard rigid-body geometry is a map $\mathbb{R}^3 \times SO(3) \to \mathbb{R}^{3N}$, where $N$ is the number of atoms.
A generalized geometry is just a map from something other than (position, orientation) to the coordinates of your atoms. In this case our spring-toy is just bouncing up and down so the position can be expressed as an element of $\mathbb{R}$. We also have an extra variable representing the extension of the spring, which is an element of $\mathbb{R}$. So in this case our generalized geometry is a map $\mathbb{R} \times \mathbb{R} \to \mathbb{R}^{3N}$.
We can set aside the concept of a generalized geometry at this point. It's not actually relevant to the rest of this piece; I was just using it to help illustrate that my choice of position + extension trajectory is roughly equivalent to a rigid body trajectory, which is where the theoretical work has already been done.
Position + Extension Trajectories
We only need to think about the position + extension trajectories from here on out.
Mediation: at each timestep, the positions of the atoms (sans some random independent motion) can be inferred just from knowing how high up the centre of mass of the object is, and how extended the spring is.
Redundancy: we only need to know the location of some of the atoms in the head, and some of the atoms in the foot, to figure out the location + extension of the spring.
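Here is a toy version of those two conditions for a one-dimensional spring. It's a sketch under heavy assumptions: atoms are spread evenly along the toy, "position" is taken to be the midpoint height, thermal jitter is ignored, and both function names are hypothetical.

```python
def toy_geometry(position, extension, fractions):
    """Map (position, extension) to atom heights.

    `fractions` gives each atom's place along the toy: 0.0 is the foot, 1.0 the head.
    """
    return [position + (f - 0.5) * extension for f in fractions]


def recover_latent(foot_height, head_height):
    """Recover (position, extension) from just one foot atom and one head atom."""
    extension = head_height - foot_height
    position = (head_height + foot_height) / 2
    return position, extension


fractions = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical five-atom toy
atoms = toy_geometry(2.0, 0.4, fractions)

# Mediation: every atom height follows from (position, extension).
# Redundancy: two of the five atoms are enough to get the latent back.
print(recover_latent(atoms[0], atoms[-1]))
```

The recovered pair matches the (2.0, 0.4) we started with, up to floating-point error, without ever looking at the three middle atoms.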
Let's imagine we plot the position and pose of the spring toy over time. Both of these can be represented as scalar variables, so we can plot them as a simple x/y graph:
Seems like for most of the time extension is constant at its maximum value and position follows a quadratic curve with a consistent, negative coefficient. But very occasionally, extension will drop sharply before returning to maximum, and position will bend upwards along some other kind of curve.
Suppose we didn't know the exact equations governing the motion of the toy during that second type of period. How might we nonetheless efficiently compress most of the trajectory information to transmit it? Perhaps we would do something like this:
| Start Time | End Time | Final Velocity |
| --- | --- | --- |
| 3.1 s | 3.5 s | +4.8 m/s |
| 5.2 s | 5.5 s | +4.1 m/s |
| 8.7 s | 8.9 s | +3.3 m/s |
| 10.3 s | 10.5 s | +2.7 m/s |
This is a list of those periods where the extension drops and the position/time curve bends upwards. We don't need to know the initial velocity because we can calculate movement just fine during the periods of freefall.
This doesn't tell us the exact values of extension and position during the anomalous periods, but it does allow us to calculate the rest of the trajectory out into the future.
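As a rough sketch of that calculation: given one row of such a table, the freefall that follows it is fully determined. The code assumes standard gravity of 9.81 m/s², no air resistance, and that the toy leaves the bounce at a fixed rest height; the example row is hypothetical rather than taken from the table, and the names are my own.

```python
from dataclasses import dataclass

G = 9.81  # m/s^2, assumed


@dataclass(frozen=True)
class Bounce:
    start: float           # s
    end: float             # s
    final_velocity: float  # m/s, upward positive


def height_after(bounce, t, rest_height=0.0):
    """Height at time t during the freefall that follows `bounce`."""
    dt = t - bounce.end
    if dt < 0:
        raise ValueError("t is before the bounce has finished")
    return rest_height + bounce.final_velocity * dt - 0.5 * G * dt ** 2


def flight_time(bounce):
    """Time from the end of the bounce until the toy is back at its rest height."""
    return 2 * bounce.final_velocity / G


b = Bounce(start=1.0, end=1.2, final_velocity=4.905)  # hypothetical row
print(flight_time(b))                                 # seconds in the air
print(height_after(b, b.end + flight_time(b) / 2))    # height at the top of the arc
```

Note that `height_after` only describes the arc up to the next bounce; past that point the next row of the table takes over.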
What does this look like? We've seen it before...
Sparsity Returns
What do we have in this system? We have long periods of one thing happening (freefall) punctuated by short periods of a different thing happening. The most effective way to transmit it is as a sparse list of information about the specific instances of not-freefall. Each row in the sparsely-encoded table is a specific unit, which we could call an event in the trajectory.
This (I think) illustrates a fact about events and verbs. Events are sparsely-coded bits of trajectories, and verbs sometimes refer to events.
But Why are Verbs Inter-Operable?
Imagine that instead of a spring toy we have a rubber bouncy ball. We can express natural latents over time as a generalized geometry, and a trajectory of position + compression, like with the spring. The ball also experiences moments of rapid change in (derivative of) position and pose, which punctuate longer periods of predictable freefall.
These events can be parameterized in the exact same way, with the exact same variable types, as the bounces above. If we copied our list of ball-bounces into the input to a program which expected spring-toy-bounces, it might behave a little oddly but we wouldn't get any value errors.
So we might use the same word "bounce" to describe them.
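The "no value errors" point can be sketched directly: one record type, one function, two different objects. The spring rows are the first two from the table above; the ball rows are invented for illustration, and `contact_times` is a hypothetical consumer.

```python
from typing import NamedTuple


class Bounce(NamedTuple):
    start: float           # s
    end: float             # s
    final_velocity: float  # m/s, upward positive


def contact_times(bounces):
    """How long each bounce lasts, in seconds. Knows nothing about what is bouncing."""
    return [round(b.end - b.start, 3) for b in bounces]


spring_bounces = [Bounce(3.1, 3.5, 4.8), Bounce(5.2, 5.5, 4.1)]    # from the table above
ball_bounces = [Bounce(0.80, 0.82, 3.0), Bounce(1.45, 1.47, 2.4)]  # hypothetical

print(contact_times(spring_bounces))
print(contact_times(ball_bounces))
```

The function runs happily on either list. The ball's numbers look different (much shorter contact), but they are the same kind of thing.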
The next thing to consider is, what makes this the case? Perhaps it's the fact that the use of the verb "bounce" communicates information which is independent of whether the bouncing object is the ball or the spring-toy? In this case, the verb is inter-operable because it partially screens off the subject. If either object was bouncing on a ceiling above me at quarter-past-midnight, it would keep me awake just the same.
Perhaps it's the similarity of individual particle trajectories under each verb? In both cases we see an upwards acceleration of particle trajectories, with some particles coming closer to one another along the vertical axis.
This requires more thinking.
I agree with what you’re saying here, but I will say that traditional notation is a bit annoying for jazz …
… where, typically, each bar is only using 7 notes out of 12, but which 7 is changing almost every bar. You could, in principle, write this as a key signature per bar, but what people usually do is keep the same key signature throughout, use lots of sharps and flats, and write which chord it is over the bar
… oh, and maybe you’re really playing it as swung 1/8th notes, but it would be too tedious to write the actual durations, so just write it like it’s straight 1/8th notes and put a notation that the whole thing is swing, actually.
I actually think that last one just sounds straightforwardly (hah) right? Note shapes express subdivisions of duration that correspond to common rhythmic structures of music, so if jazz music often uses an uneven subdivision at one level but follows the broad structure otherwise, then skewing the meaning of that level in the note shapes is bending the map toward the logical shape of the territory.