I'll be brief, omit needless words.
Intelligence is prediction is compression because
Compression is finding a code that makes the data shorter
And codeword lengths are probabilities
So codes are probability distributions
But probability distributions are prediction strategies.
But then he'd lose the Strunk and White allusion.
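
For what it's worth, the middle steps of the quoted argument (codeword lengths are probabilities, so codes are distributions) can be checked numerically. Below is a minimal sketch, assuming a made-up four-symbol distribution and Shannon code lengths of ceil(-log2 p). The helper names `code_lengths` and `implied_probs` are hypothetical, chosen for illustration only.

```python
import math

def code_lengths(probs):
    """Shannon code lengths: ceil(-log2 p) bits for a symbol of probability p."""
    return {s: math.ceil(-math.log2(p)) for s, p in probs.items()}

def implied_probs(lengths):
    """Read codeword lengths back as probabilities: p = 2 ** -length."""
    return {s: 2.0 ** -n for s, n in lengths.items()}

# Illustrative distribution (an assumption, not from the quote).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

lengths = code_lengths(probs)                      # likelier symbols get shorter codewords
kraft_sum = sum(implied_probs(lengths).values())   # Kraft inequality requires this to be <= 1

# Average bits per symbol under this code, and the entropy lower bound.
expected_bits = sum(p * lengths[s] for s, p in probs.items())
entropy = -sum(p * math.log2(p) for p in probs.values())

print(lengths, kraft_sum, expected_bits, entropy)
```

Because every probability here is a power of two, the implied probabilities match the originals and the average code length equals the entropy. For other distributions the rounding in `ceil` costs up to one extra bit per symbol, which is why a better predictor (a distribution closer to the data) gives a shorter code.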