You said "there are too few strictly-orthogonal directions, so we need to cram things in somehow."
I don't think that's true. That is a low-dimensional intuition that does not translate to high dimensions. It may be "strictly" true if you want the vectors to be exactly orthogonal, but such perfect orthogonality is unnecessary. See e.g. papers that discuss "the linearity hypothesis" in deep learning.
As a previous poster pointed out (and as Richard Hamming pointed out long ago), "almost any pair of random vectors in high-dimensional space are almost-orthogonal..."
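This is easy to check numerically. The sketch below (a minimal illustration, not from any of the papers mentioned) draws pairs of random Gaussian vectors at several dimensionalities and reports the mean absolute cosine similarity; it shrinks roughly like 1/sqrt(d), so in high dimensions random directions are nearly orthogonal almost by default:

```python
import math
import random

def mean_abs_cosine(dim, trials=200, seed=0):
    """Mean |cos(u, v)| over random Gaussian vector pairs in `dim` dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        total += abs(dot / (norm_u * norm_v))
    return total / trials

if __name__ == "__main__":
    for d in (3, 100, 10_000):
        print(f"dim={d:>6}: mean |cos| = {mean_abs_cosine(d):.4f}")
```

In 3 dimensions the typical overlap is large, but by 10,000 dimensions it is a fraction of a percent: you can fit vastly more almost-orthogonal directions than strictly orthogonal ones.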