I read this and, it said:
there are huge low hanging fruit that any AI or random person designing AI in their garage can find by just grasping in the dark a bit, to get huge improvements at accelerating speeds.
have we found anything like this? at all? have we seen any "weird tricks" discovered that make AI way more powerful for no reason?
"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence." -SwiGLU paper.
I think it varies, a few of these are trying "random" things, but mostly they are educated guesses which are then validated empirically. Often there is a spefic problem we want to solve i.e. exploding gradients or O(n^2) attention and then authors try things which may or may not solve/mitigate the problem.