Comments

Actually I would still really appreciate the training hyperparameters like batch size, learning rate schedule...

Ah, never mind, I believe I found the relevant hyperparameters here: https://github.com/adamimos/epsilon-transformers/blob/main/examples/msp_analysis.ipynb

In particular, what I needed to know was that the model has 4 layers, with only a single attention head per layer.

No, the actual hidden Markov process used to generate the awesome triangle fractal image is not the {0,1,random} model but a different one, which is called "Mess3" and has a symmetry between the 3 hidden states.

Also, they're not claiming the transformer merely learns the hidden states of the HMM, but a more complicated thing called the "mixed state presentation": not the states that the HMM can be in, but the (usually much larger) set of belief states that an ideal predictor trying to "sync" to it might go through.
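To make the belief-state idea concrete, here is a minimal sketch of the Bayesian update an ideal predictor would run. Note the assumptions: the 2-state HMM below is made up for illustration (it is NOT Mess3, and I'm not quoting the post's parameters), the names `T`, `update`, and `reachable_beliefs` are mine, and the uniform starting prior is just a convenient choice.

```python
# Hypothetical 2-state, 2-symbol HMM. The numbers are invented for
# illustration only; this is NOT the Mess3 process from the post.
# T[x][i][j] = P(emit symbol x and move to state j | currently in state i)
T = {
    0: [[0.5, 0.0], [0.1, 0.3]],
    1: [[0.0, 0.5], [0.4, 0.2]],
}


def update(belief, symbol):
    """Bayes-update a belief over hidden states after observing one symbol."""
    n = len(belief)
    unnorm = [sum(belief[i] * T[symbol][i][j] for i in range(n)) for j in range(n)]
    total = sum(unnorm)  # probability of this symbol under the current belief
    if total == 0:
        return None  # symbol is impossible from this belief
    return [u / total for u in unnorm]


def reachable_beliefs(start, max_len):
    """Collect the distinct beliefs reachable within max_len observations."""
    def key(b):
        return tuple(round(p, 9) for p in b)

    seen = {key(start)}
    frontier = [start]
    for _ in range(max_len):
        next_frontier = []
        for b in frontier:
            for symbol in T:
                nb = update(b, symbol)
                if nb is not None and key(nb) not in seen:
                    seen.add(key(nb))
                    next_frontier.append(nb)
        frontier = next_frontier
    return seen


# Assumed uniform prior over the two hidden states.
beliefs = reachable_beliefs([0.5, 0.5], max_len=4)
print(len(beliefs), "distinct belief states from a 2-state HMM")
```

Each belief is a point on the probability simplex over hidden states, so plotting these points for a 3-state process gives a picture living inside a triangle.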

Can you share the hyperparameters used to make this figure?

Okay, my computer right here has 10^13 bits of storage, and without too much trouble I could get it to use all that memory as a counter and just count to the highest value possible, which would be 2^(10^13) - 1. Counting that high would take much, much longer than the age of the universe, even at a fast clock speed.

Now technically yes, after it got to that maximum value it would have to either halt or start over from 0 or something... but that seems not so practically relevant to me because it's such a huge integer value.
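A rough back-of-the-envelope check, working in logarithms because the count itself is far too large to materialize. The clock rate (10^10 increments per second) and the age of the universe (about 4.35 × 10^17 seconds, i.e. roughly 13.8 billion years) are my assumptions for illustration, and the function name is made up.

```python
import math

BITS = 10**13               # storage size from the comment above
CLOCK_HZ = 1e10             # assumed: one increment per cycle at 10 GHz
AGE_OF_UNIVERSE_S = 4.35e17  # approx. 13.8 billion years in seconds


def log10_universe_ages(bits, clock_hz=CLOCK_HZ, age_s=AGE_OF_UNIVERSE_S):
    """Return log10 of (time to count through 2**bits states / age of universe)."""
    log10_states = bits * math.log10(2)  # log10(2**bits) without building the integer
    return log10_states - math.log10(clock_hz) - math.log10(age_s)


exponent = log10_universe_ages(BITS)
print(f"Counting would take about 10^{exponent:.4g} ages of the universe")
```

The clock speed and the age of the universe only shave a few dozen off an exponent that is in the trillions, so the conclusion is not sensitive to either assumption.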

I haven't dived into this yet, but am I right in guessing that the gist is exactly like a way more fleshed-out and intricate version of Hofstadter's "superrationality"?

I'm trying to read this post now but it looks like a bunch of images (of math) are missing. Does that match what others see?

The Bitter Lesson applies to almost all attempts to build additional structure into neural networks, it turns out.

Out of curiosity, what are the other exceptions to this besides the obvious one of attention?

Some of your YouTube links are broken because the equals sign got escaped as "%3D". If I were you, I'd take a minute to fix that.
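If it helps, something like this would repair them in bulk. It's only a sketch: `fix_link` is a made-up helper name and `VIDEO_ID` is a placeholder, not one of your actual links. I'm deliberately un-escaping only "%3D" rather than calling `urllib.parse.unquote` on the whole URL, since that would also decode any escapes that are meant to be there.

```python
import re


def fix_link(url):
    """Replace an escaped equals sign (%3D or %3d) with a literal '='."""
    return re.sub(r"%3D", "=", url, flags=re.IGNORECASE)


# Placeholder URL for illustration; VIDEO_ID is not a real video.
broken = "https://www.youtube.com/watch?v%3DVIDEO_ID"
print(fix_link(broken))
```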
