I'm collecting recent papers sent my way here (this will be a live, expanding comment); please do link more if you think they might be useful:
- Inductive biases in theory-based reinforcement learning
This post (which is really dope) points to some grokking examples in large language models from a BIG-bench video, at 19313s and 19458s; the whole segment (18430s to 19650s) is a nice watch! I'll spend a bit more time collecting and precisely identifying the evidence, and then include it in the grokking part of this post. This was a really nice thing to learn about, and very surprising.
They're likely interchangeable, sorry. I may have misused the words here while trying to tease out the point that simply understanding how a given model works is not very insightful if the patterns it relies on are not themselves understandable.
I think patterns that seem nonsensical to humans might make up a significant fraction of the patterns learned by deep networks. Given this, I was trying to understand the radical optimism, in contrast to my own pessimism. The crux is that since we don't know what these patterns are or what they represent, even if we figure out...
Am I correct in thinking that 'ersatz' and 'real' interpretability might differ in more than just the degree of interpretability? 'Ersatz' interpretability is largely concerned with explaining the typical case, whereas 'real' interpretability gives good reasoning even in the worst case. Could interpretability be hard to achieve in worst-case scenarios where some atypical wiring leads to wrong decisions?
Furthermore, I suspect transparency is being confused with interpretability. Even if we understand what each and every neuron does (radical transparency), it might not be interpreta...
When you have enough real-world data, you don't need or want to store it because of diminishing returns on retraining compared to grabbing a fresh datapoint from the firehose. (It's worth noting that no one in the large language model space has ever 'used up' all the text available to them in datasets like The Pile, or even done more than 1 epoch over the full dataset they used.) This is also good for users if they don't have to keep around the original dataset to sample maintenance batches from while doing more training.
This would be the main crux, actual...
Hi! Thanks for reading, and for the interesting questions:
I read that first sentence several times and it's still not clear what you mean, or how the footnote helps clarify. What do you mean by 'tweak'? A tweak is a small incremental change.
That's correct. What I meant is: say we state that an agent has biases x, y and z; it can then try to correct them. The changes cannot be arbitrary, since the constraints are that it has to remain competitive and robust. But I think it can reduce the strength of a heuristic by going against it whenever it can, to the extent those heuristi...
Hi! Thanks for reading the post carefully and coming up with interesting evidence and counterarguments! I think I can explain PF4, but I am certainly wrong on B1.
PF4
Why do you have high confidence that catastrophic forgetting is immune to scaling, given "Effect of scale on catastrophic forgetting in neural networks", Anonymous 2021?
Catastrophic forgetting (mechanism): We train a model to minimize loss on dataset X. Then we train it to minimize loss on dataset Y. When minimizing loss on dataset Y, it has no incentive to care about loss on dataset X. Hence, c...
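A toy sketch of the mechanism quoted above, assuming a one-parameter linear model trained with plain gradient descent on mean squared error. The two tasks (y = 2x standing in for dataset X, y = -x for dataset Y) are hypothetical choices for illustration. In the second phase the gradient is computed from dataset Y alone, so nothing pulls the weight back toward a solution for X. This illustrates only the mechanism, not whether it persists at scale.

```python
def mse(w, data):
    """Mean squared error of the model y_hat = w * x on a list of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """d/dw of mean((w*x - y)^2) = mean(2 * x * (w*x - y))."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)


def train(w, data, lr=0.1, steps=200):
    """Plain gradient descent that only ever sees `data`."""
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w


xs = [0.5, 1.0, 1.5, 2.0]
data_x = [(x, 2.0 * x) for x in xs]   # hypothetical task X: y = 2x
data_y = [(x, -1.0 * x) for x in xs]  # hypothetical task Y: y = -x

# Phase 1: minimize loss on dataset X.
w = train(0.0, data_x)
loss_x_before = mse(w, data_x)

# Phase 2: minimize loss on dataset Y; the loss on X does not enter the gradient.
w = train(w, data_y)
loss_x_after = mse(w, data_x)
loss_y_after = mse(w, data_y)

print(f"loss on X after phase 1: {loss_x_before:.6f}")
print(f"loss on X after phase 2: {loss_x_after:.6f}")
print(f"loss on Y after phase 2: {loss_y_after:.6f}")
```

With a single shared parameter the two tasks cannot both be solved, so the forgetting here is total; the point is only that phase 2 has no term that protects performance on X.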
Hi there! I'm Ameya, currently at the University of Tübingen. I share similar broad interests and am particularly enthusiastic about working on evaluations. I would love to be part of a broader evals group if one is created (Slack/Discord)!
We organized an evals workshop recently! It had a broader focus and wasn't specifically related to AI safety, but it was a great experience; we plan to keep running more iterations of it and to sharpen its focus.