I am one of the authors - thank you for taking the time to go through and to summarise our paper!
About your question on the instructions vs inherent abilities:
Consider the scenario where we train a model on the task of Natural Language Inference (NLI), using a dataset like the Stanford Natural Language Inference (SNLI) Corpus. Suppose the model performs exceptionally well on this task. While we can now say that the model possesses the computational capability to excel at NLI, this doesn't necessarily indicate that the model has developed inherent emergent reasoning abilities, especially abilities it was not explicitly trained for on the SNLI corpus. For example, it is unlikely that our NLI-trained model will perform well on tasks that require logical reasoning skills.
My 15-minute talk on the paper might also help answer this question: https://www.youtube.com/live/I_38YKWzHR8?si=hWoUr4ucFrT8sFUi&t=3111
Just wanted to share that this work has now been peer-reviewed and accepted to ACL 2024.
arXiv has been updated with the published ACL version: https://arxiv.org/abs/2309.01809
I think this is a very interesting piece of work, and it makes sense to double-check whether its details are correct (if they are, it's quite exciting).
However, I sharply disagree with the last sentence of the abstract:
We find no evidence for the emergence of reasoning abilities, thus providing valuable insights into the underlying mechanisms driving the observed abilities and thus alleviating safety concerns regarding their use.
On the contrary, if this paper is correct, the situation is even more concerning (that is, one should update towards shorter timelines).
The GPT-3 revolution brought with it two key miracles that were considered impossible before 2020: 1) in-context learning (that is, few-shot learning without even requiring weight updates), and 2) partial competence in generating working computer code.
In terms of Janus' Simulator theory, in-context learning is basically the ability to shape the current simulation run via prompt engineering. If all or most of the emerging properties are attributable to that, they become much more accessible (including reasoning via chain-of-thought prompting, etc.).
Moreover, if this paper is correct, Janus' conjecture that we have not come close to taking full advantage of the capabilities of existing models becomes more likely. Basically, the correctness of this paper would make it more likely that, by creatively playing with prompt engineering, one can extract much stronger capabilities from existing models without much additional training. If the known emerging properties are mostly due to prompt engineering, then there are likely plenty of other emerging properties to discover by experimenting with prompts a bit more...
If all this is correct, this does provide stronger control mechanisms, both for us talking to AIs, and for the AIs talking to themselves and to each other...
Hi there,
I am one of the authors - thank you for your interest in this paper.
The paper focuses on the discussion surrounding the "existential threat" posed by latent hazardous abilities. Essentially, our results show that there is no evidence that models are likely to have the ability to plan and reason independently of what they are explicitly required to do through their prompts.
Importantly, as mentioned in the paper, other concerns regarding the use of LLMs remain: for example, the ease with which they can be used to generate fake news or spam emails.
You are right that our results show that "emergent abilities" are dependent on prompts; however, our results also imply that the tasks models can solve are not really "emergent", and this will remain the case for any new tasks we find they are able to solve.
Here's a summary of the paper: https://h-tayyarmadabushi.github.io/Emergent_Abilities_and_in-Context_Learning/
Yes, thanks a lot!
Yes, I don't think they'll "wake up on their own"; we would need to do something to make that happen.
But they are not active on their own anyway: they only start computing when prompted, so whatever abilities they do have become apparent through our various interactions with them.
In some sense, the ability to do rapid in-context learning is their main emergent ability. (That said, we have recently seen some indications that scale might not be required for it at all: with a cleverer architecture and training, one may be able to obtain these capabilities in radically smaller models. See, e.g., Uncovering mesa-optimization algorithms in Transformers, which seems to point rather strongly in that direction.)
Thanks!
Yes, I completely agree with you that in-context learning (ICL) is the only new "ability" LLMs seem to be displaying. I also agree with you that they start computing only when we prompt.
There seems to be an impression that, when prompted, LLMs might do something different from (or even orthogonal to) what the user requests (see, for example, Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure, and the BBC's report on it). We'd probably agree that this was careful prompt engineering (made possible by ICL) and not an active attempt by GPT to "deceive".
So that we can say explicitly that this isn't possible, I wouldn't call ICL an "emergent ability" in the Wei et al. sense. ICL "expressiveness" seems to increase with scale, so it's predictable (and so does not imply that other "unknowable" capabilities, such as deception or planning, emerge with scale)!
It will be really exciting if we are able to obtain ICL at a smaller scale! Thank you very much for that link. That's a very interesting paper!
Yes, I agree with that.
It's very interesting that even quite recently ICL was considered to be "an impossible Holy Grail goal", with the assumption that models always need to see many examples to learn anything new (and so they are inherently inferior to biological learners in this sense).
And now we have this strong ICL ability and this gap with biological learners is gone...
A new preprint, Lu et al., published last month, has important implications for AI risk if true. It essentially suggests that emergent capabilities of current LLMs are (with some exceptions) mediated by a model's ability to do in-context learning. If so, emergent capabilities may be less unexpected, given that improvements to in-context learning (e.g., via instruction tuning) are relatively predictable.
Wei et al. define "emergent ability" like so: "An ability is emergent if it is not present in smaller models but is present in larger models." So you can have sudden "jumps" in performance on various tasks as you scale models up, e.g. (from Wei et al.):
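To make the definition concrete, here is a minimal, hypothetical sketch (not code from Wei et al. or Lu et al.) that checks whether a scaling curve fits it. It assumes an ability counts as "present" when accuracy exceeds a chance baseline by some `margin`; the function name `emergence_point`, the `chance` and `margin` parameters, and the example accuracies are all illustrative assumptions.

```python
from typing import Optional, Sequence


def emergence_point(scores: Sequence[float], chance: float,
                    margin: float = 0.05) -> Optional[int]:
    """Index of the first model at which the ability appears, or None.

    `scores` are task accuracies for models ordered from smallest to largest.
    Assumption (not from Wei et al.): an ability is "present" when accuracy
    exceeds the chance baseline by more than `margin`.
    """
    present = [s > chance + margin for s in scores]
    if True not in present:
        return None  # never present at any scale
    first = present.index(True)
    if first == 0:
        return None  # already present in the smallest model, so not emergent
    if not all(present[first:]):
        return None  # appears, then vanishes again at a larger scale
    return first


# Invented accuracies for five model sizes on a 4-way multiple-choice task
# (chance = 0.25), purely for illustration.
flat_then_jump = [0.24, 0.26, 0.25, 0.61, 0.78]
always_chance = [0.25, 0.24, 0.26, 0.25, 0.27]
print(emergence_point(flat_then_jump, chance=0.25))  # 3
print(emergence_point(always_chance, chance=0.25))   # None
```

This only encodes the definition itself; Lu et al.'s argument is about why such curves appear (in-context learning and instruction tuning), which no check on the curve alone can settle.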
Lu et al. are saying: yes, but that is not because the larger models learned to do those specific tasks better, or learned to become better generic reasoners. It is because the larger models learned to follow instructions better, partly from scaling up and partly from being trained on datasets of instructions and instruction-following ("instruction tuning").
The methodology: "We conduct rigorous tests on a set of 18 models, encompassing a parameter range from 60 million to 175 billion parameters, across a comprehensive set of 22 tasks. Through an extensive series of over 1,000 experiments, we provide compelling evidence that emergent abilities can primarily be ascribed to in-context learning."
The paper finds that, when accounting for the "biasing factors", only 2 of 14 previously "emergent" tasks actually show emergence. The blue and yellow lines are instruction-tuned models evaluated via few-shot prompting, showing emergence. The purple and green lines are non-instruction-tuned models evaluated via zero-shot prompting, generally showing no emergence.
They write:
They also argue that this partly explains why RLHF will never fully align a model. RLHF shows a model examples of when to give a certain response and when not to, and essentially just makes the model follow instructions better (by giving it a long list of things not to say), without changing the model's inherent way of behaving on a given task. I'm not sure I fully understand this argument, as I am somewhat confused about the instructions/inherent-abilities distinction.
(NB: I am not an ML engineer, so my summary and analysis above may be flawed in some ways, and I am not capable of evaluating whether the methodology/findings are sound. I am sharing this here mostly because I'm curious to get LW users' takes on it.)