All of Felix Hofstätter's Comments + Replies

What would it look like if such a model produced code or, more generally, used any skill that entails a domain-specific language? I guess in the case of programming even keywords like "if" could be translated into Sumerian, but I can imagine there are tasks that you cannot obfuscate this way. For example, the model might do math by outputting only strings of mathematical notation.

Also, it seems likely that frontier models will all be multi-modal, so they will have other forms of communication that don't require language anyway. I suppose ... (read more)

While reading the post and then some of the discussion, I got confused about whether it makes sense to distinguish between That-Which-Predicts and the mask in your model.

Usually I understand the mask to be what you get after fine-tuning - a simulator whose distribution over text is shaped like what we would expect from some character, like the well-aligned chat-bot whose replies are honest, helpful, and harmless (HHH). This stands in contrast to the "Shoggoth", which is the pretrained model without any fine-tuning. It's still a simulator but with a di... (read more)

1simon
Yes, clearly I'm using the mask metaphor differently than a lot of people, and maybe I should have used another term (I guess simulated/simulator? but I'm not distinguishing between being actually a simulated entity or just functionally analogous).

Thank you for this post. It looks like the people at Anthropic have put a lot of thought into this which is good to see.

You mention that there are often surprising qualitative differences between larger and smaller models. How seriously is Anthropic considering a scenario where there is a sudden jump in certain dangerous capabilities (in particular deception) at some level of model intelligence? Does it seem plausible that it might not be possible to foresee this jump from experiments on even slightly weaker models?

We certainly think that abrupt changes of safety properties are very possible! See discussion of how the most pessimistic scenarios may seem optimistic until very powerful systems are created in this post, and also our paper on Predictability and Surprise.

With that said, I think we tend to expect a bit of continuity. Empirically, even the "abrupt changes" we observe with respect to model size tend to take place over order-of-magnitude changes in compute. (There are examples of things like the formation of induction heads where qualitative changes in model ... (read more)

Very interesting! After reading chinchilla's wild implications, I was hoping someone would write something like this.

If I understand point 6 correctly, you are proposing that Hoffman's scaling laws lead to shorter timelines because data-efficiency can be improved algorithmically. To me it seems that depending on algorithmic innovations, as opposed to the improvements in compute that would increase parameter counts, might just as well make timelines longer. There seems to be more uncertainty about whether people will keep coming up with the novel ideas ... (read more)

4Cleo Nardo
I'll give you an analogy: Suppose your friend is running a marathon. You hear that at the halfway point she has a time of 1 hour 30 minutes. You think "okay, I estimate she'll finish the race in 4 hours." Now you hear she has been running with her shoelaces untied. Should you increase or decrease your estimate? Well, decrease: the time of 1:30 is more impressive if you learn her shoelaces were untied, and it's plausible your friend will notice and tie them up. But note that if you hadn't conditioned on the 1:30 information, then learning her shoelaces were untied for the first half would make your estimate increase.

Now for Large Language Models: Believing Kaplan's scaling laws, we figured that the performance of LLMs depended on N, the number of parameters. But maybe there's no room for improvement in N-efficiency: LLMs aren't much less N-efficient than the human brain, which is our only reference point for general intelligence. So we expect little algorithmic innovation; LLMs will only improve because N and D grow.

On the other hand, believing Hoffman's scaling laws, we figure that the performance of LLMs depends on D, the number of datapoints. And there is likely room for improvement in D-efficiency: the brain is far more D-efficient than LLMs. So LLMs have been metaphorically running with their shoes untied. There is room for improvement, so we should be less surprised by algorithmic innovation. LLMs will still improve because N and D grow, but this isn't the only path. So Hoffman's scaling laws shorten our timeline estimates.

This is an important observation to grok: if you're already impressed by how an algorithm performs, and then you learn that the algorithm has a flaw which would disadvantage it, you should increase your estimate of its future performance.
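To make the D-efficiency point concrete, here is a minimal sketch of the parametric loss fit from the Chinchilla paper, L(N, D) = E + A/N^α + B/D^β, using the published fitted constants. It's a toy illustration, not an endorsement of any particular forecast: holding N fixed and adding data (or, equivalently, squeezing more effective data out of each token via better D-efficiency) keeps lowering the predicted loss.

```python
# Sketch of the parametric loss from the Chinchilla scaling-law paper
# (Hoffmann et al., 2022). Constants are the paper's published fits;
# treat the whole thing as an illustrative toy.

def chinchilla_loss(N: float, D: float) -> float:
    """Predicted pretraining loss for N parameters and D training tokens."""
    E = 1.69                  # irreducible loss of the data distribution
    A, alpha = 406.4, 0.34    # parameter-count term
    B, beta = 410.7, 0.28     # data term
    return E + A / N**alpha + B / D**beta

if __name__ == "__main__":
    # Same parameter count, double the data: the data term shrinks,
    # so predicted loss drops. This is the lever that improved
    # D-efficiency (algorithmic innovation) could pull on.
    base = chinchilla_loss(70e9, 1.4e12)       # roughly Chinchilla's budget
    more_data = chinchilla_loss(70e9, 2.8e12)  # same N, twice the tokens
    print(f"{base:.3f} -> {more_data:.3f}")
```

Under Kaplan-style thinking the first term dominates, so only N matters; under the Chinchilla fit the data term is comparable, which is why "running with untied shoelaces" on D leaves real room for improvement.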