To save people the click through to LeCun's twitter, I'll gather what pieces I can from his recent posts:
My claim is that AI alignment will be manageable & less difficult than many have claimed. But until we have a design for Human-Level AI, it's mere speculation. link
He does seem to believe that there is in fact a problem named "AI alignment" that has to be solved for human-level AIs, it's just that he believes it will be much more manageable than AI-notkilleveryoneism people expect.
And from the responses to Julian Togelius' recent blog post:
Julian reminds us:
1. all intelligence is specialized, including human intelligence.
2. being smart in some domains makes you strong in some environments but weak in others.
3. Intelligence does not immediately cause a thing to be able to "take over"
4. Intelligence does not immediately cause an entity to want to "take over"
5. A very dumb but specialized entity can kill a smarter one, e.g. virus vs human. link
So from points 1 and 2 it looks like he fundamentally disagrees with Eliezer's gesturing at a "core of generality" that develops once you optimize deeply enough. He doesn't expect a system to suddenly "get it" and see the deep regularities that underlie most problems; in fact, I suspect he doesn't believe such regularities exist.
Point 3 seems to be about the difficulty of box-escapes and dominating all of humanity. I'm reading that as disagreeing with the general jump that lesswrong types usually do to go from "human-level AI" to "strictly stronger than all of humanity combined".
Point 4 seems like a disagreement with instrumental convergence.
Point 5 is the only really interesting one imo, and I think it's a very good point. Current image recognition models at every ability level are vulnerable to adversarial attacks: imperceptible changes to an image make them misclassify it. I don't know whether LLMs have an analogous weakness, but I think it's very likely, and I wouldn't expect adversarial attacks to stop working on the most advanced models. So we do have examples of very dumb but targeted methods defeating very general models, which implies we might actually have a chance against a superintelligence, provided we've developed targeted weapons before it gets loose.
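As a concrete sketch of how cheap such attacks can be, here's a minimal FGSM-style perturbation against a toy linear classifier in NumPy. The model and data are made up for illustration; real attacks target deep vision models, but the core mechanism (many tiny coordinated per-pixel changes adding up to a large score change, per Goodfellow et al.'s linear explanation of adversarial examples) is the same:

```python
import numpy as np

# Toy illustration of an FGSM-style adversarial perturbation.
# The "model" is a fixed linear classifier: class 1 iff w . x > 0.
rng = np.random.default_rng(0)
w = rng.normal(size=10_000)

# An input the model classifies confidently as class 1.
x = w / np.linalg.norm(w)

def score(x):
    return w @ x

# Step each coordinate by a tiny eps against the gradient of the score.
# In high dimensions, thousands of tiny changes add up to a huge score shift.
eps = 0.05
x_adv = x - eps * np.sign(w)

print(score(x) > 0)                  # True: original is class 1
print(score(x_adv) > 0)              # False: classification flipped
print(np.max(np.abs(x_adv - x)))    # per-coordinate change is only eps
```

The imperceptibility is the point: no single coordinate moves by more than `eps`, yet the classification flips, because the attack is targeted at the model's weights rather than at anything a human would notice.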
Many AI safety discussions today seem as speculative as discussions about airliner safety in 1890. Before we have a basic design & basic demos of AI systems that could credibly reach human-level intelligence, arguments about their risks & safety mechanisms are premature.
So he's not impressed by GPT4, and apparently doesn't think that LLMs in general have a shot at credibly reaching human-level.
Every new technology is developed and deployed the same way: You make a prototype, try it at a small scale, make limited deployment, fix the problems, make it safer, and then deploy it more widely. At that point, governments regulate it and establish safety standards.
He expects AI safety to not be fundamentally different from any other engineering domain, and seems to disagree that we'll only have a single shot at aligning a superintelligence.
After some more scouring of his twitter page, I actually found an argument for pessimism about LLMs that I agree with !!! (hallelujah)
I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes. Here is the argument: Let e be the probability that any generated token exits the tree of "correct" answers. Then the probability that an answer of length n is correct is (1-e)^n
Errors accumulate. The proba of correctness decreases exponentially. One can mitigate the problem by making e smaller (through training) but one simply cannot eliminate the problem entirely. A solution would require to make LLMs non auto-regressive while preserving their fluency.
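To see how fast that compounds, a quick sketch just evaluating LeCun's formula (under his assumption that per-token errors are independent, which is itself contestable):

```python
# LeCun's claim: if each generated token exits the tree of "correct"
# answers with probability e, then an answer of length n is correct
# with probability (1 - e) ** n, which decays exponentially in n.

def p_correct(e: float, n: int) -> float:
    return (1 - e) ** n

for e in (0.01, 0.001):
    for n in (100, 1000):
        print(f"e={e}, n={n}: P(correct) = {p_correct(e, n):.4f}")
```

Even a per-token error rate of 0.1% leaves only about a 37% chance of a fully correct 1000-token answer; training can shrink e, but on this model no finite e survives long horizons.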
This seems to be related to the "curse of behaviour cloning". Learning to behave correctly only from a dataset of correct behaviour doesn't work; you also need examples in your dataset of how to correct wrong behaviour. For example, if you make chatGPT play chess, at some point it will make a nonsensical move, or if you make it do chess analysis it will mistakenly claim that something happened, and thereafter it will condition its output on that wrong claim! It doesn't go "ah yes 2 prompts ...
The problem is that I think I might agree that "slightly smarter-than-human AGI in a box managed by trained humans" really doesn't have as easy a way out as EY might think. But that's also not what's going to happen if things are only left to complete "don't sweat it" techno-optimists. What's going to h...
Yann LeCun has been saying a lot of things on social media recently about this topic, only some of which I've read. He's also written and talked about it several times in the past. Most of what I've seen from him recently doesn't seem to address any of the actual arguments. On the other hand, I know he's discussed this in many forums over several years, and the arguments have been spelled out to him so many times by so many people that it's hard for me to believe he really doesn't know what the substantive arguments are. Can someone who's read more of Yann's arguments on this please give their best understanding of what he's actually arguing, in a way that will be understandable to people who are familiar with the standard x-risk arguments?