Here are clarifications for a couple minor things I was confused about while reading:
a GPU is way below the power of the human brain. You need something like 100,000 or a million to match it, so we are off by a huge factor here.
I was trying to figure out where LeCun’s 100,000+ claim is coming from, and I found this 2017 article, which is paywalled, but whose subheading implies that he’s focusing on the ~10^14 synapses in a human brain and comparing that to the number of neuron-to-neuron connections that a GPU can handle.
(If so, I strongly disagree with that comparison, but I don’t want to get into that in this little comment.)
Francois Chollet says “The actual information input of the visual system is under 1MB/s”.
I don’t think Chollet is directly responding to LeCun’s point, because Chollet is saying that optical information is compressible to <1MB/s, but LeCun is comparing uncompressed human optical bits to uncompressed LLM text bits. And the text bits are presumably compressible too!
It’s possible that the compressibility (useful information content per bit) of optical information is very different from the compressibility of text information, in which case LeCun’s comparison is misleading, but neither LeCun nor Chollet is making claims one way or the other about that, AFAICT.
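For a concrete sense of the uncompressed-bits comparison at issue, here is a minimal back-of-the-envelope sketch in Python. All of the figures are illustrative assumptions (roughly the kind of numbers LeCun has cited elsewhere for optic-nerve bandwidth and corpus size), not values taken from this exchange, and the sketch deliberately says nothing about compressibility:

```python
# Back-of-the-envelope comparison of *uncompressed* visual input vs. LLM text input.
# All figures below are illustrative assumptions, not values from this thread.

OPTIC_NERVE_BYTES_PER_S = 20e6   # assumed ~20 MB/s across both optic nerves
WAKING_HOURS_BY_AGE_4 = 16_000   # assumed waking hours in a child's first 4 years
TOKENS_IN_CORPUS = 1e13          # assumed LLM training-corpus size
BYTES_PER_TOKEN = 2              # assumed average bytes per token

visual_bytes = OPTIC_NERVE_BYTES_PER_S * WAKING_HOURS_BY_AGE_4 * 3600
text_bytes = TOKENS_IN_CORPUS * BYTES_PER_TOKEN

print(f"Visual input by age 4: {visual_bytes:.2e} bytes")          # ~1.2e15
print(f"LLM text corpus:       {text_bytes:.2e} bytes")            # ~2.0e13
print(f"Ratio:                 {visual_bytes / text_bytes:.0f}x")  # ~58x
```

Under these assumed figures the uncompressed visual stream is roughly 50x larger, which is the kind of gap LeCun points to; the commenter's point is that this says nothing about how much useful information survives compression on either side.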
Thanks! I added a note about LeCun's 100,000 claim and just dropped the Chollet reference since it was misleading.
His view on AI alignment risk is infuriatingly simplistic. To just call certain doomsday scenarios objectively "false" is a level of epistemic arrogance that borders on obscene.
I feel like he could at least acknowledge the existence of possible scenarios and express a need to invest in avoiding those scenarios instead of just negating an entire argument.
Good that you mention it and did NOT get downvoted. Yet. I have noticed that we are in the midst of an "AI-washing" attack, which is also going on here on LessWrong. But it's like asking a star NFL quarterback whether football should be banned because of the risk of serious brain injuries: of course he will answer no. The big tech companies pour trillions of dollars into AI, so of course they make sure that everyone is "aligned" to their vision and that they will try to remove any and all obstacles when it comes to public opinion. Repeat after me:
"AI will not make humans redundant."
"AI is not an existential risk."
...
I thought the orangutan argument was pretty good when I first saw it, but then I looked it up and realised it's not that orangutans aren't power-seeking; it's more that they only are in interactions that matter for the future survival of their offspring. It's actually a very flimsy argument. Some of the things he says are smart, like some of the stuff on the architecture front, but he always brings up his aeroplane analogy in AI safety, and it's really dumb: I wouldn't get into an aeroplane without knowing it has been safety checked, and as a consequence I have a hard time taking him seriously when it comes to safety.
Also you say:
So, as a lower bound we're talking 3-4000 GPUs and as an upper bound 3-4e9. Overall, more uncertainty than LeCun's estimate but in very roughly the same ballpark.
This isn't a lower bound according to Carlsmith, as he says:
Overall, I think it more likely than not that 1e15 FLOP/s is enough to perform tasks as well as the human brain
Emphasis mine.
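For what it's worth, the FLOP/s-to-GPU-count conversion behind bounds like these is just a division; here is a minimal sketch. The per-GPU throughput figure is an illustrative assumption, not a value from the post or from Carlsmith's report, so it will not exactly reproduce the bounds quoted above:

```python
# Convert a brain-compute estimate (FLOP/s) into a rough GPU count.
# Both figures below are illustrative assumptions for this sketch.

BRAIN_FLOPS = 1e15          # Carlsmith's "more likely than not" central estimate
GPU_EFFECTIVE_FLOPS = 3e11  # assumed effective per-GPU throughput on brain-like workloads

gpus_needed = BRAIN_FLOPS / GPU_EFFECTIVE_FLOPS
print(f"GPUs needed under these assumptions: {gpus_needed:,.0f}")  # ~3,333
```

Shifting either input by a few orders of magnitude (as Carlsmith's full range does) moves the GPU count by the same factor, which is where the wide lower/upper bounds come from.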
Has LeCun explained anywhere how he intends to keep the guardrails on open source systems?
Does the median LW commenter believe that autoregressive LLMs will take us all the way to superintelligence?
My sense is almost everyone here expects that we will almost certainly arrive at dangerous capabilities with something else in addition to autoregressive LLMs (at the very least RLHF, which is already widely used). I don't know what's true in the limit (like if you throw another 30 OOMs of compute at autoregressive models), and I doubt others have super strong opinions here. To me it seems plausible you get something that does recursive self-improvement out of a large enough autoregressive LLM, but it seems very unlikely to be the fastest way to get there.
Edit: habryka edited the parent comment to clarify and I now agree. I'm keeping this comment as is for posterity, but note the discussion below.
My sense is almost everyone here expects that we will almost certainly arrive at dangerous capabilities with something else in addition to autoregressive LLMs
This exact statement seems wrong, I'm pretty uncertain here and many other (notable) people seem uncertain too. Maybe I think "pure autoregressive LLMs (where a ton of RL isn't doing that much work) are the first AI with dangerous capabilities" is around 30% likely.
(Assuming dangerous capabilities includes things like massively speeding up AI R&D.)
(Note that in shorter timelines, my probability on pure autoregressive LLMs goes way up. Part of my view on having only 30% on pure LLMs is just downstream of a general view like "it's reasonably likely that this exact approach to making transformatively powerful AI isn't the one that ends up working, so AI is reasonably likely to look different.")
Some people (e.g. Bogdan Ionut Cirstea) think that it's very likely that pure autoregressive LLMs go all the way through human level R&D etc. (I don't think this is very likely, but possible.)
(TBC, I think qualitatively wildly superhuman AIs which do galaxy brained things that humans can't understand probably requires something more than autoregressive LLMs, at least to be done at all efficiently. And this might be what is intended by "superintelligence" in the original question.)
I was including the current level of RLHF as already not qualifying as "pure autoregressive LLMs". IMO the RLHF is doing a bunch of important work at least at current capability levels (and my guess is also will do some important work at the first dangerous capability levels).
Also, I feel like you forgot the context of the original message, which said "all the way to superintelligence". I was calibrating my "dangerous" threshold to "superintelligence level dangerous" not "speeds up AI R&D" dangerous.
I was including the current level of RLHF as already not qualifying as "pure autoregressive LLMs". IMO the RLHF is doing a bunch of important work at least at current capability levels (and my guess is also will do some important work at the first dangerous capability levels).
Oh, ok, I retract my claim.
Also, I feel like you forgot the context of the original message, which said "all the way to superintelligence".
I didn't, I provided various caveats in parentheticals about the exact level of danger.
I didn't, I provided various caveats in parentheticals about the exact level of danger.
Oops, mea culpa, I skipped your last parenthetical when reading your comment, so I missed that.
Introduction
Yann LeCun is perhaps the most prominent critic of the “LessWrong view” on AI safety, the only one of the three "godfathers of AI" who does not acknowledge the risks of advanced AI. So, when he recently appeared on the Lex Fridman podcast, I listened with the intent to better understand his position. LeCun came across as articulate and thoughtful[1]. Though I don't agree with all of it, I found a lot worth sharing.
Most of this post consists of quotes from the transcript, where I’ve bolded the most salient points. There are also a few notes from me as well as a short summary at the end.
Limitations of Autoregressive LLMs
Checking some claims:
Grounding / Embodiment
Richness of Interaction with the Real World
Claim: philosophers are split on grounding. LeCun participated in “Debate: Do Language Models Need Sensory Grounding for Meaning and Understanding?”. Otherwise I’m not so familiar with this debate.
Language / Video and Bandwidth
Hierarchical Planning
Skepticism of Autoregressive LLMs
RL(HF)
Bias / Open Source
Business and Open Source
Safety of (Current) LLMs
I kind of wish Lex had pushed more on why he thinks this will continue in the future.
LLaMAs
GPUs vs the Human Brain
Claims:
What does AGI / AMI Look Like?
I'd love more detail on LeCun's reasons for ruling out recursive self-improvement / hard takeoff[4].
AI Doomers
I think this analogy breaks down in several ways and I wish Lex had pushed back a bit.
What Does a World with AGI Look Like (Especially Re Safety)?
Big Companies and AI
Hope for the Future of Humanity
Summary
According to me, LeCun’s main (cruxy) differences from the median LW commenter:
Once you factor in the belief that transformers won't get us to superintelligence, LeCun's other views start to make more sense. Overall, I came away with more uncertainty about the path to transformative AI (if you didn't, what do you know that LeCun doesn't?).
This format lends itself to more nuance than discussions you may have seen on Twitter.
To calculate how long it would take a human to read a 10^13 token LLM training corpus, we need to make some assumptions:
1. Average reading speed: Let's assume an average reading speed of 200 words per minute (wpm). This is a reasonable estimate for an average adult reader.
2. Tokens to words ratio: Tokens in a corpus can include words, subwords, or characters, depending on the tokenization method used. For this estimation, let's assume a 1:1 ratio between tokens and words, although this may vary in practice.
Now, let's calculate:
1. Convert the number of tokens to words:
10^13 tokens = 10,000,000,000,000 words (assuming a 1:1 token to word ratio)
2. Calculate the number of minutes required to read the corpus:
Number of minutes = Number of words ÷ Reading speed (wpm)
= 10,000,000,000,000 ÷ 200
= 50,000,000,000 minutes
3. Convert minutes to years:
Number of years = Number of minutes ÷ (60 minutes/hour × 24 hours/day × 365 days/year)
= 50,000,000,000 ÷ (60 × 24 × 365)
≈ 95,129 years
Therefore, assuming an average reading speed of 200 words per minute and a 1:1 token to word ratio, it would take a single human approximately 95,129 years to read a 10^13 token LLM training corpus. This is an incredibly long time, demonstrating the vast amount of data used to train large language models and the impracticality of a human processing such a large corpus.
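As a sanity check, here is the same estimate as a few lines of Python, using the same assumptions as above (200 words per minute and a 1:1 token-to-word ratio):

```python
# Reproduce the estimate above: time for one human to read a 10^13-token corpus.
# Assumptions (same as above): 200 words per minute, 1 token ≈ 1 word.

TOKENS = 1e13
WORDS_PER_MINUTE = 200
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

minutes = TOKENS / WORDS_PER_MINUTE   # 5e10 minutes
years = minutes / MINUTES_PER_YEAR    # ~95,129 years

print(f"{minutes:.3e} minutes ≈ {years:,.0f} years")
```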
https://twitter.com/pmarca/status/1762877683975995416
https://twitter.com/ylecun/status/1651009510285381632
There was unfortunately very little examination of LeCun's beliefs about the feasibility of controlling smarter-than-human intelligence.