Edit: TLDR: EY focuses on the clearest and IMO most important part of his argument:
- Before building an entity smarter than you, you should probably be really sure its goals align with yours.
- Humans are historically really bad at being really sure of anything nontrivial on the first real try.
I found this interview notable as the most useful public statement yet of Yudkowsky's views. I congratulate both him and the host, Dan Faggella, for strategically improving how they're communicating their ideas.
Dan is to be commended for asking the right questions and taking the right tone to elicit a concise statement of Yudkowsky's views on what we might do to survive, and why. It also seems that Yudkowsky has thought hard about his messaging after having his views both deliberately and accidentally misunderstood and panned. Despite having followed his thinking over the last 20 years, I gained new perspective on his current views from this interview.
Takeaways:
- Humans will probably fail to align the first takeover-capable AGI, and all die
  - Not because alignment is impossible
  - But because humans are empirically foolish
  - And historically rarely get hard projects right on the first real try
    - Here he distinguishes the first real try from getting some practice
      - Metaphor: launching a space probe vs. testing its components
- Therefore, we should not build general AI
  - This ban could be enforced by international treaties
    - And by monitoring the use of GPUs, which would legally all be run in data centers
  - Yudkowsky emphasizes that governance is not within his expertise
  - We can probably get away with building some narrow tool AI to improve life
- Then maybe we should enhance human intelligence before trying to build aligned AGI
  - Key enhancement level: get smart enough to quit being overoptimistic about stuff working
    - History is just rife with people being surprised their projects and approaches don't work
I find myself very much agreeing with his focus on human cognitive limitations and our poor historical record of getting new projects right on the first try. Cognitive biases were the focus of my neuroscience research for some years, and I came to the conclusion that wow, humans have both major cognitive limitations (we can't really take in and weigh all the relevant data for complex questions like alignment) and major biases, notably a near-inevitable tendency to believe what seems likely to benefit us rather than what's empirically most likely to be true. I still want to do a full post on this, but in the meantime I've written a mid-size question answer on motivated reasoning/confirmation bias as the most important cognitive bias.
My position to date has been that, despite those limitations, aligning a scaffolded language model agent (our most likely first form of AGI) to follow instructions is so easy that a monkey(-based human organization) could do it.
After increased engagement with these ideas, I'm worried that it may be my own cognitive limitations and biases that have led me to believe that. I now find myself thoroughly uncertain (while still thinking those routes to alignment have substantial advantages over other proposals).
And yet, I still think the societal rush toward creating general intelligence is so strong that working on ways to align the type of AGI we're most likely to get is a likelier route to success than attempting to halt that rush.
But the two could possibly work in parallel.
I notice that fully general AI is not only the type most likely to kill us, but also the type most obviously likely to put us all out of work, uncomfortably quickly. By fully general, I mean capable of learning to do arbitrary new tasks. Arbitrary tasks include any particular job, and also how to take over the world.
This confluence of problems might be a route to convincing people that we should slow the rush toward AGI.
I have an argument for halting AGI progress based on an analogy to the Covid-19 pandemic. The initial government response to the pandemic was widespread lockdowns. That was a rational response: at first, lacking testing infrastructure and so on, it wasn't possible to determine whether someone had Covid-19, so the safest option was simply to avoid all contact between people via lockdowns.
Eventually we figured out practices like testing and contact tracing, and individuals could then self-isolate if they tested positive or had come into contact with an infected person. This approach was smarter and less costly than blanket lockdowns.
In my opinion, the state we are in regarding AGI is similar to the beginning of the Covid-19 pandemic: there is a lot of uncertainty about the risks and capabilities of AI, and about which alignment techniques would be useful. So a rational response would be the equivalent of an 'AI lockdown': halting progress on AI until we understand it better and can come up with better alignment techniques.
The most obvious rebuttal to this argument is that pausing AGI progress would have a high economic opportunity cost (no AGI). But the Covid lockdowns did too, and society was willing to pay a large economic price to avert Covid deaths.
The economic opportunity cost of pausing AGI progress might be larger than that of the Covid lockdowns, but the benefits would be larger too: averting existential risk from AGI is a much larger benefit than averting Covid deaths.
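One way to make that comparison concrete is a toy expected-value inequality. The framing and symbols below are my own illustration, not part of the original argument: pausing is worth it when

$$C_{\text{pause}} < \Delta p_{\text{doom}} \cdot V_{\text{survival}}$$

where $C_{\text{pause}}$ is the economic opportunity cost of the pause, $\Delta p_{\text{doom}}$ is the reduction in the probability of existential catastrophe that the pause buys, and $V_{\text{survival}}$ is the value placed on avoiding that catastrophe. Under any remotely plausible valuation of survival, even a small $\Delta p_{\text{doom}}$ dominates a Covid-lockdown-scale $C_{\text{pause}}$.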
So, in summary, I think the benefits of pausing AGI progress outweigh the costs.
Several years? I don't think we have that long; I'm thinking mid to late 2026 for when we hit AGI. I think 1, 2, and 3 can change very quickly indeed, as with the Covid lockdowns: people went from 'doubt' to 'doing' in a short amount of time once the evidence was overwhelmingly clear.
So having 4 in place by the time that occurs seems key. Also, having plans in place for demos compelling enough to convince people before disaster strikes seems highly useful.