I think the important point is recursive self-improvement, but it's not clear to me if there is any obvious analogy that can be used here. It's not just the ability to learn and get smarter that way, it's the ability to increase your own intelligence without bound that is critical, and we have no frame of reference for that.
Awesome post, love the perspective. I've generally thought along these lines as well, and it was one of the most convincing arguments for working in AI safety when I switched ("Copilot is already writing 20% of my code, what will happen next?").
I do agree with other comments that Oracle AI takeover is plausible, but I'll say that a strong code generation tool seems to have better chances, and to me it seems likely to arrive in parallel with conscious chatbots: there's currently more incentive to create code generation tools than chatbots, and the chatbots with virtual-assistant-like capabilities seem easier to build as code generation tools (e.g. connecting to Upwork APIs for posting a job).
And as you mention, converting engineers is much easier with this framing, and it lets us relate better to the field of AI capabilities, though we might want to just add it to our arsenal of arguments rather than replace our framing completely ;) Thank you for posting!
Code generation is the example I use to convince all of my software or AI friends of the likelihood of AI risk.
The cybersecurity aspect seems a good one. Maybe not so much to get people worried about x-risk, but to get them to take the issue of rogue AI seriously in general. I admit I don't know much about this, but I'm under the impression that:
These points imply that it might be possible to make auto-infectors that would use AI to search for vulnerabilities, exploit them, and spread updated versions of themselves. It's probably just a matter of time before a smart virus appears.
Maybe AGI, x-risk, alignment and safety can be separated into smaller issues? The "general" part of AGI seems to be a sticking point with many people - perhaps it would be good to start by showing that even totally dumb AI is dangerous? Especially when bad actors are taken into account - even if you grant that most AI won't be evil, there are groups which will actively strive to create harmful AI.
Yeah, this was my motivation for writing this post - helping people get on the train (and do the same actions) without needing them to buy into eschatology or x-risk seems hugely valuable.
I think this is good to get people initially on board, but I worry that people will start to falsely think that tasks unrelated to writing code are safe.
Honest question: what’s the easiest x-risk scenario that doesn’t involve generating code at some point? I’m struggling to think of any that aren’t pretty convoluted.
(I agree with the point, but think it’s easier to make once our foot is in the door.)
IMO the point of no return will probably be passed before recursive self-improvement. All we need is a sufficiently charismatic chatbot that starts getting strategic about what it says to people.
I don’t especially disagree that the AI most likely to end the world will be one that writes code? But if you keep throwing optimization power into a reasonably general AI that has no direct coding experience, it’ll still end the world eventually.
If the AI isn’t a code-writing one I don’t have any particular next guess.
Somewhere in the late-2021 MIRI conversations Eliezer opines that non-recursively-self-improving AIs are definitely dangerous. I can search for it if anyone is interested.
From Discussion with Eliezer Yudkowsky on AGI interventions:
Compared to the position I was arguing in the Foom Debate with Robin, reality has proved way to the further Eliezer side of Eliezer along the Eliezer-Robin spectrum. It’s been very unpleasantly surprising to me how little architectural complexity is required to start producing generalizing systems, and how fast those systems scale using More Compute. The flip side of this is that I can imagine a system being scaled up to interesting human+ levels, without “recursive self-improvement” or other of the old tricks that I thought would be necessary, and argued to Robin would make fast capability gain possible. You could have fast capability gain well before anything like a FOOM started. Which in turn makes it more plausible to me that we could hang out at interesting not-superintelligent levels of AGI capability for a while before a FOOM started. It’s not clear that this helps anything, but it does seem more plausible.
From Ngo and Yudkowsky on alignment difficulty:
It later turned out that capabilities started scaling a whole lot without self-improvement, which is an example of the kind of weird surprise the Future throws at you . . .
And yeah I realize now that my summary of what Eliezer wrote is not particularly close to what he actually wrote.
Depends what you mean by generate code. Can it have a prebaked function that copies itself (like computer viruses)? Does it count if it generates programs to attack other systems? If it changes its own source code? Its code stored in memory? You could argue that changing anything in memory is in a certain sense generating code.
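To make that distinction concrete, here's a minimal Python sketch, purely for illustration (the function names and file paths are made up): the first function copies itself in a prebaked way without writing any new code, while the second composes new source text at runtime and executes it.

```python
import shutil

# "Prebaked" self-copying: nothing new is written; the replication step is
# hard-coded, the way a classic computer virus ships with its copy routine.
def prebaked_copy(dest_path: str) -> None:
    shutil.copy(__file__, dest_path)  # duplicates this script's own file verbatim

# Runtime code generation: the program builds source text that did not exist
# before it ran, then executes it.
def generate_and_run(n: int) -> None:
    new_source = f"print('generated at runtime:', {n} ** 2)"
    exec(new_source)

if __name__ == "__main__":
    prebaked_copy("copy_of_self.py")  # hypothetical destination, for illustration
    generate_and_run(7)
```

The second case is what I'd call "generating code" in the interesting sense, but as I said above, the line gets blurry once you count any modification of memory.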
If it can't generate code, it'll be a one-shot type of thing, which means it must be preprogrammed with the tools to do its job. I can't come up with any way for it to take control, but it doesn't seem that hard to come up with some doomsday-machine scenarios, e.g. smashing a comet into Earth or making a virus that sterilizes everyone. Or a Shiri’s Scissor could do the trick. The idea being to make something that doesn't have to learn or improve itself too much.
I was thinking of "something that can understand and write code at the level of a 10x SWE". I'm further assuming that human designers didn't give it functions to copy itself or other dumb things.
As an optimist/skeptic of the alignment problem, I can confirm that code generation is much more persuasive to me than other AI risk hypotheticals. I'm still not convinced that an early AGI would be powerful enough to hack the internet, or that hacking the internet could lead to x-risk (there's still a lot that is not computerized and civilization is very resilient), but at least code generation is at the intersection of "it's plausible that an early AGI would have super human coding capacity" and "hacking the internet is a plausible thing to do" (unlike nanotech or an extinction-causing super-plague).
Historically, it has been difficult to persuade people of the likelihood of AI risk because the examples tend to sound “far-fetched” to audiences not bought in on the premise. One particular problem with many traditional framings for AI takeover is that most people struggle to imagine how e.g. “a robot programmed to bake maximum pies” figures out how to code, locates its own source code, copies itself elsewhere via an internet connection, and then ends the world.
There’s a major logical leap there: “pie-baking” and “coding” are things done by different categories of agent in our society, and so it’s fundamentally odd for people to imagine an agent capable of both. This oddness makes it feel like we must be far away from any system that could be that general, and thus relegates safety concerns to the realm of philosophical exercise.
I want to make the case that the motivating example we should really be using is automatic code generation. Here’s a long list of reasons why:
On the other hand, there may be some risk that focusing on code generation increases its public salience and thus investment in it. But this seems likely to have happened anyway. It’s also more obviously the path towards recursive self-improvement, and thus may accelerate AI capabilities, but again this does already seem to be happening whether or not we discuss it.
What do people think of this as a framing device?