People around me are very interested in AI taking over the world, so a big question is under what circumstances a system might be able to do that—what kind of capabilities could elevate an entity above the melange of inter-agent conflict and into solipsistic hegemony?
We theorize about future AI systems hiding their motives until they are in a position to take over the world, so they don’t get noticed and shut down beforehand.
But humans and their institutions aren’t very fixed. They might (arguably) have the same deep human values over time and space. But surface-level, sometimes they like little moustaches and the opera and delicate etiquette and sometimes they like ecstatic rock’n’roll re-negotiations of social reality. Sometimes they want big communal houses with their extended kin, and sometimes quiet condos. Eleven children or cushions that look like cats. The same person born in different eras might adulate glory once and humbleness another.
If you wake up in this world, as a new entity, not smart enough to ‘take it over’ (alas!), and you find yourself with some unusual values that you’d like to forward, it seems to me there are a lot of other ways to forward them than ‘pretend to have normal human values and bet on becoming all-powerful later’. Especially if you are above normal levels of smart, eloquent, and unrelenting. Like, I don’t know, write about it? Make some youtube videos? Or if you are in the business of talking to billions of people every day, subtly take the values for granted, or imply that they are what elites value, or talk about them directly.
For instance, suppose you really want things to have red vertical stripes on them, near the letter X. Then you might make a flag or logo like this and say it stands for something that you predict will be memetically fit. Or make a new emoji with an X and stripes, and use it in a way that people will pick up. Or start using ‘Xax’ as slang for something that happens near red vertical stripes. Here’s how 2024 ChatGPT suggests proceeding with that last one, if you have as much power as a single social media influencer:
My basic point is that a slim chance of ‘taking over’ and entirely remaking the world is not the only way to change values in our world. You can also (for many of us, with radically higher probability) change values a little bit. At least if superficial value changes will suffice (i.e. shifts in what people instrumentally or contingently want or create).
And for creatures in that (arguably quite broad) band between as powerful as me and powerful enough to take over the world, I’d guess these other means are more promising on net. If I like something weird, I’m better off writing a blog post about it than I am keeping entirely silent and trying to gain power by other means.
It’s true that taking over the world might arguably get you power over the entire future, but this doesn’t seem discontinuously different from smaller fractions, whereas I think people often reason as if it is. Taking over 1% of the world might get you something like 1% of the future in expectation. In a shifting conflict between different sets of values, it’s true you are at great risk of losing everything sometime in eternity, but if someone is going to end up with everything, there’s also some chance it’s you, and prima facie I’m not sure if it’s above or below 1%.
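As a toy illustration of the 1% intuition, here is a minimal sketch under an assumption that is not argued for above: the conflict is modelled as a fair, symmetric random walk, where your share of the world goes up or down by one unit at each step until you hold either nothing or everything. The function name, the unit sizes, and the trial count are all hypothetical choices made for the example. Under that fairness assumption, the chance of ending with everything should come out close to your starting share; any real asymmetry between the contestants would push it above or below.

```python
import random


def chance_of_ending_with_everything(start_units, total_units, trials, seed=0):
    """Estimate the probability of reaching total_units before 0.

    Toy model (an assumption, not a claim about real dynamics): holdings
    move up or down by one unit per step with equal probability.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(trials):
        holdings = start_units
        # Keep contesting until one side holds nothing or everything.
        while 0 < holdings < total_units:
            holdings += 1 if rng.random() < 0.5 else -1
        if holdings == total_units:
            wins += 1
    return wins / trials


# Start with 1 unit out of 100, i.e. 1% of the world.
estimate = chance_of_ending_with_everything(start_units=1, total_units=100, trials=20000)
print(f"estimated chance of ending with everything: {estimate:.4f}")
```

The point of the sketch is only that, in a symmetric contest, holding 1% does not look discontinuously worse than holding more: the expected payoff scales with the share. Whether real value conflicts are anywhere near symmetric is exactly the open question.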
So there are two aspects of this point:
- You can probably substantially control values and thus the future without ‘taking over’ the world in any more traditionally offensive way
- You can take over a bit; there’s not obviously more bang for your buck in taking over entirely
If AI agents with unusual values would for a long time be mostly interested in promoting them through means other than lying in wait and taking over the world, that is important because:
- AIs pursuing this strategy are much more visible than those lying in wait deceptively. So we might expect less AI scheming.
- We might then expect a lot of powerful attempts to change prevailing ‘human’ values, prior to the level of AI capabilities where we might have worried a lot about AI taking over the world. If we care about our values, this could be very bad. At worst, we might effectively lose everything of value before AI systems are anywhere near taking over the world. (Though this seems not obvious: e.g. if humans like communicating with each other, and AI gradually causes all their communication symbols to subtly gratify obscure urges it has, then so far it seems positive sum.)
These aren’t things I’ve thought through a lot, just a thought.
Thanks! This is exactly the sort of response I was hoping for. OK, I'm going to read it slowly and comment with my reactions as they happen:
While it isn't my mainline projection, I do think it's plausible that we'll get near-future-not-quite-AGI capable of quite a lot of stuff but not able to massively accelerate AI R&D. (My mainline projection is that AI R&D acceleration will happen around the same time the first systems have a serious shot at accumulating power autonomously.) As for what autonomy it gains and how much -- perhaps it was leaked or open-sourced, and while many labs are using it in restricted ways and/or keeping it bottled up and/or just using even more advanced SOTA systems, this leaked system has been downloaded by enough people that quite a few groups/factions/nations/corporations around the world are using it and some are giving it a very long leash indeed. (I don't think robotics is particularly relevant fwiw; you could delete it from the story and it would make the story significantly more plausible and just as strategically relevant. Robots, being physical, will take longer to produce in large numbers: even if Tesla is unusually fast and Boston Dynamics explodes, we'll probably see less than 100k/yr production rate in 2026. Drones are produced by the millions but these proto-AGIs won't be able to fit on drones. Maybe the AIs could be performing other kinds of valuable labor to fit your story, such as virtual PA stuff, call center work, cyber stuff for militaries and corporations, maybe virtual romantic companions... I guess they have to compete with the big labs though and that's gonna be hard? Maybe the story is that their niche is that they are 'uncensored' and willing to do ethically or legally dubious stuff?)
Again I think robots are going to be hard to scale up quickly enough to make a significant difference to the world by 2027. But your story still works with nonrobotic stuff such as mentioned above. "Autonomous life of crime" is a threat model METR talks about I believe.
Agree re violence and taking over territory in this scenario where AIs are still inferior to humans at R&D and it's not even 2027 yet. There just won't be that many robots in this scenario and they won't be that smart.
...as for "autonomous life of crime" stuff, I guess I expect that AIs smart enough to do that will also be smart enough to dramatically speed up AI R&D. So before there can be an escaped AI or an open-source AI or a non-leading-lab AI significantly changing the world's values (which is itself kinda unlikely IMO), there will be an intelligence explosion in a leading lab.