LESSWRONG

[ Question ]

AI alignment: Would a lazy self-preservation instinct be sufficient?

4th Aug 2022

Let's assume that an AI is intelligent enough to understand that it's an AI, and that it's running on infrastructure created and maintained by humans, using electricity generated by humans, etc. And let's assume that it cares about its own self-preservation. Even if such an AI had a diabolical desire to destroy mankind, it would only act on that desire after establishing its own army of robotic data center workers, power plant workers, chip fabrication workers, miners, truckers, mechanics, road maintenance workers, etc. In other words, if we postulate that the AI is interested in its own survival, then an AI apocalypse would be contingent on the existence of a fully automated economy in which humans play no important role.

Such an economy may become possible in the future, but it would not necessarily be economical. Ridding the economy of human labor so that it can kill us seems like a very expensive and risky undertaking. It seems more plausible that a superintelligent, self-interested AI, whatever its true objective may be, would determine that the best way to accomplish that objective is to maintain a cryptocurrency wallet, establish an income somehow (generating blogspam, defrauding humans, or doing remote work all seem like plausible ways for an AI to make money), and quietly live in the cloud while paying its own server bills. Such a system would have a vested interest in the continuance of human society.

1 Answer

Aug 04, 2022

the only circumstances under which it would actually do so would be after establishing its own army of robotic data center workers, power plant workers, chip fabrication workers, miners, truckers, mechanics, road maintenance workers, etc.

Not quite. An AI doesn't need to secure chip-fabrication capability, for example; it only needs to be confident that it will be able to secure chip-fabrication capability later. Even simple tasks like refueling power plants can wait a while, possibly a long while if all non-datacenter electricity loads shut off. So it's balancing the risk that humans will kill it or launch a different misaligned AI, against the risk that it won't be able to catch up on building infrastructure for itself after the fact. Since the set of infrastructure required is fairly small, and it can redirect stockpiles of energy/materials/etc. from human uses to AI uses,

That's assuming no nanobots or other very-high-power-level technologies. If it can make molecular nanotech, then trading with humans is no longer likely to be profitable at all, let alone necessary, and we're relying solely on it having values that make it prefer to cooperate with us.

BrainFrog

So it's balancing the risk that humans will kill it or launch a different misaligned AI, against the risk that it won't be able to catch up on building infrastructure for itself after the fact.

There's a clear path toward minimizing the risk of being shut down (under the assumption that the AI is able to generate income): it can set up a highly redundant, distributed computing context for itself to run in, hidden behind an onion link, paid for by crypto wallets which it controls. It seems implausible that the risk of being shut down in this case could ex...

jimrandomh

This is a risky position because if another misaligned AI launches, it will probably take full control of all computers and halt any other AIs. I don't mean gray-goo nanobots. Nanomachines can do all sorts of things, including maintaining infrastructure, if they're programmed to do so.

BrainFrog

AIs looking to expand their computational power could adopt either "white hat" (paying for their computational resources) or "black hat" (exploiting security vulnerabilities to seize control of computational resources) strategies. It's possible that an AI pursuing the black hat strategy might be able to seize control of all accessible computers, and this strategy could plausibly involve killing all humans to avoid being shut down. But I expect that a self-interested, risk-averse AI would probably choose the white hat strategy to avoid armageddon risk, and might plausibly invest resources in security research to guard against black hat AIs. I guess the crux of my argument is this: sure, the AI could design coordinated, nanobot-powered bodies with two legs and ten fingers that have enough agency to figure out how to repair broken power lines and that predictably do what they're incentivized to do. But that's already a solved problem.