
Thanks!

It has been explored (multiple times even on this site), and doesn't avoid doom. It does close off some specific paths that might otherwise lead to doom, but not all or even most of them.

Do you have any specific posts in mind?

To be clear, I'm not suggesting that because of this possibility we can just hope that this is how it plays out and we will get lucky.

If we could find a hard limit like this, it seems like it would make the problem more tractable, however. It doesn't have to exist simply because we want it to exist. Searching for it still seems like a good idea.

There are a hundred problems to solve, but it seems like it could at least avoid the main bad scenario: an AI rapidly self-improving. Improving its hardware wouldn't be trivial for a human-level AI, and it wouldn't have the options available in other scenarios. Scaling beyond a single machine also seems likely to be a significant barrier.

It could still create millions of copies of itself. That's still a problem, but also still a better problem to have than a single AI with no coordination overhead.


[ Question ]

If alignment problem was unsolvable, would that avoid doom?

by Kinrany

7th May 2023


Suppose there is a useful formulation of the alignment problem that is mathematically unsolvable. Suppose that, as a corollary, it is also impossible to modify your own mind while guaranteeing any non-trivial property of the resulting mind.

Would that prevent a new AI from trying to modify itself?

Has this direction been explored before?


JBlack

May 08, 2023

It has been explored (multiple times even on this site), and doesn't avoid doom. It does close off some specific paths that might otherwise lead to doom, but not all or even most of them.

Some remaining problems:

An AI may be perfectly capable of killing everyone without self-improvement;
An AI may be capable of some large self-improvement step, but not aware of this theorem;
Self-improving AIs might not care whether the result is aligned with their former selves, and indeed may not even have any goals at all before self-improvement;
AIs may create smarter AIs without improving their own capabilities, knowing that the result won't be fully aligned but expecting that they can nevertheless keep it under control (and turning out to be wrong);
In a population with many AIs, those that don't self-improve may be out-competed by those that do - leading to selection for AIs that self-improve regardless of consequences;
It is extremely unlikely that a mere change of computing substrate would meet the conditions of such a theorem, so an AI can almost certainly upgrade its hardware (possibly by many orders of magnitude) to run faster without modifying its mind in any fundamental way.

At this point my 5-minute timer on "think up ways things can still go wrong" ran out, and I just threw out the dumbest ideas and listed the rest. I'm sure with more thought other objections could be found.

Kinrany, 2y

Thanks!

It has been explored (multiple times even on this site), and doesn't avoid doom. It does close off some specific paths that might otherwise lead to doom, but not all or even most of them.

Do you have any specific posts in mind?

To be clear, I'm not suggesting that because of this possibility we can just hope that this is how it plays out and we will get lucky.

Kinrany, 2y

The problem of creating a strong AI and surviving, that is. We'd still get Hanson's billions of self-directed EMs.
