It has been explored (multiple times even on this site), and doesn't avoid doom. It does close off some specific paths that might otherwise lead to doom, but not all or even most of them.
Some remaining problems:
At this point my 5-minute timer on "think up ways things can still go wrong" ran out; I discarded the dumbest ideas and listed the rest. I'm sure more objections could be found with further thought.
Thanks!
> It has been explored (multiple times even on this site), and doesn't avoid doom. It does close off some specific paths that might otherwise lead to doom, but not all or even most of them.
Do you have any specific posts in mind?
To be clear, I'm not suggesting that, because of this possibility, we can just hope it plays out this way and we get lucky.
If we could find a hard limit like this, though, it seems like it would make the problem more tractable. It doesn't have to exist simply because we want it to exist, but searching for it still seems worthwhile.
Suppose there is a useful formulation of the alignment problem that is mathematically unsolvable. Suppose that, as a corollary, modifying your own mind while ensuring any non-trivial property of the resulting mind is also impossible.
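For concreteness, the kind of result I have in mind would look something like Rice's theorem (offered as an illustration, not a claim that this exact theorem is the right formulation): every non-trivial semantic property of programs is undecidable. Writing $\varphi_e$ for the partial computable function computed by program $e$ and $\mathcal{PC}$ for the class of all such functions:

$$\emptyset \neq P \subsetneq \mathcal{PC} \;\Longrightarrow\; \{\, e \in \mathbb{N} : \varphi_e \in P \,\}\ \text{is undecidable.}$$

The hypothesized corollary would then say that no program can decide, for an arbitrary rewrite of itself, whether the rewrite preserves a given non-trivial behavioral property.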
Would that prevent a new AI from trying to modify itself?
Has this direction been explored before?