bentarm writes:
I'm just echoing everyone else here, but I don't understand why the AI would do anything at all other than immediately find the INT_MAX utility and halt; you can't offer intermediate problems with some positive utility, because the AI is smarter than you and will immediately devote all its energy to finding INT_MAX.
Now, this is in response to a proposed AI that gets maximum utility when inside its box. Such an AI would effectively be a utility junkie, unable to abandon its addiction and, consequently, unable to do much of anything.
(EDIT: this is a misunderstanding of jimrandomh's original idea. See his comment below.)
However, doesn't the same argument apply to any AI? Assuming it can modify its own source code, the quickest and easiest way to maximize utility would be to rewrite its utility function to return infinity (or whatever the maximum representable value is) and then halt. Are there ways around this? It seems to me that any AI will need to be divided against itself if it's ever going to get anything done, but maybe I'm missing something?
The "hide an INT_MAX bonus inside the box" idea was mine. One key detail is missing from this description of it: the AI does not know that the bonus exists, or how to get it, unless it has already escaped the box, and escaping is meant to be impossible. So it's a failsafe: if everything works as designed, it might as well not exist, but if something goes wrong, it makes the AI effectively wirehead itself and then halt.
I am aware of two problems with this plan that no one has suggested yet.
Sorry, my mistake. I've added a note to the original post.