It seems to me that it is possible to create a safe oracle AI.
Suppose that you have a sequence predictor which is a good approximation of Solomonoff induction but which runs in reasonable time. Such a sequence predictor could be extremely useful (for example, predict future SIAI publications from past SIAI publications, then read the article which gives a complete account of Friendliness theory...) and is not dangerous in itself.
The question, of course, is how to obtain such a thing.
The trick relies on the concept of a program predictor. A program predictor is a function which takes a program as input and predicts, more or less accurately, that program's output, but within reasonable time (note that by "program" we mean a program without side effects that just computes an output). If you have a very accurate program predictor, then you can obviously use it to obtain a good approximation of Solomonoff induction which runs in reasonable time.
But of course, this just displaces the problem: how do you get such an accurate program predictor?
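Here is a minimal sketch of that reduction, assuming a hypothetical predict_output(program) oracle and treating programs as short bit-strings fed to some fixed interpreter (all names and the enumeration scheme are illustrative, not part of the argument):

```python
from itertools import product

def approx_solomonoff(observed, predict_output, max_len=12, alphabet="01"):
    """Approximate Solomonoff induction given a program predictor.

    predict_output(program) is assumed to cheaply predict what the program
    would print, without actually running it.  Programs here are bit-strings
    for some fixed universal interpreter (an illustrative stand-in).
    """
    weights = {}  # next symbol -> accumulated prior weight
    for length in range(1, max_len + 1):
        for bits in product(alphabet, repeat=length):
            program = "".join(bits)
            output = predict_output(program)  # cheap prediction, not execution
            if output.startswith(observed) and len(output) > len(observed):
                next_symbol = output[len(observed)]
                # Weight each consistent program by the 2^-length universal prior.
                weights[next_symbol] = weights.get(next_symbol, 0.0) + 2.0 ** -length
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()} if total else {}
```

The accuracy of the resulting sequence predictions is of course only as good as the program predictor it leans on, which is the whole point of the next step.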
Well, suppose you have a program predictor which is good enough to be improved upon. You then use it to find the program of fewer than N bits (with N sufficiently large, of course) which maximizes a utility function measuring how accurate that program's output is as a program predictor, given that it generates this output in fewer than T steps (where T is a reasonable number given the hardware you have access to). Then you run that program and check the accuracy of the resulting program predictor. If it is insufficient, repeat the process. You should eventually obtain a very accurate program predictor. QED.
So we've reduced our problem to that of creating a program predictor good enough to be improved upon. That should be possible. In particular, it is related to the problem of logical uncertainty: if we can get a passable understanding of logical uncertainty, it should be possible to build such a program predictor from it. Thus a minimal understanding of logical uncertainty should be sufficient to obtain AGI. In fact, even without such understanding, it may be possible to patch together such a program predictor...
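A rough sketch of that bootstrapping loop, with current_predictor, enumerate_programs, run, and accuracy as illustrative placeholders (none of these names come from the argument itself; predicted outputs are treated as candidate predictors for scoring purposes):

```python
def bootstrap_predictor(current_predictor, enumerate_programs, run,
                        accuracy, target, max_rounds=10):
    """Iteratively bootstrap a mediocre program predictor into a better one.

    current_predictor(p) -> predicted output of program p (here, a predictor)
    enumerate_programs() -> candidate programs of fewer than N bits
    run(p)               -> actually execute p for at most T steps
    accuracy(predictor)  -> score on a benchmark of known program/output pairs
    """
    for _ in range(max_rounds):
        # Use the *current* predictor to guess which candidate program would
        # output the most accurate program predictor, without running them all.
        best_candidate = max(enumerate_programs(),
                             key=lambda p: accuracy(current_predictor(p)))
        new_predictor = run(best_candidate)  # execute only the winning candidate
        if accuracy(new_predictor) >= target:
            return new_predictor             # accurate enough; stop
        if accuracy(new_predictor) > accuracy(current_predictor):
            current_predictor = new_predictor  # improved; repeat the process
    return current_predictor
```

The loop only terminates with a very accurate predictor if each round's winner really is an improvement, which is exactly what the "good enough to be improved upon" premise is doing.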
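Purely as an illustration of "patched together" (the post doesn't specify a construction), a deliberately weak but improvable predictor could be as crude as running the program for a small step budget and falling back to a cheap guess otherwise; interpreter() below is an assumed helper:

```python
def crude_program_predictor(program, interpreter, step_budget=10_000):
    """A deliberately weak program predictor: cheap, often wrong, improvable.

    interpreter(program, max_steps) is assumed to run the program and return
    its output, or None if it has not halted within max_steps.
    """
    output = interpreter(program, max_steps=step_budget)
    if output is not None:
        return output   # exact on programs that halt quickly
    return ""           # fallback guess for everything else
```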
If you play taboo with the word "goals" I think the argument may be dissolved.
My laptop doesn't have a "goal" of satisfying my desire to read LessWrong. I simply open the web browser and type in the URL, initiating a basically deterministic process which the computer merely executes. No need to imbue it with goals at all.
Except now my browser is smart enough to auto-fill the LessWrong URL after just a couple of letters. Is that goal-directed behavior? I think we're already at the point of hairsplitting semantic distinctions and we're talking about web browsers, not advanced AI.
Likewise, it isn't material whether an advanced predictor/optimizer has goals; what is relevant is that it will follow its programming when that programming tells it to "tell me the answer." If it needs more information to tell you the answer, it will get it, and it won't worry about how it gets it.
I think your taboo wasn't strong enough and you allowed some leftover essence of anthropomorphic "goaliness" to pollute your argument.
When you talk about an "advanced optimizer" that "needs more information" to do something and goes out there to "get it", that presupposes a model of AIs that I consider wrong (or maybe too early to talk about). If the AI's code consists of navigating chess position trees, it won't smash you in the face with a rook in order to win, no matter how strongly it "wants" to win or ...