It seems to me that it is possible to create a safe oracle AI.
Suppose that you have a sequence predictor which is a good approximation of Solomonoff induction but which runs in reasonable time. Such a sequence predictor could be extremely useful (for example: predict future SIAI publications from past SIAI publications, then proceed to read the article that gives a complete account of Friendliness theory...) and is not dangerous in itself.
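For concreteness, the object being supposed might look like the following minimal interface sketch (the class and method names are my own illustration, nothing here is specified in the post):

```python
from typing import List

class SequencePredictor:
    """A sequence predictor: given the bits seen so far, estimate the
    probability of the next bit, ideally close to the Solomonoff posterior,
    and answer within reasonable time."""

    def prob_next_is_one(self, history: List[int]) -> float:
        """Return an approximation of P(next bit = 1 | history)."""
        raise NotImplementedError
```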
The question, of course, is how to obtain such a thing.
The trick relies on the concept of a program predictor. A program predictor is a function which predicts, more or less accurately but within reasonable time, the output of the program it takes as input (where by "program" we mean a program without side effects that just computes an output). If you have a very accurate program predictor, then you can obviously use it to obtain a good approximation of Solomonoff induction which runs in reasonable time.
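To make that reduction concrete, here is a toy sketch. Everything in it is my own illustration: programs are encoded as raw bitstrings and enumerated by brute force (which is exponential, and ignores the prefix-free universal machine real Solomonoff induction uses); the point is only that the program predictor stands in for actually running the programs, which may not halt.

```python
from itertools import product
from typing import Callable, List

def solomonoff_approx(
    history: List[int],
    predict_output: Callable[[List[int]], List[int]],  # the program predictor
    max_len: int,
) -> float:
    """Approximate P(next bit = 1 | history) by weighting each program
    2^-length, keeping only programs whose *predicted* output agrees
    with the observed history."""
    weight_one = 0.0
    weight_total = 0.0
    for n in range(1, max_len + 1):
        for bits in product([0, 1], repeat=n):  # enumerate n-bit programs
            program = list(bits)
            out = predict_output(program)  # predicted output, never an actual run
            if len(out) <= len(history) or out[:len(history)] != history:
                continue  # predicted output disagrees with observations
            w = 2.0 ** (-n)  # shorter programs get exponentially more weight
            weight_total += w
            if out[len(history)] == 1:
                weight_one += w
    return weight_one / weight_total if weight_total > 0 else 0.5
```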
But of course, this just displaces the problem: how do you get such an accurate program predictor?
Well, suppose you have a program predictor which is good enough to be improved on. Use it to find the program of less than N bits (with N sufficiently big, of course) that maximizes a utility function measuring how accurately that program's output works as a program predictor, given that the output is generated in less than T steps (where T is a reasonable number given the hardware you have access to). Then run that program and check the accuracy of the resulting program predictor. If it is insufficient, repeat the process. You should eventually obtain a very accurate program predictor. QED.
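The loop just described might be sketched as follows, where every name is a placeholder of mine rather than anything from the post: `search_best_program` stands for "use the current predictor to find the sub-N-bit program whose output scores best under the utility function within T steps", `run` executes a side-effect-free program, and `accuracy` scores a predictor.

```python
def bootstrap_predictor(predictor, search_best_program, run, accuracy, target):
    """Iteratively replace a program predictor with a more accurate one."""
    while accuracy(predictor) < target:
        # Use the current (weak) predictor to locate a candidate program
        # predicted to compute a more accurate program predictor in < T steps.
        candidate = search_best_program(predictor)
        # Actually run the candidate; it has no side effects and just
        # computes an output, which we take as the new predictor.
        predictor = run(candidate)
    return predictor
```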
So we've reduced our problem to the problem of creating a program predictor good enough to be improved upon. That should be possible. In particular, it is related to the problem of logical uncertainty: if we can get even a passable understanding of logical uncertainty, it should be possible to build such a program predictor from it. Thus a minimal understanding of logical uncertainty should be sufficient to obtain AGI. In fact, even without such understanding, it may be possible to patch together such a program predictor...
A box that does nothing except predict the next bit in a sequence seems pretty innocuous, in the unlikely event that its creators managed to get its programming so awesomely correct on the first try that they didn't bother to give it any self-improvement goals at all.
But even in that case there are probably still gotchas. Once you start providing the box with sequences that correspond to data about the real-world results of its previous and current predictions, then even a seemingly const (side-effect-free) optimization problem statement like "find the most accurate approximation of the probability distribution function for the next data set" becomes a form of real-world goal. Stochastic approximation error typically grows with the variance of the true solution, for instance, and it's clear that the output variance of the world's future would be greatly reduced if only there weren't all those random humans mucking it up...
That doesn't sound right. The box isn't trying to minimize the "variance of the true solution". It is stating its current beliefs, which were computed from the input bit sequence by a formula. If you think it will manipulate the operator when some of its output bits are fed back into it, could you explain that a little more technically?