It seems to me that it is possible to create a safe oracle AI.
Suppose that you have a sequence predictor which is a good approximation of Solomonoff induction but which runs in reasonable time. This sequence predictor could potentially be really useful (for example, predict future SIAI publications from past SIAI publications, then proceed to read the article which gives a complete account of Friendliness theory...) and is not dangerous in itself.
The question, of course, is how to obtain such a thing.
The trick relies on the concept of a program predictor. A program predictor is a function which takes a program as input and predicts, more or less accurately, that program's output, but does so in reasonable time (note that by "program" we mean a program without side effects that just computes an output). If you have a very accurate program predictor, then you can obviously use it to obtain a good approximation of Solomonoff induction which runs in reasonable time.
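To make that step concrete, here is a minimal sketch of how an accurate program predictor could be turned into an approximate Solomonoff inductor. Everything in it is my own illustration, not part of the post: `predict_output` stands in for the hypothetical program predictor, programs are encoded as bit strings, and enumerating the candidate programs is left to the caller.

```python
# Minimal sketch, not a real implementation: given a hypothetical program
# predictor `predict_output`, approximate Solomonoff induction by weighting
# short programs whose predicted output agrees with the data seen so far.
from typing import Callable, Iterable

def approx_solomonoff_next_bit(
    observed: str,                         # bits seen so far, e.g. "0110"
    programs: Iterable[str],               # candidate programs, encoded as bit strings
    predict_output: Callable[[str], str],  # hypothetical fast program predictor
) -> float:
    """Estimate the probability that the next bit of the sequence is '1'."""
    weight_one = 0.0
    weight_total = 0.0
    for prog in programs:
        out = predict_output(prog)         # cheap predicted output instead of running prog
        if len(out) <= len(observed) or not out.startswith(observed):
            continue                       # predicted output disagrees with the data
        w = 2.0 ** (-len(prog))            # shorter programs get exponentially more weight
        weight_total += w
        if out[len(observed)] == "1":
            weight_one += w
    return weight_one / weight_total if weight_total else 0.5
```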
But of course, this just displaces the problem: how do you get such an accurate program predictor?
Well, suppose you have a program predictor which is good enough to be improved upon. Then you use it to find the program of less than N bits in length (with N sufficiently big, of course) which maximizes a utility function measuring how accurate that program's output is as a program predictor, given that it generates this output in fewer than T steps (where T is a reasonable number given the hardware you have access to). Then you run that program and check the accuracy of the resulting program predictor. If it is insufficient, repeat the process. You should eventually obtain a very accurate program predictor. QED.
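Here is a rough sketch of that bootstrapping loop, under heavy assumptions: every helper passed in (`enumerate_programs`, `run`, `score_as_predictor`, `measure_accuracy`) is a stand-in invented for illustration, and the post says nothing about how they would actually be implemented.

```python
# Rough sketch of the bootstrapping loop, with the hard parts assumed away
# and passed in as callables. Nothing here is specified in the post.
from typing import Callable, Iterable

Predictor = Callable[[str], str]   # maps a program (bit string) to its predicted output

def bootstrap_predictor(
    initial: Predictor,                                  # mediocre but usable predictor
    enumerate_programs: Callable[[int], Iterable[str]],  # all programs of < N bits
    run: Callable[[str, int], Predictor],                # execute a program for <= T steps;
                                                         # its output is used as a predictor
    score_as_predictor: Callable[[str], float],          # utility of a predicted output,
                                                         # judged as a program predictor
    measure_accuracy: Callable[[Predictor], float],      # benchmark the current predictor
    n_bits: int,
    t_steps: int,
    target: float,
) -> Predictor:
    current = initial
    while measure_accuracy(current) < target:
        # Use the current predictor to cheaply guess each candidate's output,
        # and keep the candidate whose guessed output scores best as a predictor.
        best = max(enumerate_programs(n_bits),
                   key=lambda prog: score_as_predictor(current(prog)))
        # Actually run the winning program (for at most T steps) and adopt
        # whatever it produces as the new, hopefully more accurate predictor.
        current = run(best, t_steps)
    return current
```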
So we've reduced our problem to the problem of creating a program predictor good enough to be improved upon. That should be possible. In particular, it is related to the problem of logical uncertainty. If we can get a passable understanding of logical uncertainty, it should be possible to build such a program predictor using it. Thus a minimal understanding of logical uncertainty should be sufficient to obtain AGI. In fact, even without such understanding, it may be possible to patch together such a program predictor...
I never said the box was trying to minimize the variance of the true solution for its own sake, just that it was trying to find an efficient, accurate approximation to the true solution. Since that efficiency typically increases as the variance of the true solution decreases, the possibility of increasing efficiency by manipulating the true solution follows. Surely, no matter how goal-agnostic your oracle is, you're going to try to make it as accurate as possible for a given computational cost, right?
That's just the first failure mode that popped into my mind, and I think it's a good one for any real computing device, but let's try to come up with an example that even applies to oracles with infinite computational capability (and that explains how that manipulation occurs in either case). Here's a slightly more technical but still grossly oversimplified discussion:
Suppose you give me the sequence of real-world data y1, y2, y3, y4... and I come up with a superintelligent way to predict y5, so I tell you y5 := x5. You tell me the true y5 later, and I use this new data to predict y6 := x6.
But wait! No matter how good my rule xn = f(y1...y{n-1}) was, it's now giving me the wrong answers! Even if y4 was a function of {y1,y2,y3}, the very fact that you're using my prediction x5 to affect the future of the real world means that y5 is now a function of {y1, y2, y3, y4, x5}. Eventually I'm going to notice this, and now I'm going to have to come up with a new, implicit rule for xn = f(y1...y{n-1},xn).
So now we're not just trying to evaluate an f, we're trying to find fixed points for an f - where in this context "a fixed point" is math lingo for "a self-fulfilling prophecy". And depending on what predictions are called for, that's a very different problem. "What would the stock market be likely to do tomorrow in a world with no oracles?" may give you a much more stable answer than "What is the stock market likely to do tomorrow after everybody hears the announcement of what a super-intelligent AI thinks the stock market is likely to do tomorrow?" "Who would be likely to kill someone tomorrow in a world with no oracles?" will probably result in a much shorter list than "Who is likely to kill someone tomorrow, after the police receive this answer from the oracle and send SWAT to break down their doors?" "What would the probability of WW3 within ten years have been without an oracle?" may have a significantly more pleasant answer than "What would the probability of WW3 within ten years be, given that anyone whom the oracle convinces of a high probability has motivation to react with arms races and/or pre-emptive strikes?"
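To make the fixed-point framing concrete, here's a toy sketch of my own (the `toy_market` model, the 0.3 feedback coefficient, and the naive iteration scheme are all arbitrary assumptions, not anything from the thread). The point is only that once the announced prediction feeds back into the world, a consistent oracle has to solve x = world(history, x) rather than just evaluate f(history), and there's no general guarantee that such a fixed point exists or is unique.

```python
# Toy illustration of a self-fulfilling prophecy: the world reacts to the
# announced prediction, so a consistent prediction x must satisfy
# x == world(history, x). Both the model and the constants are made up.
from typing import Callable, Sequence

def toy_market(history: Sequence[float], announced: float) -> float:
    """Tomorrow's price drifts along the recent trend, but traders also move
    30% of the way toward whatever the oracle publicly announces."""
    trend = history[-1] + (history[-1] - history[-2])
    return 0.7 * trend + 0.3 * announced

def self_consistent_prediction(
    world: Callable[[Sequence[float], float], float],
    history: Sequence[float],
    iters: int = 100,
) -> float:
    """Look for a fixed point by naive iteration: announce x, see what the
    world model says would then happen, announce that instead, and repeat."""
    x = history[-1]                 # start from "no change"
    for _ in range(iters):
        x = world(history, x)
    return x                        # approximately satisfies x == world(history, x)

print(self_consistent_prediction(toy_market, [100.0, 101.0]))  # ~102.0
```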
Ooh, this looks right. A predictor that "notices" itself in the outside world can output predictions that make themselves true, e.g. by stopping us from preventing predicted events, or something even more weird. Thanks!
(At first I thought Solomonoff induction doesn't have this problem, because it's uncomputable and thus cannot include a model of itself. But it seems that a computable approximation to Solomonoff induction may well exhibit such "UDT-ish" behavior, because it's computable.)