It seems to me that it is possible to create a safe oracle AI.
Suppose that you have a sequence predictor which is a good approximation of Solomonoff induction but which runs in reasonable time. Such a sequence predictor could be extremely useful (for example, predict future SIAI publications from past SIAI publications, then proceed to read the article that gives a complete account of Friendliness theory...) and is not dangerous in itself.
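For reference, here is the (uncomputable) quantity such a predictor would be approximating: Solomonoff's universal prior M over binary strings, with the next bit predicted from a ratio of prior weights. This is just the standard definition, glossing over details such as the choice of universal prefix machine U:

```latex
\[
  M(x) \;=\; \sum_{p \,:\, U(p)\ \mathrm{begins\ with}\ x} 2^{-|p|},
  \qquad
  P(x_{n+1} = 1 \mid x_{1:n}) \;=\; \frac{M(x_{1:n}1)}{M(x_{1:n})}.
\]
```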
The question, of course, is how to obtain such a thing.
The trick relies on the concept of a program predictor. A program predictor is a function which takes a program as input and predicts, more or less accurately, that program's output, but within reasonable time (note that by "program" we mean a program without side effects that just computes an output). If you have a very accurate program predictor, then you can obviously use it to obtain a good approximation of Solomonoff induction which runs in reasonable time.
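A minimal sketch of that last step, under stated assumptions: predict_output below is a hypothetical stand-in for the program predictor, and programs is whatever enumeration of short programs you can afford to consider. The idea is just to weight each program by 2^-length and keep only those whose predicted output agrees with the data seen so far:

```python
# Sketch only: predict_output and the program enumeration are assumptions
# supplied by the caller, not an existing library.

def approx_solomonoff_next_bit(data, programs, predict_output):
    """data: observed bit string, e.g. "0110";
    programs: list of (code, length_in_bits) pairs to consider;
    predict_output(code, n_bits): the program predictor's fast guess at the
        first n_bits of the program's output, instead of actually running it."""
    weight = {"0": 0.0, "1": 0.0}
    for code, length in programs:
        guess = predict_output(code, len(data) + 1)
        # Keep the program's 2^-length weight only if its predicted output
        # is consistent with everything observed so far.
        if len(guess) > len(data) and guess[: len(data)] == data:
            next_bit = guess[len(data)]
            if next_bit in weight:
                weight[next_bit] += 2.0 ** (-length)
    total = weight["0"] + weight["1"]
    return 0.5 if total == 0.0 else weight["1"] / total  # P(next bit = 1)
```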
But of course, this just displaces the problem: how do you get such an accurate program predictor?
Well, suppose you have a program predictor which is good enough to be improved upon. You use it to find the program of less than N bits (with N sufficiently large, of course) which maximizes a utility function measuring how accurate that program's output is as a program predictor, given that it generates this output in less than T steps (where T is a reasonable number given the hardware you have access to). Then you run that program and check the accuracy of the resulting program predictor. If it is insufficient, repeat the process. You should eventually obtain a very accurate program predictor. QED.
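Here is the loop described above as a hedged sketch; every helper passed in (the candidate enumeration, the utility estimate, the runner, the accuracy check) is a placeholder for machinery the argument assumes, not an existing API:

```python
# Bootstrapping sketch: all helpers are caller-supplied placeholders.

def bootstrap_predictor(predictor, candidates, predicted_utility,
                        run_program, measure_accuracy, target_accuracy):
    """predictor: the current, merely passable program predictor;
    candidates: list of candidate programs of at most N bits;
    predicted_utility(predictor, prog): the current predictor's estimate of
        how accurate prog's output would be *as a program predictor*, given
        that prog halts within T steps;
    run_program(prog): actually execute the (side-effect-free) winning
        program and return the program predictor it outputs;
    measure_accuracy(p): empirical accuracy of p on held-out
        (program, true output) pairs."""
    while measure_accuracy(predictor) < target_accuracy:
        # Use the current predictor to pick the most promising candidate...
        best = max(candidates,
                   key=lambda prog: predicted_utility(predictor, prog))
        # ...then run it for real and adopt its output as the new predictor.
        predictor = run_program(best)
    return predictor
```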
So we've reduced the problem to that of creating a program predictor good enough to be improved upon. That should be possible. In particular, it is related to the problem of logical uncertainty: if we can get a passable understanding of logical uncertainty, it should be possible to build such a program predictor from it. Thus a minimal understanding of logical uncertainty should be sufficient to obtain AGI. In fact, even without such understanding, it may be possible to patch together such a program predictor...
It seems to me that Eliezer's post is just wrong. His argument boils down to the claim that superintelligence requires real-world goals and is scary.
It's not obvious that a Solomonoff-approximating AI must have a "goal". It could just be a box that, y'know, predicts the next bit in a sequence. After all, if we had an actual uncomputable black box that printed correct Solomonoff-derived probability values for the next bit according to the mathematical definition, that box wouldn't try to manipulate the human operator by embedding epileptic patterns in its predictions or something.
Maybe you could make a case that self-improvement requires real-world goals and is scary (instead of "superintelligence requires real-world goals and is scary"). But I'm not convinced of that either. In fact, Karl's post shows that it's not necessarily the case. Also see Schmidhuber's work on Gödel machines, etc. Most self-improving thingies I can rigorously imagine are not scary at all.
It is indeed true that reinforcement learning AIs are scary. For example, AIXI can and will manipulate you into rewarding it. But there are many ideas besides reinforcement learning.
ETA: I gave an idea for AI containment some time ago, and it didn't get shot down. There are probably many other ways to build a non-dangerous strong AI that don't involve encoding or inferring the utility function of humanity.
ETA 2: it turns out that the connotations of this comment are wrong; thanks, roystgnr.
Also, what about this?