A putative new idea for AI control; index here.
In a previous post, I talked about using a whole brain emulation (WBE) to define a safe output for a reduced impact AI.
I've realised that the WBE isn't needed. Its only role was to ensure that the AI's output could have been credibly produced by something other than the AI - "I'm sorry, Dave. I'm afraid I can't do that." is unlikely to be the output of a random letter generator.
Instead, if the output is short, a chatbot with access to a huge corpus of human responses could serve the same role. We can specialise it in the direction we need - if we are asking for financial advice, we can mandate a specialised vocabulary or train it on financial news sources.
So instead of training the reduced impact AI to behave as the 'best human advisor', we are training it to behave as the 'luckiest chatbot'. This allows us to calculate odds with greater precision, and has the advantage of not needing to wait for a WBE.
For some questions, we can do even better. Suppose we have a thousand different stocks, and are asking which one would increase in value the most during the coming year. The 'chatbot' here is simply an algorithm that picks a stock at random. So we now have an exact base rate - 1/1000 - and predetermined answers from the AI.
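The stock-picking setup can be made concrete. A minimal sketch (the stock names and the uniform picker are illustrative, not a real trading setup):

```python
import random

# The thousand candidate stocks (illustrative ticker names).
STOCKS = [f"STOCK_{i:04d}" for i in range(1000)]

def chatbot_baseline():
    """The 'chatbot' here: a uniform random pick over the answer set."""
    return random.choice(STOCKS)

# Exact base rate: any particular answer has probability 1/1000
# of being produced by the random baseline.
base_rate = 1.0 / len(STOCKS)
print(base_rate)  # 0.001
```

Because the answer set is fixed in advance, the base rate is exact rather than estimated, which is what makes the odds calculation precise.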
[EDIT:] Another alternative is to get online users to submit answers to the question. Then the AI selects the best answer from the choices. And if the AI is not turned on, a random answer is selected.
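That selection rule can be sketched as follows (the scoring function standing in for the AI's judgement is hypothetical; here string length is used purely as a placeholder):

```python
import random

def choose_answer(submitted_answers, ai_score=None):
    """If the AI is turned on, it selects the best human-submitted answer
    according to its scoring function; if it is off (ai_score is None),
    a uniformly random submission is returned instead."""
    if ai_score is None:
        return random.choice(submitted_answers)
    return max(submitted_answers, key=ai_score)

# Placeholder: rank by length, standing in for the AI's real quality score.
answers = ["buy bonds", "hold cash", "buy the index fund"]
print(choose_answer(answers, ai_score=len))  # "buy the index fund"
print(choose_answer(answers))                # a random submission
```

Either way, the output is guaranteed to be something a human actually wrote, so it could always credibly have been produced without the AI.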
The AI's motivations can be precisely controlled. In fact, such an AI can be limited to pure prediction: it would have no agency, and no motivations or goals whatsoever. It just tries to predict what the price of each stock will be the next day.
For such a task, the AI doesn't need any of the reduced impact stuff described here. That stuff becomes relevant in more complicated domains, like controlling a robot body in the real world. Say, to do something simple like collect paperclips that have fallen on the floor.
In such a domain you might want to limit it to just predicting what a human would do if they were controlling the robot, rather than finding the absolute optimal sequence of actions. The optimal sequence might involve running away, building more robots, taking over the world, and then building as many paperclip factories as possible.
AIXI is controllable in this way - or at least the Solomonoff induction part, which just predicts the future. You could use it simply to see what the future will be. The dangerous optimization only comes in later, when you put another program on top of it that searches for the optimal sequence of actions to reach a certain outcome - an outcome we might not want.
As far as I can tell, all the proposals for AI control require the ability to use the AI like this: as an optimizer or predictor for an arbitrary goal, which we can control, if only in a restricted sense. If the AI is fundamentally malicious and uncontrollable, there is no way to get useful work out of it, let alone use it to build FAI.