That is, try to avoid the drives by setting goals such as avoiding changing the world and turning itself off after achieving a small goal?
"Avoid changing the world" is very hard to formalize. First, take a timeless view: there are no changes, only control over what actually happens. If the AI already exists, then it already exerts some effect on the future, controls it to some extent. "Not changing the world" can at this point only be a particular kind of control the AI exerts over the future. But what kind of control, exactly? And how ruthless would the AI be in pursuit of "not changing the world" as optimally as possible? It might wipe out humanity just to make sure it has enough resources to reliably not change the world in the future.
"Avoid changing the world" is very hard to formalize. First, take a timeless view: there are no changes, only control over what actually happens.
I don't think it is too hard. The AI can model counterfactuals, right? Simply model how the world would have progressed if the computer had no power but the ball was red, then attempt to maximise the mutual information between this model and the models of the world the AI creates for each possible action it could take. The more a model diverges from the counterfactual, the less mutual information it shares with it.
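For concreteness, here is a toy sketch of what I mean, in Python. Everything in it (the features, the rollouts, the candidate actions) is made up purely for illustration; it just shows the scoring rule, not a worked-out proposal.

```python
import math
import random
from collections import Counter

# Toy sketch: score each candidate action by the mutual information between
# (a) the counterfactual rollout "computer unpowered, ball already red" and
# (b) the rollout conditioned on taking that action, coupling the two through
# shared exogenous noise, and prefer actions that keep that MI high.

def exogenous(rng):
    # Background uncertainty the AI does not control: tomorrow's weather and
    # whether a visitor wanders into the room.
    return rng.choice(["sunny", "rainy"]), rng.random() < 0.3

def counterfactual_rollout(weather, visitor):
    # World where the computer has no power but the ball is somehow red:
    # the background just unfolds on its own.
    return ("red", weather, visitor)

def action_rollout(action, weather, visitor):
    # World conditioned on the AI taking `action` (hypothetical actions).
    if action == "paint_ball":
        return ("red", weather, visitor)   # small, local intervention
    if action == "seal_the_room":
        return ("red", weather, False)     # goal met, but visitor shut out
    return ("green", weather, visitor)     # do nothing

def mutual_information(pairs):
    """Empirical mutual information (bits) between the halves of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def score(action, samples=20_000, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(samples):
        weather, visitor = exogenous(rng)
        pairs.append((counterfactual_rollout(weather, visitor),
                      action_rollout(action, weather, visitor)))
    return mutual_information(pairs)

if __name__ == "__main__":
    for action in ("do_nothing", "paint_ball", "seal_the_room"):
        print(f"{action:15s} MI = {score(action):.3f} bits")
```

In this toy scoring, "paint_ball" and "do_nothing" tie (the criterion only measures preservation of the counterfactual world, not goal achievement, so the goal itself would have to enter the objective separately), while "seal_the_room" destroys the information about the visitor and scores lower.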
This might have failure modes w...
One of the things I've been thinking about is how to safely explore the nature of intelligence. I'm unconvinced of FOOMing, and I would rather we didn't avoid AI entirely if we can't solve Yudkowsky-style Friendliness. So some method of experimentation is needed to determine how powerful intelligence actually is.
So can we create an AI that has very limited scope? That is, try to avoid the drives by setting goals such as avoiding changing the world and turning itself off after achieving a small goal?
Let us say the goal is to change the colour of a ball from green to red. You can leave paint, paintbrushes, and a robot around to make it easy, but the AI might determine that the best (least world-changing) way is to create a dye-manufacturing bacterium instead. How well it did on the test would also let you gauge the optimising power of the system, to tell whether we need "first mover / winner takes all" style friendliness or societal friendliness for many AIs.
Creating AIs without these drives seems easier than creating ones that do have goals to shape the rest of human history. What do other people think?