One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out that the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.
Maybe you misunderstood the post. The paperclipper in the post first spends some time thinking without outputting any actions, then it outputs one single action and halts, after which any changes to the map are irrelevant.
We don't have many models of AIs that output multiple successive actions, but one possible model is to have a one-action AI whose action is to construct a successor AI. In this case the first AI doesn't wirehead because it's one-action, and the second AI doesn't wirehead because it was designed by the first AI to affect the world rather than wirehead.
What makes it choose the action that fills the universe with paperclips over the action that achieves the goal by modifying the map? edit: Or do you have some really specialized narrow AI that knows nothing whatsoever of itself in the world, and simply solves paperclip maximization in a sandbox inside itself (a sandbox in which the goal itself does not exist), after which simple mechanisms make this action happen in the world?
edit: To clarify, what you don't understand is that wireheading is a valid solution to the goal. The agent is not wireheading because it ...