One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.
I think you won't find a very good argument either way, because different ways of building AIs create different constraints on the possible motivations they could have, and we don't know which methods are likely to succeed (or come first) at this point.
For example, uploads would be constrained to have motivations similar to existing humans (plus random drifts or corruptions of such). It seems impossible to create an upload who is motivated solely to fill the universe with paperclips. AIs created by genetic algorithms might be constrained to have certain motivations, which would probably differ from the set of possible motivations of AIs created by simulated biological evolution, etc.
The Orthogonality Thesis (or its denial) must assume that certain types of AI, e.g., those based on generic optimization algorithms that can accept a wide range of objective functions, are feasible (or infeasible) to build, but I don't think we can safely make such assumptions yet.
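To make that assumption concrete, here's a minimal sketch (my own toy illustration, not anything from the papers under discussion) of what "a generic optimization algorithm that accepts a wide range of objective functions" might look like; the function names and the two toy objectives are hypothetical, chosen only to show that the search machinery is decoupled from the goal it pursues:

```python
# Toy hill-climbing optimizer: the search loop is entirely independent of the
# objective function it is handed, which is the intuition behind treating
# "intelligence" (optimization power) and "motivation" (objective) as orthogonal.
import random

def hill_climb(objective, start, steps=1000, step_size=0.1):
    """Maximize an arbitrary objective function over a list of floats."""
    current = list(start)
    best = objective(current)
    for _ in range(steps):
        candidate = [x + random.uniform(-step_size, step_size) for x in current]
        score = objective(candidate)
        if score > best:  # the optimizer never inspects *what* it is optimizing
            current, best = candidate, score
    return current, best

# Two very different "motivations" plugged into the same optimizer:
maximize_paperclips = lambda v: -sum((x - 3.0) ** 2 for x in v)   # peak at (3, 3)
maximize_smiles     = lambda v: -sum((x + 7.0) ** 2 for x in v)   # peak at (-7, -7)

print(hill_climb(maximize_paperclips, [0.0, 0.0]))
print(hill_climb(maximize_smiles, [0.0, 0.0]))
```

Whether anything like this scales up to superintelligence, rather than the more constrained routes (uploads, evolved agents) mentioned above, is exactly the open question.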
ETA: Just noticed Will Newsome's comment, which makes similar points.
Wei Dai's comment is full of wisdom. In particular:
But even if that is true, it is nowhere near enough to support an OT that can be plugged into an unfriendliness argument. The unfriendliness argument requires that it is reasonably likely that researchers could create a paperclipper...