One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out that the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.
A lot of the arguments given in these comments amount to: "We can imagine a narrow AI that somehow becomes a general intelligence without wireheading or goal distortion", or "We can imagine a specific AGI architecture that is amenable to having precisely defined goals", and then conclude that because we can imagine them, they're probably possible, and because they're probably possible, they're probable. But such an argument is very weak. Our intuitions might be wrong, those AIs might not be the first to be developed, they might be theoretically possible but not pragmatically possible, and so on. Remember, we still don't know what intelligence is! We can define it as cross-domain optimization or what have you, but such definitions are not automatically valid just because they look sorta math-y. AIXI is probably not intelligent in the sense that a human is intelligent, and thus won't be dangerous. Why should I believe that any other AI architectures you come up with on the fly are any more dangerous?
So whenever you say, "imagine an AIXI approximation with a specific non-friendly utility function: that would be bad!", my response is, "who says such an AGI is even possible, let alone probable?". And whenever you say, "Omohundro says...", my response is, "Omohundro's arguments are informal and suggestive, but nowhere near conclusive, and in fact parts of his arguments can be taken to suggest that an AI would detect and follow moral law". There just aren't any knock-down arguments, because we don't know what it takes to make AGI. The best you can do is to make pragmatic arguments that caution is a good idea because the stakes are high. When people in this community act as if they have knock-down arguments where there aren't any, it makes SingInst and LessWrong look like weirdly overconfident end-of-the-world-mongers.
(Also, the 'AGI will literally kill us all by default' argument is laughably bad, for many game-theoretic and economic reasons, both standard and acausal, that should be obvious, and people unthinkingly repeating it also makes SingInst and LessWrong look like weirdly overconfident end-of-the-world-mongers.)
You're either greatly overestimating your audience (present company included) or talking to a reference class of size 10.