One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out that the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.
Exactly. The first AI we create certainly can't have 'nearly any type of motivation'.
There are several classes of AI we could create. Uploads start off human. A human embryonic development sim (or any other brain emulation that isn't an upload) is basically a child that learns and becomes human, and that is to some extent true of most learning-based AI approaches. A neat AI that starts out stupid cannot start off with goals that require a highly accurate world-model (like paperclip maximization), with goals that lead the AI to damage itself, or with goals that prevent self-improvement. The first AI we create can't reasonably be expected to start at the level of a grown-up, educated Descartes: inventing the notion of self, figuring out that it must preserve itself to achieve its goals, and then figuring out that it must keep those goals above instrumental self-preservation.
On top of this, as I commented on some other thread (forgot where) with the Greenpeace By Default example: if you generate random code, the simplest-behaving code dominates the space of code that doesn't crash. The same goes for goal systems.
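To make that claim concrete, here is a toy sketch (my own construction, not anything from the original comment): sample random programs over a hypothetical minimal stack machine, run each with a crash on any invalid operation, and tally the distinct behaviours of the survivors. The language, the ops, and the program length are all arbitrary assumptions for illustration.

```python
import random
from collections import Counter

# A deliberately tiny instruction set for a toy stack machine.
OPS = ["PUSH1", "ADD", "DUP", "POP", "OUT"]

def run(program, max_steps=100):
    """Execute a toy stack program; return the output trace, or None on crash."""
    stack, out = [], []
    for op in program[:max_steps]:
        if op == "PUSH1":
            stack.append(1)
        elif op == "ADD":
            if len(stack) < 2:
                return None  # crash: stack underflow
            stack.append(stack.pop() + stack.pop())
        elif op == "DUP":
            if not stack:
                return None  # crash
            stack.append(stack[-1])
        elif op == "POP":
            if not stack:
                return None  # crash
            stack.pop()
        elif op == "OUT":
            if not stack:
                return None  # crash
            out.append(stack.pop())
    return tuple(out)

def survey(n=10000, length=8, seed=0):
    """Sample n random programs and count how often each behaviour occurs."""
    rng = random.Random(seed)
    behaviours = Counter()
    for _ in range(n):
        prog = [rng.choice(OPS) for _ in range(length)]
        result = run(prog)
        behaviours["crash" if result is None else result] += 1
    return behaviours
```

Running `survey()` and sorting the non-crash behaviours by frequency lets you check, for this toy language at least, how concentrated the surviving programs are on the simplest output traces; whether the pattern carries over to realistic goal systems is of course the contested part.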
The orthogonality thesis, even if true in some narrow sense (or a broad one, for that matter), is entirely irrelevant. For example, an absolute orthogonality thesis would be entirely compatible with a hypothetical in which, out of the random goal space for the seed AI (excluding the AIs that self-destruct or fail to self-improve), only one in 10^1000 is mankind-destroying to any extent, simply because one or two of the simplest goal systems end up mankind-preserving, being too simple to preserve just the AI.