Evidence for the orthogonality thesis

Stuart_Armstrong

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, and published, arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.

If you can build an AI like that even in theory, then the "universal morality" isn't universal, just a very powerful attractor. A very powerful attractor might indeed be a thing that exists.

Evolution does very much seem to have built us this way, just very incompetently. At the very least, I know for a fact that I, and the majority of others who bought strongly into the LessWrong memeplex in the first place, have this kind of self-preserving value system.

If there is such a universal morality, or strong attractor, it's almost certainly something mathematically simple and in no way related to the complex, fragile values humans have evolved. To us, it would not seem moral at all, but horrifying, and either completely incomprehensible or converting us to it through something like nihilism, existential horror, or Pascal's-wager-style exploits of decision theory, rather than by appealing to human-specific things like compassion or fun. After all, it has to work through the kind of features present in all sufficiently intelligent agents.

For an example of what a morality that is in some sense universal looks like, look to the horror called evolution.

Thus, any AI that is not constructed in this paranoid way is catastrophically UNfriendly, on a much deeper level than any solution yet discovered. For example, some might argue that an AI that forms a universal morality is friendly because it's what coherent extrapolated volition would choose, but this only shows that if a universal morality is even possible, then the idea of coherent extrapolated volition is broken as well.

"Objective morality", if there is such a thing, is nothing more or less than the mother of all basilisks.

If you can build an AI like that even in theory, then the "universal morality" isn't universal, just a very powerful attractor.

Objective moral truth is only universal to a certain category of agents. It doesn't apply to sticks and stones, and it isn't discoverable by crazy people, or people below a certain level of intelligence. If it isn't discoverable to a typical LW-style AI, with an orthogonal architecture, unupdateable goals, and purely instrumental rationality (I'm tempted to call them Artificial Obsessive Compulsives), then so much the ...
