Evidence for the orthogonality thesis

Stuart_Armstrong

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.


I suppose you could build an AI that had both drives to self-improve and an extreme caution about accidentally changing its other values (although evolution doesn't seem to have built us that way). That gives you the welcome conclusion that the AI in question is potentially unfriendly, rather than the disturbing one that it is potentially self-correcting. But we already knew you could build unfriendly AIs if you want to: the question is whether the friendly or neutral AI you think you are building will turn on you, that is, whether you can achieve unfriendliness without carefully designing it in.

If you can build an AI like that even in theory, then the "universal morality" isn't universal, just a very powerful attractor. A very powerful attractor might indeed be a thing that exists.

Evolution does very much seem to have built us this way, just very incompetently. At the very least, I know for a fact that I, and the majority of others who buy strongly into the LessWrong memeplex in the first place, have this kind of self-preserving value system.

If there is such a universal morality, or strong attractor, it's almost certainly something mathematical...
