A related point. I don't think the creators of The Sims, for example, anticipated that perhaps the primary use of their game would be as a sadistic torture simulator. The game explicitly rewards certain actions within the game, namely improving your game-character's status. The game tells you what you should want. But due to our twisted psychology, we get more satisfaction locking our character in a room with no toilet, bed, food or water, with a blaring TV playing all day and night. And then killing him in a fire.
Totally normal people who are not otherwise sadists will play The Sims in this fashion. Playing "Kill Your Character Horribly" is just more fun than playing the game they intended you to play. You get more utility from sadism. An AI with unanticipated internal drives will act in ways that "don't make sense." It will want things we didn't tell it to want.
Yes, this is a good point. I tried to minimize this effect (the direct utility of the fun of playing the game in particular ways) by providing external incentives, assumed to be large enough to override that fun for people.
However, after more thought, I'm not sure any external incentives would work in the important cases. After all, this belief structure - the player's knowledge of being in a game, and of getting outside utility from pursuing specified arbitrary goals - appears able to override the goals of any agent, including an FAI. But if FAI ...
One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out that the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Imagine trying to convince a conservative philosopher who's caught a bad case of moral realism: what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.