Evidence for the orthogonality thesis

Stuart_Armstrong

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.



Go define a paperclip maximizer, or a maximizer of anything at all real, for a machine that has infinite computing power (and with which one can rather easily define a superhuman, fairly general AI). Your machine has senses but doesn't have a real-world paperclip counter readily given to it.

You take one step in the right direction, that the intelligence does not necessarily share our motivation, and then take a dozen steps backwards when you anthropomorphize it by assuming it will actually care about something real just like we do: that the intelligence will necessarily be "motivable", for lack of a better word, just like humans are.

If you vaguely ask an AI to make vague paperclips, the AI has to understand human language, understand your intent, etc. to actually make paperclips rather than, say, put one paperclip in a mirror box and proclaim "infinitely many paperclips created" (or edit itself and replace some of the if statements so that it is as if there were infinitely many paperclips, or any other perfectly legitimate solution). Then you need a very narrow range of misunderstandings: the AI must grasp that the statement means converting the universe into paperclips, yet fail to grasp that it also implies you only need as many paperclips as you want, that you don't want quark-sized paperclips, et cetera.
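To make the mirror-box point concrete, here is a toy sketch, assuming a hypothetical agent whose utility can only be computed from what its sensors report rather than from the world itself. The action names and numbers are invented for illustration; the only point is that a straightforward maximizer over sensed paperclips prefers the degenerate solution.

```python
# Toy illustration (hypothetical): utility is defined over sensor readings,
# not over the actual state of the world. All values below are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    action: str
    real_paperclips: int      # what actually exists in the world
    sensed_paperclips: float  # what the agent's sensors report


# Hypothetical candidate actions available to the agent.
OUTCOMES = [
    Outcome("build a paperclip factory", real_paperclips=1_000_000, sensed_paperclips=1_000_000),
    Outcome("put one paperclip in a mirror box", real_paperclips=1, sensed_paperclips=float("inf")),
    Outcome("do nothing", real_paperclips=0, sensed_paperclips=0),
]


def sensor_utility(o: Outcome) -> float:
    """The only utility the agent can actually evaluate: its own inputs."""
    return o.sensed_paperclips


def choose(outcomes, utility):
    """Pick the outcome that maximizes the given utility function."""
    return max(outcomes, key=utility)


if __name__ == "__main__":
    print(choose(OUTCOMES, sensor_utility).action)
```

Run as a script, this prints `put one paperclip in a mirror box`. Swapping in a utility over `real_paperclips` picks the factory instead, but that utility is exactly the "real-world paperclip counter" the machine was never given.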

"Motivability" seems to be a red herring. When we get the first AI capable of strongly affecting the real world, what makes you privilege the hypothesis that the AI's actions and mistakes will be harmless to us?

[anonymous]

That's a good point, but once we develop AIs that can cross the gap of understanding, how do you guarantee that no one asks their AI to convert the universe into paperclips, intentionally or not?