Evidence for the orthogonality thesis

Stuart_Armstrong

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, and published, arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.

I like that idea. So, if we assume that all sufficiently smart AIs are "good", then we can put such an AI in a simulated world in which the best way to acquire resources for its good deeds would be to play a game running on a computer provided by Dark Lords of the Matrix (that's us!) and the goal of the game would be to pretend to be a "bad" AI. Except the game would really be an input/output channel into the real world. The whole system would effectively constitute a bad AI, thus contradicting the initial assumption.

However, anyone who seriously claims that sufficiently smart AIs will automatically be nice will also probably reject that argument by claiming that, well, a sufficiently smart AI would figure out that it is being tricked like that and would refuse to cooperate.

(Also: you could call it the "Ender's Game" argument if you're aiming for memorability more than respectability.)
