Evidence for the orthogonality thesis

Stuart_Armstrong

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, and published, arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.

When I looked at the puppy, I realized this:

At the moment when you create the AIs, their motivation and intelligence could be independent. But if you let them run for a while, some motivations will lead to changes in intelligence. Improving intelligence could be difficult, but I think it is obvious that a motivation to self-destruct will, on average, decrease intelligence.

So are you talking about orthogonality of motivation and intelligence in freshly created AIs, or in running AIs?

What I'd be really looking for is: "intelligence puts some constraints on motivation, but it can still vary in all sorts of directions, far beyond what we humans usually imagine".

0Manfred14y

There's also a big effect of motivation on intelligence even outside the small part of the possibilities that think the world exactly as it is, but without them in it, is optimal. This is because some goals don't require much intelligence (by the standards of self-improving AIs, that is - we'd think it was a lot) to implement, while other goals do. EDIT: of course, what we're examining in the OP is causal relations the other way, intelligence -> goals.

3Normal_Anomaly14y

I think he's looking for refutations of the statement "Improving intelligence will necessarily always change motivation to the same set of goals, regardless of the starting goal set."