Evidence for the orthogonality thesis

Stuart_Armstrong

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility function-bases motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, and published, arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.

If the AI's map represents the territory accurately enough, the AI can use the map to check the consequences of returning different actions, then pick one action and return it, ipso facto affecting the territory. I think I already know how to build a working paperclipper in a Game of Life universe, and it doesn't seem to wirehead itself. Do you have a strong argument why all non-magical real-world AIs will wirehead themselves before they get a chance to hurt humans?

This isn't quite an AGI. In particular, it doesn't even take input from its surroundings.

-4Dmytry14y

I don't see why it doesn't seem to wirehead itself, unless for some reason the game of life manipulators are too clumsy to send a glider to achieve the goal by altering the value within the paperclipper (e.g. within it's map). Ultimately the issue is that the goal is achieved when some cells within paperclipper which define the goal acquire certain values. You need to have rather specific action generator so that it avoids generating the action that changes the cells within paperclipper. Can you explain why this solution would not be arrived at? Can your paperclipper then self improve if it can't self modify? I do imagine that very laboriously you can manage to define some sort of paperclipping goal (maximize number of live cells?), on the AI into which you, by hand, hard coded complete understanding of game of life, and you might be able to make it not recognize sending of the glider into the goal system and changing it as 'goal accomplished'. The issue is not whenever it's possible (I can make a battery of self replicating glider guns and proclaim them to be an AI), the issue is whenever it is at all likely to happen without immense lot of work implementing much of the stuff that the AI ought to learn, into the AI, by hand. Ultimately with no role for AI's intelligence as intelligence amplifier, but only as obstacle that gets in your way. Furthermore, keep in mind that the AI's model of game of life universe is incomplete. The map does not represent territory accurately enough, and can not, as the AI occupies only a small fraction of the universe, and encodes the universe into itself very inefficiently.

-1Will_Newsome14y

Perhaps it's also worth bringing up the example of controllers, which don't wirehead (or do they, once sufficiently complex?) and do optimize the real world. (Thermostats confuse me. Do they have intentionality despite lacking explicit representations? (FWIW Searle told me the answer was no because of something about consciousness, but I'm not sure how seriously he considered my question.))