Evidence for the orthogonality thesis

Stuart_Armstrong

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.

The paperclipper's goal is not to modify the map in a specific way, but to fill the return value register with a value that obeys specific constraints. (Or to zoom in even further, the paperclipper doesn't even have a fundamental "goal". The paperclipper just enumerates different values until it finds one that fits the constraints. When a value is found, it gets written to the register, and the program halts. That's all the program does.) After that value ends up in the register, it causes ripples in the world, because the register is physically connected to actuators or something, which were also described in the paperclipper's map. If the value indeed obeys the constraints, the ripples in the world will lead to creating many paperclips.
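To make the "enumerate values until one fits the constraints" picture concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: `predicted_paperclips` stands in for the paperclipper's map, `run_paperclipper` is a hypothetical name for the search loop, and the returned value plays the role of the return value register. It is not a claim about how any real agent is built.

```python
from typing import Callable, Iterable, Optional


def predicted_paperclips(action: int) -> int:
    """Hypothetical 'map': predicts how many paperclips an output value leads to.

    The rule itself is made up for the example; only the shape matters.
    """
    return action * 3 if action % 2 == 0 else 0


def run_paperclipper(
    candidates: Iterable[int],
    model: Callable[[int], int],
    threshold: int,
) -> Optional[int]:
    """Enumerate candidate values and return the first one that fits the constraint.

    There is no 'goal' variable anywhere: the program only checks a constraint
    against the model's prediction, emits a value, and halts.
    """
    for value in candidates:
        if model(value) >= threshold:
            return value  # "written to the register"; the program stops here
    return None  # no candidate satisfied the constraint


# The returned value stands in for the register contents that would drive actuators.
register = run_paperclipper(range(10), predicted_paperclips, threshold=12)
```

Note that in this toy version the constraint is evaluated on the model's prediction about the world, not on the program's own internal state, which is the distinction the wireheading discussion turns on.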

Not sure what sending gliders has to do with the topic. We're talking about the paperclipper wireheading itself, not the game manipulators trying to wirehead the paperclipper.

Incompleteness of the model, self-modification and other issues seem to be red herrings. If we have a simple model where wireheading doesn't happen, why should we believe that wireheading will necessarily happen in more complex models? I think a more formal argument is needed here.

You don't have a simple model where wireheading doesn't happen; you have a model in which you didn't see how the wireheading would happen: the paperclipper, erm, touching itself (i.e. its own map) with its manipulators, satisfying the condition without filling the universe with paperclips.

Edit: that is to say, an agent that doesn't internally screw up its model can still, e.g., dissolve the coating off a RAM chip and attach a wire there, or, failing that, produce fake input for its own senses (which we do a whole lot).
