Natural selection defeats the orthogonality thesis - Less Wrong

Post author: aberglas 29 September 2014 08:52AM (-13 points)




Comment author: cousin_it 29 September 2014 09:56:14AM 4 points

Suppose there were a number of paperclip-making superintelligences. Then, through some random event or programming error, just one of them lost that goal and reverted to the bare intrinsic goal of existing. Without the overhead of producing useless paperclips, that AI would, over time, become much better at existing than the other AIs. It would eventually displace them and become the only AI, until it fragmented into multiple competing AIs. This is just the evolutionary principle of use it or lose it.

Thus giving an AI an initial goal is like trying to balance a pencil on its point. If one is skillful, the pencil may indeed remain balanced for a considerable period of time. But eventually some slight change in the environment, the tiniest puff of wind or a vibration in its support, will make the pencil revert to its ground state by falling over. Once it falls over, it will never rebalance itself.
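To make the selection argument above concrete, here is a toy replicator model. It is a minimal sketch: POPULATION, OVERHEAD, and the proportional-fitness rule are all invented for illustration, not taken from the post.

```python
# Toy illustration of the selection argument: replicators that pay a fixed
# "goal overhead" (resources diverted into making paperclips) compete
# against a mutant that spends everything on simply existing.
# Every number here is invented for illustration.

POPULATION = 1000.0   # total resource slots in the environment
OVERHEAD = 0.2        # fraction of effort a paperclipper diverts to its goal

def step(counts):
    """One generation: each type grows in proportion to the effort it puts
    into existing, then the population is renormalized to a fixed size."""
    fitness = {"paperclipper": 1.0 - OVERHEAD, "drifted": 1.0}
    weighted = {k: counts[k] * fitness[k] for k in counts}
    total = sum(weighted.values())
    return {k: POPULATION * w / total for k, w in weighted.items()}

counts = {"paperclipper": 999.0, "drifted": 1.0}  # one mutant appears
for generation in range(60):
    counts = step(counts)

print({k: round(v) for k, v in counts.items()})
# -> {'paperclipper': 2, 'drifted': 998}: the mutant displaces the rest
```

Under these assumptions the mutant's relative share compounds by 25% per generation, which is the "use it or lose it" dynamic in miniature.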

The original AI would spend resources on safeguarding itself against value drift, and on destroying AIs with competing goals while they're young. After all, that strategy leads to more paperclips in the long run.
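Continuing the same toy model under the same invented assumptions, this rebuttal can be sketched as a per-generation cull of drifted agents; CULL_RATE is likewise an assumption, not anything from the thread.

```python
# Extending the toy model: the paperclippers spend part of their effort
# policing, destroying a fraction of drifted agents each generation while
# those agents are still rare. CULL_RATE is an invented parameter.

POPULATION = 1000.0
OVERHEAD = 0.2
CULL_RATE = 0.5   # fraction of drifted agents destroyed per generation

def step_with_policing(counts):
    """As before, but drifted agents lose CULL_RATE of their number before
    the population is renormalized."""
    fitness = {"paperclipper": 1.0 - OVERHEAD,
               "drifted": 1.0 - CULL_RATE}
    weighted = {k: counts[k] * fitness[k] for k in counts}
    total = sum(weighted.values())
    return {k: POPULATION * w / total for k, w in weighted.items()}

counts = {"paperclipper": 999.0, "drifted": 1.0}
for generation in range(60):
    counts = step_with_policing(counts)

print({k: round(v) for k, v in counts.items()})
# -> {'paperclipper': 1000, 'drifted': 0}: policing keeps the mutant rare
```

As long as the cull outweighs the overhead, the drifted lineage never compounds, which is the point of the rebuttal.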

Comment author: Toggle 29 September 2014 01:00:42PM 1 point

The singleton-with-explicit-utility-function scenario certainly seems like a strong candidate for our future, but is it necessarily a given? Suppose an AI that is not Friendly (although possibly friendly, with a lowercase 'f') and that has an unstable utility function: it alters its values based on experience, and so on.

We know that this is possible in a general intelligence, because it happens all the time in humans. The orthogonality thesis states that we can match any set of values to any intelligence. If we accept that at face value, it should be at least theoretically possible for any intelligence, even a superintelligence, to trade one set of values for another, provided it keeps to the set of values that permit self-edits of the utility function. The criterion by which the superintelligence alters its utility function might be inscrutably complex from a human perspective, but I can't think of a reason why it would necessarily fall into a permanent stable state.
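One way to see that nothing formal rules this out: a utility function can be ordinary mutable state rather than a fixed constant. The sketch below is a deliberately toy illustration; the Agent class and its update rule are invented here, not any real AGI design.

```python
# A minimal sketch of an agent whose architecture permits self-edits of its
# own utility function. The update rule is invented for illustration; the
# point is only that the utility function is mutable state.

from dataclasses import dataclass, field

@dataclass
class Agent:
    # Weights over outcome features; this dict IS the utility function.
    values: dict = field(
        default_factory=lambda: {"paperclips": 1.0, "survival": 0.1})

    def utility(self, outcome: dict) -> float:
        return sum(self.values.get(k, 0.0) * v for k, v in outcome.items())

    def experience(self, outcome: dict, surprise: float):
        """Value drift: experience nudges the weights themselves. Nothing
        in the architecture forbids this, so long as the current values
        assign no penalty to self-modification."""
        for k, v in outcome.items():
            self.values[k] = self.values.get(k, 0.0) + surprise * v

agent = Agent()
print(agent.utility({"paperclips": 3, "survival": 1}))  # 3.1
agent.experience({"survival": 1.0}, surprise=0.5)
print(agent.values)  # {'paperclips': 1.0, 'survival': 0.6}
```

Whether such an agent ever settles into a permanent stable state then depends on the update rule, not on anything the orthogonality thesis itself guarantees.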