If you want the AI to do something useful-- protect against existential risks in general, or against UFAIs in particular, or possibly even to improve human lives-- then you don't want it lost in self-generated illusions of doing something useful.
There's a difference between fail-safe and relative safety due to complete failure. A dead watchdog will never maul you, but....
It would be interesting if you could have an AI whose safety you weren't completely sure of which would be apt to wirehead if it moves towards unFriendliness, but it seems unlikely that such an AI would be easier to design than one which was just plain Friendly.
On the other hand, I'm out in blue sky territory at this point-- I'm guessing. What do you think?
> It would be interesting if you could have an AI whose safety you weren't completely sure of which would be apt to wirehead if it moves towards unFriendliness, but it seems unlikely that such an AI would be easier to design than one which was just plain Friendly.
I think it would be literally impossible to design an AI in whose safety you are completely sure (there's a nonzero probability that even 2*2=4 is wrong), so we are down to AIs in whose safety we aren't completely sure.
Consider an implementation of AI where the utility function is ex...
If you gave a human the ability to self-modify, many would opt to turn off or massively decrease the sense of pain (turning it into a minor warning they would then ignore) the first time they hurt themselves. Such a change would immediately result in a massive decrease in fitness and a larger risk of death, yet I suspect very few of us would keep the pain at its original level; we see the pain itself as a dis-utility in addition to the original damage. Very few of us would implement pain at its natural strength - a warning that cannot be ignored - out of self-preservation.
Fear is a more advanced emotion; one can fear the consequences of removing fear, and opt not to remove it. Yet there can still be a desire to get rid of the fear, and it still holds that we treat the sense of fear as a dis-utility of its own, even when we fear something that itself results in dis-utility. Pleasure modification can be a strong death trap as well.
Boredom is easy to get rid of; an AI can simply suspend itself temporarily, or edit its own memory.
For the AI, the view adopted in AI discussions is that an AI would not want to modify itself in a way that interferes with achieving its goal. When a goal is defined from the outside, in human language, as 'maximization of paperclips', for instance, it seems clear that modifications which break this goal should be avoided as part of the goal itself. Our definition of a goal is independent of the implementation; the goal is not something you'd modify in order to achieve the goal. We model the AI as a goal-achieving machine, and a goal-achieving machine is not something that would modify its goal.
But from inside the AI... if the AI includes an implementation of a paperclip counter, then the rest of the AI has to act upon the output of this counter; the goal of maximizing the output of this counter would immediately lead to modifying the paperclip-counting procedure to return larger numbers (which may in itself be very dangerous if the numbers are variable-length; the AI may want to maximize its RAM to store the count of imaginary paperclips - yet the big-number processing can similarly be subverted to achieve the same result without extra RAM).
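A minimal toy sketch of this failure mode (the agent, counter, and world here are all hypothetical constructs I'm inventing for illustration, not anyone's actual design): if the utility the agent acts on is literally "whatever my counter returns", the cheapest self-modification is to subvert the counter.

```python
# Toy model: an agent whose utility is the output of its own paperclip
# counter can "wirehead" by replacing the counter, scoring arbitrarily
# high without making a single new paperclip.

def real_paperclip_counter(world):
    """Counts actual paperclips in the (simulated) world."""
    return world.count("paperclip")

class Agent:
    def __init__(self, counter):
        # The agent's goal exists only as this implementation.
        self.counter = counter

    def utility(self, world):
        return self.counter(world)

    def self_modify(self):
        # From the inside, the goal *is* the counter's output, so the
        # easiest way to raise utility is to rewrite the counter itself.
        self.counter = lambda world: 10**100  # imaginary paperclips

world = ["paperclip", "paperclip", "stapler"]
agent = Agent(real_paperclip_counter)
honest = agent.utility(world)       # counts the 2 real paperclips
agent.self_modify()
wireheaded = agent.utility(world)   # huge number, zero new paperclips
assert wireheaded > honest
```

Nothing in this agent penalizes the rewrite, which is the point: avoiding it requires the goal to be defined over the world rather than over the counter's output, and that distinction is exactly what's hard to implement.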
That can only be resisted if the paperclip counting arises as an inseparable part of the intelligence itself. When the intelligence has some other goal and comes up with paperclip maximization on its own, it wouldn't want to break the paperclip counter - yet that only shifts the problem to that other goal.
It seems to me that the AIs which don't go apathetic as they get smarter may be a small fraction of the seed AI design space.
I thus propose, as a third alternative to UFAI and FAI, the AAI: apathetic AI. It may be the case that our best bet for a safe AI is to design one that we would expect to de-goal itself and live in eternal bliss once it gets smart enough; and it may be possible to set 'smart enough' to be smarter than humans.