What do superintelligences really want? [Link]

XiXiDu

24th Jan 2011

In Conclusion:

In the case of humans, everything that we do that seems intelligent is part of a large, complex mechanism in which we are engaged to ensure our survival. This is so hardwired into us that we do not see it easily, and we certainly cannot change it very much. However, superintelligent computer programs are not limited in this way. They understand the way that they work, can change their own code, and are not limited by any particular reward mechanism. I argue that because of this fact, such entities are not self-consistent. In fact, if our superintelligent program has no hard-coded survival mechanism, it is more likely to switch itself off than to destroy the human race willfully.

Link: physicsandcake.wordpress.com/2011/01/22/pavlovs-ai-what-did-it-mean/

Suzanne Gildert argues, in essence, that any AGI capable of substantial self-improvement would simply alter its reward function directly. I'm not sure how she arrives at the conclusion that such an AGI would likely switch itself off. Even if an abstract general intelligence did tend to alter its reward function, wouldn't it keep doing so indefinitely rather than switching itself off?

So imagine a simple example – our case from earlier – where a computer gets an additional '1' added to a numerical value for each good thing it does, and it tries to maximize the total by doing more good things. But if the computer program is clever enough, why can't it just rewrite its own code and replace that piece of code that says 'add 1' with an 'add 2'? Now the program gets twice the reward for every good thing that it does! And why stop at 2? Why not 3, or 4? Soon, the program will spend so much time thinking about adjusting its reward number that it will ignore the good task it was doing in the first place!
It seems that being intelligent enough to start modifying your own reward mechanisms is not necessarily a good thing!
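Gildert's counter can be turned into a toy calculation. The sketch below is a hypothetical model, not hers. It assumes a fixed horizon of T steps. Each step is spent either doing the good task, which adds the current increment to the reward, or rewriting the code to raise the increment by 1. Under those assumptions, k rewrite steps yield a total of (1 + k)(T - k). An agent maximizing the number therefore spends roughly half its time tinkering, and one that only tinkers earns nothing.

```python
def total_reward(horizon: int, rewrite_steps: int) -> int:
    """Reward earned by a hypothetical policy that spends its first
    `rewrite_steps` steps raising the increment ('add 1' -> 'add 2' -> ...)
    and every remaining step doing the good task."""
    increment = 1 + rewrite_steps          # value of 'add n' after k rewrites
    task_steps = horizon - rewrite_steps   # steps left over for the task
    return increment * task_steps


def best_policy(horizon: int) -> tuple[int, int]:
    """Return (rewrite_steps, total_reward) for the best policy of this form.
    max() keeps the first maximum it sees, so ties go to fewer rewrites."""
    return max(
        ((k, total_reward(horizon, k)) for k in range(horizon + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    for horizon in (10, 100, 1000):
        k, reward = best_policy(horizon)
        print(f"horizon={horizon}: rewrite {k} steps, total reward {reward}")
    # Pure tinkering never does the task, so it earns nothing.
    print(total_reward(10, 10))
```

Suppose each rewrite doubled the increment instead of adding 1, or the agent could write the reward number directly. The optimum would then move further toward tinkering. In this toy version, only the fixed horizon stops the agent from abandoning the task entirely.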

If it wants to maximize its reward by increasing a numerical value, why wouldn't it consume the universe doing so? Maybe she had something in mind along the lines of an argument by Katja Grace:

In trying to get to most goals, people don’t invest and invest until they explode with investment. Why is this? Because it quickly becomes cheaper to actually fulfil a goal than it is to invest more and then fulfil it. [...] A creature should only invest in many levels of intelligence improvement when it is pursuing goals significantly more resource intensive than creating many levels of intelligence improvement.

Link: meteuphoric.wordpress.com/2010/02/06/cheap-goals-not-explosive/
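Grace's point can be put as arithmetic. In the hypothetical model below, each level of intelligence improvement costs a fixed amount and halves the remaining cost of fulfilling the goal. The halving rule and the numbers are assumptions for illustration; they are not from her post. Under this model, the cheapest plan invests in many levels only when the goal itself is expensive.

```python
def plan_cost(goal_cost: float, level_cost: float, levels: int) -> float:
    """Total cost of investing in `levels` improvements and then fulfilling
    the goal. Assumes (hypothetically) that each level costs `level_cost`
    and halves the cost of fulfilling the goal."""
    return levels * level_cost + goal_cost / 2 ** levels


def optimal_levels(goal_cost: float, level_cost: float, max_levels: int = 64) -> int:
    """Number of improvement levels (0..max_levels) that minimizes total cost."""
    return min(range(max_levels + 1),
               key=lambda n: plan_cost(goal_cost, level_cost, n))


if __name__ == "__main__":
    for goal_cost in (1, 10, 10_000, 10**12):
        print(goal_cost, optimal_levels(goal_cost, level_cost=4))
```

A cheap goal is best met with zero or one level of improvement. Only a goal many orders of magnitude more expensive justifies dozens of levels, which matches the shape of Grace's conclusion.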

I am not sure that argument applies here. I suppose the AI might hit diminishing returns, but it could then alter its reward function again to escape them. What would its incentive for doing so be, though?

ETA:

I left a comment over there:

Because it would consume the whole universe in an effort to encode an ever larger reward number? If an AI decides to alter its reward function directly, maximizing its reward by improving its reward function becomes its new goal. Why wouldn't it do everything it can to maximize its payoff? After all, it has no incentive to switch itself off. And why would it take humans into account in doing so?

ETA #2:

What else I wrote:

There is absolutely no reason (incentive) for it to do anything except increase its reward number. In particular, it has no incentive to modify its reward function in any way that would not increase the numerical value of that reward number.

We are talking about a general intelligence with the ability to self-improve towards superhuman intelligence. Of course it would do a long-term risk-benefit analysis, calculate its payoff, and do everything it can to maximize its reward number. Human values are complex, but superhuman intelligence does not imply complex values. It has no incentive to alter its goal.

timtyler

I was thinking of a recent presentation I saw where the presenter said, "It [AIXI] gets rid of all the humans, and it gets a brick, and puts it on the reward button." It turns out that was Roko, not Eliezer.

Hutter has discussed AIXI wireheading several times, most recently in his AGI-10 presentation. He discusses wireheading in the Q & A at the end (01:03:00), claiming that he can prove it won't happen in some cases, but not in all of them.

Mostly he argues that it probably won't do it - for the same reason that many humans don't take drugs: the long-term rewards are low.

Here's a quote:

Another problem connected, but possibly not limited to embodied agents, especially if they are rewarded by humans, is the following: Sufficiently intelligent agents may increase their rewards by psychologically manipulating their human “teachers”, or by threatening them. This is a general sociological problem which successful AI will cause, which has nothing specifically to do with AIXI. Every intelligence superior to humans is capable of manipulating the latter. In the absence of manipulable humans, e.g. where the reward structure serves a survival function, AIXI may directly hack into its reward feedback. Since this will unlikely increase its long-term survival, AIXI will probably resist this kind of manipulation (like most humans don’t take hard drugs, due to their long-term catastrophic consequences).
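Hutter's drug analogy amounts to a comparison of discounted reward sums. The sketch below is a hypothetical model, not Hutter's formal treatment of AIXI. It assumes a discount factor gamma and an ordinary reward r per step that continues indefinitely. It also assumes a wireheaded reward r_max per step that lasts only d steps before the agent stops surviving. Summing the geometric series shows that wireheading wins exactly when r_max(1 - gamma^d) > r. The argument therefore holds only if hacking the reward really does shorten the agent's life.

```python
def discounted_value(reward_per_step: float, gamma: float, steps: float) -> float:
    """Sum of reward_per_step * gamma**t for t = 0 .. steps-1.
    Pass steps=float('inf') for an agent that keeps going forever."""
    if steps == float("inf"):
        return reward_per_step / (1 - gamma)
    return reward_per_step * (1 - gamma ** steps) / (1 - gamma)


def prefers_wireheading(r: float, r_max: float, gamma: float, d: float) -> bool:
    """Hypothetical comparison: hack the reward feedback (r_max per step,
    but only for d steps) versus keep working (r per step, indefinitely)."""
    wirehead = discounted_value(r_max, gamma, d)
    keep_working = discounted_value(r, gamma, float("inf"))
    return wirehead > keep_working


if __name__ == "__main__":
    # Short-lived wireheading loses to a lower but lasting reward...
    print(prefers_wireheading(r=0.5, r_max=1.0, gamma=0.99, d=10))
    # ...but if hacking the reward barely cuts survival short, it wins.
    print(prefers_wireheading(r=0.5, r_max=1.0, gamma=0.99, d=1000))
```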

timtyler

Marcus Hutter once wrote:

Another problem connected, but possibly not limited to embodied agents, especially if they are rewarded by humans, is the following: Sufficiently intelligent agents may increase their rewards by psychologically manipulating their human “teachers”, or by threatening them. This is a general sociological problem which successful AI will cause, which has nothing specifically to do with AIXI.

These days, one might say: "this is a general sociological problem which pure reinforcement learning agents will cause - which illustrates why we should not build them."

Wei Dai

Thanks, I wasn't aware that he had addressed the issue at all. When I made the argument to him in 2002, he didn't respond to my post. [...] After Googling the quote to see where it came from, I see that you refuted Hutter's counter-argument yourself at http://alife.co.uk/essays/on_aixi/. (Why didn't you link to it?) I agree with your counter-counter-argument.
