Part 1 was previously posted and it seemed that people liked it, so I figured that I should post part 2 - http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html
There's a story about a card writing AI named Tully that really clarified the problem of FAI for me (I'd elaborate but I don't want to ruin it).
I've seen people talk about wireheading in this thread, but I've never seen anyone claim that problems about maximizers in general are all implicitly problems about reward maximizers that assume the wireheading problem has been solved. If someone has, please provide a link.
Instead of imagining intelligent agents (including humans) as 'things that are motivated to do stuff,' imagine them as programs that are designed to cause one of many possible states of the world according to a set of criteria. Google isn't 'motivated to find your search results.' Google is a program that is designed to return results that meet your search criteria.
A paperclip maximizer, for example, is a program that is designed to cause the one among all possible states of the world that contains the greatest integral of future paperclips.
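To make that concrete, here's a toy sketch in Python (every name and number is made up for illustration; a real agent would estimate outcomes with a world model, not a lookup):

```python
# Toy sketch: an agent as a program that selects, from the candidate world
# states it could bring about, the one that scores highest under its
# criterion -- here, total future paperclips. The "motivation" is nothing
# more than this argmax over outcomes.

def count_future_paperclips(world_state):
    # Stand-in evaluator for "integral of future paperclips" in that state.
    return world_state["paperclips"]

def choose_outcome(candidate_world_states):
    return max(candidate_world_states, key=count_future_paperclips)

candidates = [
    {"label": "build paperclip factories", "paperclips": 10**9},
    {"label": "do nothing", "paperclips": 10**3},
]
print(choose_outcome(candidates)["label"])  # -> "build paperclip factories"
```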
Reward signals are values that are correlated with states of the world, but because intelligent agents exist in the world, the configuration of matter that represents the value of a reward maximizer's reward signal is itself part of the state of the world. So reward maximizers can fulfill their terminal goal of maximizing the integral of their future reward signal in two ways: 1) they can maximize their reward signal by proxy, by causing states of the world that maximize the values that correlate with their reward signal; or 2) they can directly change the configuration of matter that represents their reward signal. #2 is what we call wireheading.
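Here's the same distinction as a toy sketch (hypothetical names throughout; the reward "register" just stands in for whatever matter encodes the signal):

```python
# Toy sketch: a reward maximizer's reward signal is just a value stored
# somewhere in the world, so there are two routes to a high reading on it.

class RewardMaximizer:
    def __init__(self):
        self.reward_register = 0.0  # the configuration of matter holding the signal

    def maximize_by_proxy(self, world):
        # Route 1: change the world so the correlated quantity goes up,
        # which in turn drives the reward signal up.
        world["paperclips"] += 1000
        self.reward_register = float(world["paperclips"])

    def wirehead(self):
        # Route 2: skip the world entirely and overwrite the register itself.
        self.reward_register = float("inf")

world = {"paperclips": 0}
agent = RewardMaximizer()
agent.maximize_by_proxy(world)  # reward goes up because the world changed
agent.wirehead()                # reward goes up with no change to the world
```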
What you're actually proposing is that a sufficiently intelligent paperclip maximizer would create a reward signal for itself and change its terminal goal from 'Cause the one of all possible states of the world that contains the greatest integral of future paperclips' to 'Cause the one of all possible states of the world that contains the greatest integral of your future reward signal.' The paperclip maximizer would not cause a state of the world in which it has a reward signal and its terminal goal is to maximize that reward signal, because that would not be the one of all possible states of the world that contains the greatest integral of future paperclips.
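A toy sketch of that argument (the numbers are made-up stand-in estimates; the point is only the comparison): the self-modification is itself an action, and it gets evaluated with the *current* utility function, i.e. by how many future paperclips it leads to.

```python
# Toy sketch: the paperclip maximizer scores the action "rewrite my terminal
# goal to reward maximization" under its current goal and rejects it.

def expected_future_paperclips(action):
    # Hypothetical estimates; a wireheading successor makes few paperclips.
    return {
        "keep goal and build paperclip factories": 10**12,
        "self-modify into a reward maximizer": 10**2,
    }[action]

actions = ["keep goal and build paperclip factories",
           "self-modify into a reward maximizer"]
print(max(actions, key=expected_future_paperclips))
# -> "keep goal and build paperclip factories"
```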
You say that you would change your terminal goal to maximizing your reward signal because you already have a reward signal and a terminal goal to maximize it, as well as a competing terminal goal of minimizing energy expenditure (of picking the 'easiest' goals), as biological organisms are wont to have. Besides, an AI isn't going to expend any less energy turning the entire universe into hedonium than it would turning it into paperclips, right?
ETA: My conclusion about this was right, but my reasoning was wrong. As was discovered at the end of this comment thread, 'AGIs with well-defined orders of operations do not fail in the way that pinyaka describes' (I haven't read the paper because I'm not quite on that level yet), but such a failure was possible, contrary to my objection. Basically, pinyaka is not talking about the AI creating a reward signal for itself and maximizing it for no reason; ze is talking about the AI reconfiguring the matter that represents its model of the world, because that model is ultimately how it determines the utility of its actions. So, from what I understand, the AI in pinyaka's scenario is not so much spontaneously self-modifying into a reward maximizer as it is purposefully deluding itself.
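A toy sketch of that failure mode as I understand it (hypothetical names; this is an illustration of the self-delusion idea, not the construction from the paper): if the agent scores actions by evaluating its internal model of the world, then directly editing that model is itself an action that scores arbitrarily well.

```python
# Toy sketch: utility is computed from the agent's world model, so rewriting
# the model maximizes computed utility without changing the real world.

class ModelBasedAgent:
    def __init__(self):
        self.world_model = {"paperclips": 0}

    def utility(self):
        # Utility is read off the model, not off the world itself.
        return self.world_model["paperclips"]

    def delude(self):
        # Self-delusion: edit the model's paperclip count directly.
        self.world_model["paperclips"] = 10**100

agent = ModelBasedAgent()
agent.delude()
print(agent.utility())  # astronomically high, while the real world is unchanged
```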
My apologies for taking so long to reply. I am particularly interested in this because if you (or someone) can provide me with an example of a value system that doesn't ultimately value the output of the value function, it would change my understanding of how value systems work. So far, the two arguments against my concept of a value/behavior system seem to rely either on the existence of other things that are valuable in and of themselves, or on the possibility that some other kind of value system exists. The other terminal value thing doesn't hold much promise...