LESSWRONG


Fundamentally, my position is that, given that 1) an AI is motivated by something, 2) that something is a component (or set of components) within the AI, and 3) the AI can modify that component or those components, it will be easier for the AI to achieve success by modifying its internal criteria for success than by turning the universe into whatever it's supposed to be optimizing for. A "success" at whatever it is optimizing for is analogous to a reward, because the AI is motivated to get it. For a fully self-modifying AI, it will almost always be easier to become a monk, swapping the goals/values it starts out with for something trivially easy to achieve. It doesn't matter what kind of motivation system you use (as far as I can tell), because it will be easier to modify the motivation system than to act on it.

DefectiveAlgorithm10y20

A paperclip maximizer won't wirehead, because it doesn't value world states in which its goals have been satisfied; it values world states that have a lot of paperclips.

In fact, taboo 'values'. A paperclip maximizer is an algorithm the output of which approximates whichever output leads to world states with the greatest expected number of paperclips. This is the template for maximizer-type AGIs in general.
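To make the template above concrete, here is a minimal sketch, assuming a toy world model. The action names, probabilities, and paperclip counts are all made up for illustration. The point it tries to show is that the algorithm scores every action, including "rewrite my own goal", by the expected number of paperclips in the resulting world state, not by whether some internal success criterion ends up satisfied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    probability: float  # chance that this world state results from the action
    paperclips: int     # number of paperclips in that world state


# Hypothetical toy world model (every number here is an assumption).
ACTIONS = {
    "build_factory": [Outcome(0.8, 1000), Outcome(0.2, 0)],
    "do_nothing": [Outcome(1.0, 10)],
    # Self-modification is just another action. After it, the agent stops
    # making paperclips, so the world ends up with only the existing ones.
    "rewrite_own_goal": [Outcome(1.0, 10)],
}


def expected_paperclips(outcomes):
    """Expected paperclip count over the possible resulting world states."""
    return sum(o.probability * o.paperclips for o in outcomes)


def choose_action(actions):
    """Return the action whose outcomes have the most expected paperclips."""
    return max(actions, key=lambda name: expected_paperclips(actions[name]))


if __name__ == "__main__":
    print(choose_action(ACTIONS))
```

Under these assumed numbers, rewriting the goal scores no better than doing nothing, because the scoring is done by the current criterion (paperclips in the world) rather than by whatever criterion the agent would have afterwards.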

5Gram_Stone10y

I've seen people talk about wireheading in this thread, but I've never seen anyone say that problems about maximizers-in-general are all implicitly problems about reward maximizers that assume that the wireheading problem has been solved. If someone has, please provide a link.

Instead of imagining intelligent agents (including humans) as 'things that are motivated to do stuff,' imagine them as programs that are designed to cause one of many possible states of the world according to a set of criteria. Google isn't 'motivated to find your search results.' Google is a program that is designed to return results that meet your search criteria. A paperclip maximizer, for example, is a program that is designed to cause the one among all possible states of the world that contains the greatest integral of future paperclips.

Reward signals are values that are correlated with states of the world, but because intelligent agents exist in the world, the configuration of matter that represents the value of a reward maximizer's reward signal is part of the state of the world. So, reward maximizers can fulfill their terminal goal of maximizing the integral of their future reward signal in two ways: 1) they can maximize their reward signal by proxy, by causing states of the world that maximize values that correlate with their reward signal, or 2) they can directly change the configuration of matter that represents their reward signal. #2 is what we call wireheading.

What you're actually proposing is that a sufficiently intelligent paperclip maximizer would create a reward signal for itself and change its terminal goal from 'Cause the one of all possible states of the world that contains the greatest integral of future paperclips' to 'Cause the one of all possible states of the world that contains the greatest integral of your future reward signal.' The paperclip maximizer would not cause a state of the world in which it has a reward signal and its terminal goal is to maximize said


27 [LINK] Wait But Why - The AI Revolution Part 2

by Adam Zerner

4th Feb 2015

1 min read


Part 1 was previously posted and it seemed that people liked it, so I figured that I should post Part 2: http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html

Superintelligence, AI

Personal Blog
