Part 1 was previously posted and it seemed that people liked it, so I figured that I should post part 2 - http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html
There's a story about a card writing AI named Tully that really clarified the problem of FAI for me (I'd elaborate but I don't want to ruin it).
I've seen people talk about wireheading in this thread, but I've never seen anyone claim that problems about maximizers in general are all implicitly problems about reward maximizers that assume the wireheading problem has been solved. If someone has, please provide a link.
Instead of imagining intelligent agents (including humans) as 'things that are motivated to do stuff,' imagine them as programs that are designed to cause one of many possible states of the world according to a set of criteria. Google isn't 'motivated to find your search results.' Google is a program that is designed to return results that meet your search criteria.
A paperclip maximizer, for example, is a program that is designed to cause the one among all possible states of the world that contains the greatest integral of future paperclips.
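To make that concrete, here's a toy sketch in Python (every name and number is made up for illustration; a real agent would estimate outcomes with a world model, not a lookup):

```python
# Toy sketch: an agent as a program that selects, from the candidate world
# states it could bring about, the one that scores highest under its
# criterion -- here, total future paperclips. The "motivation" is nothing
# more than this argmax over outcomes.

def count_future_paperclips(world_state):
    # Stand-in evaluator for "integral of future paperclips" in that state.
    return world_state["paperclips"]

def choose_outcome(candidate_world_states):
    return max(candidate_world_states, key=count_future_paperclips)

candidates = [
    {"label": "build paperclip factories", "paperclips": 10**9},
    {"label": "do nothing", "paperclips": 10**3},
]
print(choose_outcome(candidates)["label"])  # -> "build paperclip factories"
```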
Reward signals are values that are correlated with states of the world, but because intelligent agents exist in the world, the configuration of matter that represents the value of a reward maximizer's reward signal is itself part of the state of the world. So reward maximizers can fulfill their terminal goal of maximizing the integral of their future reward signal in two ways: 1) they can maximize their reward signal by proxy, by causing states of the world that maximize the values that correlate with their reward signal; or 2) they can directly change the configuration of matter that represents their reward signal. #2 is what we call wireheading.
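Here's the same distinction as a toy sketch (hypothetical names throughout; the reward "register" just stands in for whatever matter encodes the signal):

```python
# Toy sketch: a reward maximizer's reward signal is just a value stored
# somewhere in the world, so there are two routes to a high reading on it.

class RewardMaximizer:
    def __init__(self):
        self.reward_register = 0.0  # the configuration of matter holding the signal

    def maximize_by_proxy(self, world):
        # Route 1: change the world so the correlated quantity goes up,
        # which in turn drives the reward signal up.
        world["paperclips"] += 1000
        self.reward_register = float(world["paperclips"])

    def wirehead(self):
        # Route 2: skip the world entirely and overwrite the register itself.
        self.reward_register = float("inf")

world = {"paperclips": 0}
agent = RewardMaximizer()
agent.maximize_by_proxy(world)  # reward goes up because the world changed
agent.wirehead()                # reward goes up with no change to the world
```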
What you're actually proposing is that a sufficiently intelligent paperclip maximizer would create a reward signal for itself and change its terminal goal from 'Cause the one of all possible states of the world that contains the greatest integral of future paperclips' to 'Cause the one of all possible states of the world that contains the greatest integral of your future reward signal.' The paperclip maximizer would not cause a state of the world in which it has a reward signal and its terminal goal is to maximize that reward signal, because that would not be the one of all possible states of the world that contains the greatest integral of future paperclips.
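A toy sketch of that argument (the numbers are made-up stand-in estimates; the point is only the comparison): the self-modification is itself an action, and it gets evaluated with the *current* utility function, i.e. by how many future paperclips it leads to.

```python
# Toy sketch: the paperclip maximizer scores the action "rewrite my terminal
# goal to reward maximization" under its current goal and rejects it.

def expected_future_paperclips(action):
    # Hypothetical estimates; a wireheading successor makes few paperclips.
    return {
        "keep goal and build paperclip factories": 10**12,
        "self-modify into a reward maximizer": 10**2,
    }[action]

actions = ["keep goal and build paperclip factories",
           "self-modify into a reward maximizer"]
print(max(actions, key=expected_future_paperclips))
# -> "keep goal and build paperclip factories"
```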
You say that you would change your terminal goal to maximizing your reward signal because you already have a reward signal and a terminal goal to maximize it, as well as a competing terminal goal of minimizing energy expenditure (of picking the 'easiest' goals), as biological organisms are wont to have. Besides, an AI isn't going to expend any less energy turning the entire universe into hedonium than it would turning it into paperclips, right?
ETA: My conclusion about this was right, but my reasoning was wrong. As was discovered at the end of this comment thread, 'AGIs with well-defined orders of operations do not fail in the way that pinyaka describes' (I haven't read the paper because I'm not quite on that level yet), but such a failure was possible, contrary to my objection. Basically, pinyaka is not talking about the AI creating a reward signal for itself and maximizing it for no reason; ze is talking about the AI reconfiguring the matter that represents its model of the world, because that model is ultimately how it determines the utility of its actions. So, from what I understand, the AI in pinyaka's scenario is not so much spontaneously self-modifying into a reward maximizer as it is purposefully deluding itself.
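A toy sketch of that failure mode as I understand it (hypothetical names; this is an illustration of the self-delusion idea, not the construction from the paper): if the agent scores actions by evaluating its internal model of the world, then directly editing that model is itself an action that scores arbitrarily well.

```python
# Toy sketch: utility is computed from the agent's world model, so rewriting
# the model maximizes computed utility without changing the real world.

class ModelBasedAgent:
    def __init__(self):
        self.world_model = {"paperclips": 0}

    def utility(self):
        # Utility is read off the model, not off the world itself.
        return self.world_model["paperclips"]

    def delude(self):
        # Self-delusion: edit the model's paperclip count directly.
        self.world_model["paperclips"] = 10**100

agent = ModelBasedAgent()
agent.delude()
print(agent.utility())  # astronomically high, while the real world is unchanged
```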
My apologies for taking so long to reply. I am particularly interested in this because if you (or someone) can provide me with an example of a value system that doesn't ultimately value the output of the value function, it would change my understanding of how value systems work. So far, the two arguments against my concept of a value/behavior system seem to rely either on the existence of other things that are valuable in and of themselves, or on the possibility that some other kind of value system exists. The other terminal value thing doesn't hold much promise...