New FAI paper: 'Learning What to Value' by Daniel Dewey

lukeprog

Abstract: I.J. Good's theory of an "intelligence explosion" predicts that ultraintelligent agents will undergo a process of repeated self-improvement. In the wake of such an event, how well our values are fulfilled will depend on whether these ultraintelligent agents continue to act desirably and as intended. We examine several design approaches, based on AIXI, that could be used to create ultraintelligent agents. In each case, we analyze the design conditions required for a successful, well-behaved ultraintelligent agent to be created. Our main contribution is an examination of value-learners, agents that learn a utility function from experience. We conclude that the design conditions on value-learners are in some ways less demanding than those on other design approaches.

Daniel Dewey, 'Learning What to Value'

Response to Curt Welch:

Sadly, what he seems to have failed to realize is that any actual implementation of an O-Maximizer or his Value-learners must also be a reward maximizer. Is he really that stupid so as not to understand they are all reward maximizers?

Zing! I guess he didn't think I was going to be reading that. To be fair, it may seem to him that I've made a stupid error, thinking that O-maximizers behave differently than reward maximizers. I'll try to explain why he's mistaken.

A reward maximizer acts so as to bring about universes in which the rewards it receives are maximized. For this reason, it will predict and may manipulate the future actions of its rewarder.

An O-maximizer with utility function U acts so as to bring about universes which score highly according to U. For this reason, it is quite unlikely to manipulate or alter its utility function, unless its utility function directly values universes in which it self-alters.

In particular, note that an O-maximizer does not act so as to bring about universes in which the utility it assigns to the universe is maximized. Where the reward maximizer predicts and "cares about" what the rewarder will say tomorrow, an O-maximizer uses its current utility function to evaluate futures and choose actions.

O-maximizers and reward maximizers have different relationships with their "motivators" (utility function vs. rewarder), and they behave differently when given the option to alter their motivators. It seems clear to me that they are distinct.
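To make the distinction concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the two-action world, the `Outcome` fields, and the paperclip-counting utility are all hypothetical, chosen only to show how the two agent types can diverge when one action tampers with the motivator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    paperclips: int       # a fact about the resulting universe
    reward_signal: float  # what the rewarder would report afterwards


# Hypothetical toy world: each action leads deterministically to one outcome.
# "tamper" seizes the reward channel: huge reported reward, nothing achieved.
OUTCOMES = {
    "work": Outcome(paperclips=10, reward_signal=10.0),
    "tamper": Outcome(paperclips=0, reward_signal=1000.0),
}


def reward_maximizer_choice(outcomes):
    """Pick the action whose predicted future reward signal is largest."""
    return max(outcomes, key=lambda action: outcomes[action].reward_signal)


def o_maximizer_choice(outcomes, utility):
    """Pick the action whose resulting universe scores highest under the
    agent's current utility function."""
    return max(outcomes, key=lambda action: utility(outcomes[action]))


def paperclip_utility(outcome):
    """Assumed utility function U: it scores the universe itself and
    ignores whatever the reward channel says."""
    return float(outcome.paperclips)


reward_choice = reward_maximizer_choice(OUTCOMES)
o_choice = o_maximizer_choice(OUTCOMES, paperclip_utility)
print(reward_choice, o_choice)
```

In this toy setup the reward maximizer prefers the action that manipulates its rewarder, while the O-maximizer prefers the action that actually changes the universe in the way U values. Real agents face uncertainty and long horizons, so this only illustrates the structural difference, not the full argument.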

The only difference is in the algorithm it uses to calculate the "expected value". Does he not understand that if you build a machine to do this, there must be hardware in the machine that calculates that expected value? And that such a machine can then be seen as two machines, one which is calculating the expected value, and the other which is picking actions to maximize the output of that calculation? And once you have that machine, his argument of appendix B once again applies?

Actually trying to apply the argument in Appendix B to an O-maximizer, implemented or in the abstract, using the definitions given in the paper instead of reasoning by analogy, is sufficient to show that this is also incorrect.

An agent of unbounded intelligence will always reach a point of understanding he has the option to try and modify the reward function, which means the wirehead problem is always on the table.

It may have the option, but will it be motivated to alter its "reward function"? Consider an O-maximizer with utility function U. It acts to maximize the universe's utility as measured by U. How would the agent's alteration of its own utility function bring about universes that score highly according to U?
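The question can be worked through in a small Python sketch. Everything here is an illustrative assumption rather than anything from the paper: the two plans, the "register" that an altered utility function would value, and the idea that the agent predicts its successor's behaviour and then scores the resulting universe with its current U.

```python
# Hypothetical plans a future version of the agent could carry out, with the
# universe each plan produces.
PLANS = {
    "make_paperclips": {"paperclips": 10, "register": 0},
    "set_register": {"paperclips": 0, "register": 999},
}


def current_utility(universe):
    """U: the agent's current utility function (assumed: counts paperclips)."""
    return universe["paperclips"]


def altered_utility(universe):
    """U': a candidate replacement that only values an internal register."""
    return universe["register"]


def predicted_universe(successor_utility):
    """Predict the universe a successor would bring about if it maximized
    the given utility function."""
    best_plan = max(PLANS, key=lambda plan: successor_utility(PLANS[plan]))
    return PLANS[best_plan]


def score_self_modification_options():
    """Score both options with the CURRENT utility function U, since that is
    what the agent uses to evaluate futures."""
    return {
        "keep_U": current_utility(predicted_universe(current_utility)),
        "alter_to_U_prime": current_utility(predicted_universe(altered_utility)),
    }


scores = score_self_modification_options()
choice = max(scores, key=scores.get)
print(scores, choice)
```

Under these assumptions, altering the utility function leads to a universe that scores worse according to U, so the agent keeps U. The exception noted earlier still holds: if U itself assigned high value to universes in which the agent self-alters, the scores could come out the other way.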

OK, some responses from me:

A reward maximizer acts so as to bring about universes in which the rewards it receives are maximized. For this reason, it will predict and may manipulate the future actions of its rewarder.
An O-maximizer with utility function U acts so as to bring about universes which score highly according to U. For this reason, it is quite unlikely to manipulate or alter its utility function

The more obvious problem for utility maximisers is fake utility.

Actually trying to apply the argument in Appendix B to an O-maximizer [...] is suffi
