New FAI paper: 'Learning What to Value' by Daniel Dewey

lukeprog

Abstract: I.J. Good's theory of an "intelligence explosion" predicts that ultraintelligent agents will undergo a process of repeated self-improvement. In the wake of such an event, how well our values are fulfilled will depend on whether these ultraintelligent agents continue to act desirably and as intended. We examine several design approaches, based on AIXI, that could be used to create ultraintelligent agents. In each case, we analyze the design conditions required for a successful, well-behaved ultraintelligent agent to be created. Our main contribution is an examination of value-learners, agents that learn a utility function from experience. We conclude that the design conditions on value-learners are in some ways less demanding than those on other design approaches.

Daniel Dewey, 'Learning What to Value'

OK, some responses from me:

A reward maximizer acts so as to bring about universes in which the rewards it receives are maximized. For this reason, it will predict and may manipulate the future actions of its rewarder.

An O-maximizer with utility function U acts so as to bring about universes which score highly according to U. For this reason, it is quite unlikely to manipulate or alter its utility function

The more obvious problem for utility maximisers is fake utility.

Actually trying to apply the argument in Appendix B to an O-maximizer [...] is sufficient to show that this is also incorrect.

My position here is a bit different from Curt's. Curt will argue that both systems are likely to wirehead (and I don't necessarily disagree - the set-up in the paper is not sufficient to prevent wireheading, IMO). My angle is more that both types of systems can be made into universal agents - producing arbitrary finite action sequenes in response to whatever inputs you like.

The more obvious problem for utility maximisers is fake utility.

...but your characterisation of the behaviour of reward maximizers and utility maximisers seems ratther like a projection to me. IMO, actual behaviour will depend on what the systems believe their purpose is when they come to adjusting their brains. Since they both lack knowledge of the design purpose of their own goal systems, ISTM that the outcome could potentially vary. Maybe they will wirehead, maybe they won't.

18

New FAI paper: 'Learning What to Value' by Daniel Dewey

18

18

18

New FAI paper: 'Learning What to Value' by Daniel Dewey

18

18