New FAI paper: 'Learning What to Value' by Daniel Dewey

lukeprog

Abstract: I.J. Good's theory of an "intelligence explosion" predicts that ultraintelligent agents will undergo a process of repeated self-improvement. In the wake of such an event, how well our values are fulfilled will depend on whether these ultraintelligent agents continue to act desirably and as intended. We examine several design approaches, based on AIXI, that could be used to create ultraintelligent agents. In each case, we analyze the design conditions required for a successful, well-behaved ultraintelligent agent to be created. Our main contribution is an examination of value-learners, agents that learn a utility function from experience. We conclude that the design conditions on value-learners are in some ways less demanding than those on other design approaches.

Daniel Dewey, 'Learning What to Value'
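
As a rough illustration of what "learning a utility function from experience" could mean, here is a toy sketch. It is not Dewey's formalism: it assumes a small, fixed pool of candidate utility functions and a hand-supplied likelihood for one piece of evidence. The agent keeps a probability distribution over the candidates, updates it by Bayes' rule, and picks the action with the highest expected utility under that distribution. All names (`update_posterior`, `choose_action`, the candidates and the numbers) are hypothetical.

```python
from typing import Callable, Dict, List

Utility = Callable[[str], float]


def update_posterior(prior: Dict[str, float],
                     likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes update over candidate utility functions.

    likelihood[name] is the (assumed) probability of the observed
    evidence if candidate `name` were the true utility function.
    """
    unnormalized = {name: prior[name] * likelihood[name] for name in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under every candidate")
    return {name: weight / total for name, weight in unnormalized.items()}


def choose_action(actions: List[str],
                  candidates: Dict[str, Utility],
                  posterior: Dict[str, float]) -> str:
    """Pick the action maximizing utility averaged over the posterior."""
    def expected_utility(action: str) -> float:
        return sum(posterior[name] * candidates[name](action)
                   for name in candidates)
    return max(actions, key=expected_utility)


# Two hypothetical candidate utility functions over two actions.
candidates: Dict[str, Utility] = {
    "likes_tea": lambda a: 1.0 if a == "make_tea" else 0.0,
    "likes_coffee": lambda a: 1.0 if a == "make_coffee" else 0.0,
}
prior = {"likes_tea": 0.5, "likes_coffee": 0.5}
# Assumed evidence that is three times likelier under "likes_coffee".
likelihood = {"likes_tea": 0.2, "likes_coffee": 0.6}

posterior = update_posterior(prior, likelihood)
best = choose_action(["make_tea", "make_coffee"], candidates, posterior)
```

The point of the design is that the utility function is uncertain and revised by evidence, rather than fixed in advance or replaced by a reward signal.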

A UDT agent does not need to be explicitly told how to represent itself, besides knowing its own code.

This is because a UDT agent does not attempt to explicitly control any particular instantiation of itself. Instead, it looks at the world and tries to control the entire thing by controlling logical facts about its own behavior. If the world happens to contain patterns similar to the agent, then the agent will recognize that controlling its own output also controls the behavior of those patterns. The agent could also infer that by destroying those patterns, it would lose its ability to control the world.

I think this is a nice idea, and it does deal with at least this particular problem of self-representation.

The bigger issue is that (as far as I know) no one has yet found a precisely specified version of ADT/UDT/TDT which satisfies our intuitions about what TDT should do.

That sounds pretty wild. Do you think it would help any with the wirehead problem?

2 · cousin_it · 15y

Right. To be more precise about the current state of the art: we don't know any algorithm that can maximize its utility in a UDT-ish setting, but we do know algorithms that can hit a specified utility value, or ensure that utility is no less than some predefined value, if the underlying decision problem allows that at all. (Paul, I think you have already figured this out independently, right?)
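
To make the distinction between these goals concrete, here is a minimal toy sketch. It assumes the agent already has an explicit, finite table mapping each action to its utility, which replaces the proof search a real UDT-style agent would need, so it only illustrates the two goals and not the algorithms referred to above. The function names and the table are hypothetical.

```python
from typing import Dict, Optional


def hit_target(utilities: Dict[str, float], target: float) -> Optional[str]:
    """Return an action whose utility equals `target`, or None if none exists."""
    for action, utility in utilities.items():
        if utility == target:
            return action
    return None


def ensure_at_least(utilities: Dict[str, float], floor: float) -> Optional[str]:
    """Return the first action whose utility is at least `floor`, or None."""
    for action, utility in utilities.items():
        if utility >= floor:
            return action
    return None


# Hypothetical decision problem: action name -> utility.
utilities = {"a": 1.0, "b": 5.0, "c": 10.0}

exact = hit_target(utilities, 5.0)
good_enough = ensure_at_least(utilities, 4.0)
impossible = ensure_at_least(utilities, 11.0)
```

Here `ensure_at_least` stops at the first action that clears the floor, so it can return "b" even though "c" is better, and it returns `None` when the decision problem does not allow the requested value at all.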
