New FAI paper: 'Learning What to Value' by Daniel Dewey

lukeprog

Abstract: I.J. Good's theory of an "intelligence explosion" predicts that ultraintelligent agents will undergo a process of repeated self-improvement. In the wake of such an event, how well our values are fulfilled will depend on whether these ultraintelligent agents continue to act desirably and as intended. We examine several design approaches, based on AIXI, that could be used to create ultraintelligent agents. In each case, we analyze the design conditions required for a successful, well-behaved ultraintelligent agent to be created. Our main contribution is an examination of value-learners, agents that learn a utility function from experience. We conclude that the design conditions on value-learners are in some ways less demanding than those on other design approaches.

Daniel Dewey, 'Learning What to Value'
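For readers who want the abstract's "value-learner" idea in concrete form, here is a toy sketch. It is not taken from the paper. It assumes a finite pool of candidate utility functions, a fixed table of outcome probabilities per action, and a plain Bayesian update of the weights on each candidate. All names (`update_weights`, `choose_action`, the example actions and outcomes) are hypothetical, and unlike the AIXI-based designs the abstract mentions, nothing here involves sequence prediction or long horizons.

```python
from typing import Callable, Dict, List

Utility = Callable[[str], float]


def update_weights(weights: Dict[str, float],
                   likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Bayes update: reweight each candidate utility function by how well
    it explains the observed evidence, then renormalize."""
    unnormalized = {name: w * likelihoods[name] for name, w in weights.items()}
    total = sum(unnormalized.values())
    return {name: v / total for name, v in unnormalized.items()}


def choose_action(actions: List[str],
                  outcome_model: Dict[str, Dict[str, float]],
                  utilities: Dict[str, Utility],
                  weights: Dict[str, float]) -> str:
    """Pick the action with the highest utility, averaged over both the
    possible outcomes and the agent's uncertainty about which utility
    function is the right one."""
    def score(action: str) -> float:
        return sum(
            p * sum(w * utilities[name](outcome) for name, w in weights.items())
            for outcome, p in outcome_model[action].items()
        )
    return max(actions, key=score)


# Hypothetical toy world: two candidate values, two actions.
utilities = {
    "paperclips": lambda outcome: 1.0 if outcome == "clips" else 0.0,
    "smiles": lambda outcome: 1.0 if outcome == "smiles" else 0.0,
}
outcome_model = {
    "build_factory": {"clips": 0.9, "smiles": 0.1},
    "tell_joke": {"clips": 0.0, "smiles": 1.0},
}
actions = ["build_factory", "tell_joke"]

prior = {"paperclips": 0.8, "smiles": 0.2}
before = choose_action(actions, outcome_model, utilities, prior)

# Evidence that is far more likely if "smiles" is what is actually valued.
posterior = update_weights(prior, {"paperclips": 0.1, "smiles": 0.9})
after = choose_action(actions, outcome_model, utilities, posterior)
```

The point of the sketch is only that the agent's choice can change as evidence about the utility function comes in, without anyone hard-coding a single utility function up front.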

Disappointingly, this paper is still pre-UDT thinking. An AIXI-like agent doesn't understand that it lives within the universe that it's trying to affect, so it can unwittingly destroy its own hardware with its mining claws (to borrow a phrase from Tim Tyler).

An AIXI-like agent doesn't understand that it lives within the universe that it's trying to affect, so it can unwittingly destroy its own hardware with its mining claws (to borrow a phrase from Tim Tyler).

It looks like the newer version of the paper tries to deal with this explicitly, by introducing the concept of "agent implementation". But I can't verify whether the solution actually works, since the probability function P is left undefined.

I think the paper suffers from what may be a common failure mode in AI design: problems with the overa...

Mitchell_Porter (score 4, 15y ago)

This statement bugs me because I don't see that any solution to that problem has been developed within LW's avant-garde decision theories. In fact, they often introduce self-referential statements, so this problem should in a way be more pressing for them. The "AIXI-like agent" just doesn't have a self-representation; but a LWDT (LessWrong decision theory) agent, insofar as its decision theory revolves around self-referential propositions, does need a capacity for self-representation, and yet I don't remember this problem being discussed very much. It's more as if the problem has been overlooked because so much of the theory is discussed in natural language, and so the capacity for self-referential semantics that natural-language statements provide has been taken for granted.

There is actually a decades-old body of work in computer science on self-representation, for example under the name of "computational reflection"; it was the subject of the thesis of Pattie Maes at MIT. But there's no magic bootstrap here, whereby ordinary reference turns into self-reference. It's just a matter of coding up, by hand, structures that happen to represent aspects of themselves, and then giving them causal connections so that those representations co-vary appropriately with the things they represent. This would appear to be an adequate solution to the problem, and I don't see that any alternative solutions have been invented locally.
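To make "coding up, by hand, structures that happen to represent aspects of themselves" concrete, here is a minimal sketch. It is an illustration only, not Maes's system or any LW decision theory: the class `ReflectiveAgent`, its `self_model` dictionary, and the battery example are all hypothetical. The self-model is kept in step with the real state by routing every state change through one setter, and the agent consults the self-model, not the raw state, when deciding whether to act.

```python
class ReflectiveAgent:
    """Toy agent carrying a hand-written representation of its own state."""

    def __init__(self, battery: int) -> None:
        self._battery = battery                 # the actual state
        self.self_model = {"battery": battery}  # the representation of that state

    def _set_battery(self, value: int) -> None:
        # The "causal connection": every change to the real state is
        # pushed into the self-model, so the two co-vary.
        self._battery = value
        self.self_model["battery"] = value

    def act(self, cost: int) -> bool:
        """Act only if the self-model says there is enough battery left."""
        if self.self_model["battery"] < cost:
            return False
        self._set_battery(self._battery - cost)
        return True


agent = ReflectiveAgent(battery=5)
first = agent.act(3)   # allowed: the self-model reports 5 units
second = agent.act(3)  # refused: the self-model now reports 2 units
```

Nothing here bootstraps self-reference out of ordinary reference; the correspondence between `_battery` and `self_model` exists only because the setter was written to maintain it.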

XiXiDu (score -2, 15y ago)

I really have to read EY's UDT paper. Can you tell me what math prerequisites I need to understand UDT?
