Open Thread June 2010, Part 4

Will_Newsome

Open Thread June 2010, Part 4 — LessWrong

Comment Permalink

A recent comment about Descartes inspired this thought: the simplest possible utility function for an agent is one that only values survival of mind, as in "I think therefore I am". This function also seems to be immune to the wireheading problem because it's optimizing something directly perceivable by the agent, rather than some proxy indicator.

But when I started thinking about an AI with this utility function, I became very confused. How exactly do you express this concept of "me" in the code of a utility-maximizing agent? The problem sounds easy enough: it doesn't refer to any mystical human qualities like "consciousness", it's purely a question about programming tricks, but still it looks quite impossible to solve. Any thoughts?

Showing 3 of 4 replies (Click to show all)

Vladimir_Nesov16y40

You want the program to keep running in the context of the world. To specify what that means, you need to build on top of an ontology that refers to the world. But figuring out such ontology is a very difficult problem and you can't even in principle refer to the whole world as it really is: you'll always have uncertainty left, even in a general ontological model.

The program will have to know what tradeoffs to make, for example whether it's important to survive in most possible worlds with fair probability, or in at least one possible world with high prob... (read more)

2Richard_Kennaway16y

An agent's "me" is its model of itself. This is already a fairly complicated thing for an agent to have, and it need not have one. Why do you say that an agent can "directly perceive" its own mind? Or anything else? A perception is just a signal somewhere inside the agent: a voltage, a train of neural firings, or whatever. It can never be identical to the thing that caused it, the thing that it is a perception of. People can very easily have mistaken ideas of who they are.

0red7516y

Program must have something to preserve. My first thought is preservation of declarative memory: ensure that future contain chain of systems, implementing same goal, with overlapping declarative memory. I haven't made an analysis, just first thought.

See in context