A recent comment about Descartes inspired this thought: the simplest possible utility function for an agent is one that only values survival of mind, as in "I think therefore I am". This function also seems to be immune to the wireheading problem because it's optimizing something directly perceivable by the agent, rather than some proxy indicator.
But when I started thinking about an AI with this utility function, I became very confused. How exactly do you express this concept of "me" in the code of a utility-maximizing agent? The problem sounds easy enough: it doesn't refer to any mystical human qualities like "consciousness", it's purely a question about programming tricks, but still it looks quite impossible to solve. Any thoughts?
You want the program to keep running in the context of the world. To specify what that means, you need to build on top of an ontology that refers to the world. But figuring out such ontology is a very difficult problem and you can't even in principle refer to the whole world as it really is: you'll always have uncertainty left, even in a general ontological model.
The program will have to know what tradeoffs to make, for example whether it's important to survive in most possible worlds with fair probability, or in at least one possible world with high prob...
This thread is for the discussion of Less Wrong topics that have not appeared in recent posts. If a discussion gets unwieldy, celebrate by turning it into a top-level post.
This thread brought to you by quantum immortality.