(I'm not sure what the rule is here for replying to oneself. Apologies if this is considered rude; I'm trying to avoid putting TLDR text in one comment.)
Here is a set of utility rules that I think would cause an AI to behave properly. (Would I call this "Identical Copy Decision Theory"?)
Suppose that an entity E clones itself, becoming E1 and E2. (We're being agnostic here about which of E1 and E2 is the "original". If the clone operation is perfect, the distinction is meaningless.) Before performing the clone, E calculates its expected utility U(E) = (U(E1)+U(E2))/2.
After the cloning operation, E1 and E2 have separate utility functions: E1 does not care about U(E2). "That guy thinks like me, but he isn't me."
Suppose that E1 and E2 have some experiences, and then they are merged back into one entity E' (as described in http://lesswrong.com/lw/19d/the_anthropic_trilemma/ and elsewhere). Assuming this merge operation is possible (because the experiences of E1 and E2 were not too bizarrely disjoint), the utility of E' is the average: U(E') = (U(E1) + U(E2))/2.
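To make the bookkeeping concrete, here is a minimal sketch in Python of the clone/merge rules above. The `Entity`, `clone`, and `merge` names are my own illustration, not part of the proposal:

```python
from dataclasses import dataclass

@dataclass
class Entity:
    """An agent identified only by its utility value, for bookkeeping purposes."""
    utility: float

def clone(e: Entity) -> tuple[Entity, Entity]:
    # A perfect clone: E1 and E2 start identical, and neither is "the original".
    return Entity(e.utility), Entity(e.utility)

def expected_utility_before_clone(u1: float, u2: float) -> float:
    # Rule 1: before cloning, U(E) = (U(E1) + U(E2)) / 2.
    return (u1 + u2) / 2

def merge(e1: Entity, e2: Entity) -> Entity:
    # Rule 3: after merging, U(E') = (U(E1) + U(E2)) / 2.
    # (Rule 2 is simply that, between clone and merge, E1 and E2
    # each track their own utility and ignore the other's.)
    return Entity((e1.utility + e2.utility) / 2)
```

Note that rule 2 (separate utility functions between clone and merge) needs no code at all: it is just the absence of any cross-term between the two entities.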
Humans evaluate decisions using their current utility function, not the future utility function they might end up with as a consequence of the decision. By my current utility function, wireheading means I will never accomplish anything again, and thus I view it as having very negative utility.