Anthropics are a problem for humans because our utility functions are defined in terms of a "self" which is defined in a way that does not generalize well; for an AI, this would be a problem for us writing the utility function to give it, but not for the AI doing the optimization.
So if the AI were building a robot that had to bet in a presumptuous philosopher or doomsday scenario, how would it bet in each? You do already have the right answer to sleeping beauty.
There are two utility functions here, and the AI isn't optimizing the paperclip one.
I used a ridiculously bad example. I was trying to ask what would happen if the AI considered the possibility that tiling the universe with paperclips would actually satisfy its own utility function. That is very implausible, but that just means that if a superintelligence were to decide to do so, it would have a very good reason.
This premise is unnecessary, since that possibility can be folded into the probability model.
No, you would defect on the true Prisoner's Dilemma.
So if the AI were building a robot [...] how would it bet in each?
That would depend on what the AI hoped to achieve by building the robot. It seems to me that specifying that clearly should determine what approach the AI would want the robot to take in such situations.
(More generally, it seems to me that a lot of anthropic puzzles go away if one eschews indexicals. Whether that's any use depends on how well one can do without indexicals. I can't help suspecting that the Right Answer may be "perfectly well, and things that can only be said with inde...
I am posting this because I'm interested in self-modifying agent decision theory but I'm too lazy to read up on existing posts. I want to see a concise justification as to why a sophisticated decision theory would be needed for the implementation of an AGI. So I'll present a 'naive' decision theory, and I want to know why it is unsatisfactory.
The one condition in the naive decision theory is that the decision-maker is the only agent in the universe who is capable of self-modification. This will probably suffice for production of the first Artificial General Intelligence (since humans aren't actually all that good at self-modification).
Suppose that our AGI has a probability model for predicting the 'state of the universe at time T' (e.g. T = 10 billion years), conditional on what it knows and on one decision it has to make. This one decision is how it should rewrite its code at time zero. We suppose it can rewrite its code instantly, and the code is limited to X bytes. So the AGI has to maximize utility at time T over all programs of at most X bytes. Supposing it can simulate its utility at the 'end state of the universe' conditional on which program it chooses, why can't it just choose the program with the highest utility? Implicit in our set-up is that the program it chooses may (and very likely will) have the capacity to self-modify again, but we're assuming that our AGI's probability model accounts for when and how it is likely to self-modify. Difficulties with infinite recursion loops should be avoidable if our AGI backtracks from the end of time.
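To make the proposal concrete, here is a minimal Python sketch of the selection step, under loud assumptions: `expected_utility` stands in for the AGI's probability model (predicted utility at time T given the chosen program), and `toy_expected_utility` is a made-up placeholder with no real meaning. All names are hypothetical. For simplicity it enumerates programs of exactly X bytes, treating shorter programs as padded.

```python
from itertools import product
from typing import Callable, Iterator

Program = bytes


def all_programs(x_bytes: int) -> Iterator[Program]:
    """Enumerate every byte string of exactly x_bytes bytes (256**x_bytes of them)."""
    for combo in product(range(256), repeat=x_bytes):
        yield bytes(combo)


def choose_program(
    x_bytes: int, expected_utility: Callable[[Program], float]
) -> Program:
    """Return a program whose predicted utility at time T is highest.

    expected_utility is assumed to already account for any later
    self-modification the chosen program might perform.
    """
    return max(all_programs(x_bytes), key=expected_utility)


def toy_expected_utility(program: Program) -> float:
    """Purely illustrative stand-in for the probability model."""
    return -abs(sum(program) - 300)


# X = 2 keeps the search at 65,536 candidates; realistic X is astronomically larger.
best = choose_program(2, toy_expected_utility)
```

The brute-force `max` is only feasible for tiny X, since the search space grows as 256^X. All of the real work is hidden inside `expected_utility`.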
Of course our AGI will need a probability model for predicting what a program for its behavior will do without having to simulate or even completely specify the program. To me, that seems like the hard part. If this is possible, I don't see why it's necessary to develop a specific theory for dealing with convoluted Newcomb-like problems, since the above seems to take care of those issues automatically.