AlexMennen comments on A utility-maximizing varient of AIXI - Less Wrong
Assuming that there is a small probability that you will deviate from your current plan on each future move, and that these probabilities add up to a near guarantee that you eventually will deviate, has a more complicated effect on your planning than simply justifying chasing the supremum.
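To make "add up to a near guarantee" precise: if ε_t is the probability of deviating at step t, and (an assumption not stated above) deviations at different steps are independent, then the probability of never deviating is an infinite product, and that product is zero exactly when the ε_t sum to infinity.

```latex
\Pr[\text{never deviate}] = \prod_{t=1}^{\infty} (1 - \varepsilon_t),
\qquad 0 \le \varepsilon_t < 1,
\qquad
\prod_{t=1}^{\infty} (1 - \varepsilon_t) = 0 \iff \sum_{t=1}^{\infty} \varepsilon_t = \infty .
```

So a constant per-step probability ε > 0 makes eventual deviation certain, while a rapidly shrinking one such as ε_t = 1/(t+1)^2 does not.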
For instance, consider this modification of the toy example I gave earlier. Let Y := {a, b, c}. If the first b comes before the first c, then the resulting utility is 1 - 1/n, where n is the index of the first b (all previous elements being a), as before. But now the utility of outputting an infinite stream of a is 1. If there is a c in your action sequence and it comes before the first b, then the utility you get is -1000. In this situation, supremum-chasing works just fine if you completely trust your future self: you output a every time and get a utility of 1, the best you could possibly do. But if you think there is a small risk that you could end up outputting c at some point, then eventually it becomes worthwhile to output b, since the gain you could get from continuing to output a gets arbitrarily small compared to the expected loss from accidentally outputting c.
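The trade-off in this toy example can be checked numerically. The sketch below is only an illustration under an added assumption: each step independently slips to c with a fixed probability eps, and the agent compares plans of the form "output a until index n, then b". The helper names (expected_utility, best_stopping_index, and so on) are hypothetical.

```python
# Toy model: actions are drawn from Y = {a, b, c}.
# Assumption (illustrative only): every step independently slips to c with probability eps.

C_PENALTY = -1000.0  # utility when a c appears before the first b


def utility_first_b_at(n: int) -> float:
    """Utility 1 - 1/n when the first b is at 1-based index n, preceded only by a's."""
    return 1.0 - 1.0 / n


def expected_utility(n: int, eps: float) -> float:
    """Expected utility of the plan 'output a at indices 1..n-1, then b at index n'.

    The plan succeeds only if none of its n steps slips to c; any slip puts
    a c before the first b and costs C_PENALTY.
    """
    p_clean = (1.0 - eps) ** n
    return p_clean * utility_first_b_at(n) + (1.0 - p_clean) * C_PENALTY


def expected_utility_all_a(eps: float) -> float:
    """Expected utility of planning to output a forever.

    With eps > 0 the probability of never slipping is 0, so the penalty is certain.
    """
    return 1.0 if eps == 0.0 else C_PENALTY


def best_stopping_index(eps: float, max_n: int = 100_000) -> int:
    """Search n = 1..max_n for the 'a until n, then b' plan with the highest expected utility."""
    return max(range(1, max_n + 1), key=lambda n: expected_utility(n, eps))


if __name__ == "__main__":
    for eps in (1e-4, 1e-6, 1e-8):
        n = best_stopping_index(eps)
        print(f"eps={eps:g}: best n={n}, E[U]={expected_utility(n, eps):.4f}, "
              f"all-a E[U]={expected_utility_all_a(eps):.1f}")
```

In this model the optimal stopping index grows roughly like 1/sqrt(1001 * eps) as eps shrinks, so it moves further out for more reliable agents but stays finite for any eps > 0, while the all-a plan only reaches utility 1 when eps is exactly 0.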
I don't really have answers to these questions. One thing you could do is replace the set of all policies (P) with the set of all computable policies, so that the agent would never output an uncomputable action sequence [Edit: oops, not true. You could consider only computable policies, but then end up at an uncomputable policy anyway by chasing the supremum].
Yeah, I was intentionally vague with "the probabilistic nature of things". I am also thinking about how any AI will have logical uncertainty, uncertainty about the precision of its observations, et cetera, so that as it considers points further in the future, its distribution becomes flatter. And having a non-dualist framework would introduce uncertainty about the agent's self, its utility function, its memory, ...