Anja comments on Universal agents and utility functions - Less Wrong

29 Post author: Anja 14 November 2012 04:05AM

Comment author: Anja 17 November 2012 10:03:11PM * 0 points

First, replace the action-perception sequence with an action-perception-utility sequence u_1, y_1, x_1, u_2, y_2, x_2, etc.

This seems unnecessary. The information u_i is already contained in x_i.

modeled_action(n, k) = argmax(y_k) u_k(yx_<k, yx_k:n)*M(uyx_<k, uyx_k:n)

This completely breaks the expectimax principle. I assume you actually mean something like

[...]

which is just Agent 2 in disguise.

Comment author: AlexMennen 18 November 2012 02:10:03AM 0 points

Oops. Yes, that's what I meant. But it is not the same as Agent 2, because this (Agent 4?) uses its current utility function to evaluate the desirability of future observations and actions, even though it knows that it will use a different utility function to choose between them later. For example, Agent 4 will not take the Simpleton's Gambit because it cares about its current utility function getting satisfied in the future, not about its future utility function getting satisfied in the future.

Agent 4 can be seen as a set of agents, one for each possible utility function, that are using game theory with each other.

Comment author: Anja 18 November 2012 03:06:23AM 0 points

I second the general sentiment that it would be good for an agent to have these traits, but if I follow your equations I end up with Agent 2.

Comment author: AlexMennen 18 November 2012 08:11:54PM * 2 points

No, you don't. If you tried to represent Agent 2 in that notation, you would get

modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_<k, yx_k) + u_(k+1)(yx_<k, yx_k:k+1) + ... + u_n(yx_<k, yx_k:n)]*M(yx_<k, yx_k:n), where y_m = modeled_action(n, m) for m>k.

You were using u_k to represent the utility of the last step of its input, so that total utility is the sum of the utilities of its prefixes, while I was using u_k to represent the utility of the whole sequence. If I adapt Agent 4 to your use of u_k, I get

modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_<k, yx_k) + u_k(yx_<k, yx_k:k+1) + ... + u_k(yx_<k, yx_k:n)]*M(yx_<k, yx_k:n), where y_m = modeled_action(n, m) for m>k.
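A toy sketch may help make the difference concrete. The Python below is only an illustration under loud assumptions: the environment is deterministic (so M and the sum over observations drop out), the agent chooses only its first action (every later action is a forced `wait`), and the payoffs for a Simpleton's-Gambit-like offer are made-up numbers (0.5 per step under the original utility while the offer is declined, 0 after accepting, and a constant 1 under the replacement utility). All function names are hypothetical. It uses the prefix-utility convention, where total utility is a sum over prefixes.

```python
from typing import Callable, Tuple

History = Tuple[str, ...]  # actions only; observations are omitted in this deterministic toy


def original_utility(prefix: History) -> float:
    # Assumed payoff: 0.5 for the last step unless the gambit was accepted.
    return 0.0 if "accept" in prefix else 0.5


def simpleton_utility(prefix: History) -> float:
    # Assumed replacement utility: trivially satisfied at every step.
    return 1.0


def utility_in_force(prefix_before: History) -> Callable[[History], float]:
    # The utility function the agent holds at a step depends on earlier actions.
    return simpleton_utility if "accept" in prefix_before else original_utility


def agent2_value(history: History) -> float:
    # Agent 2 style: each step is scored by the utility function held at that step.
    return sum(
        utility_in_force(history[:j])(history[: j + 1]) for j in range(len(history))
    )


def agent4_value(history: History) -> float:
    # Agent 4 style: every step is scored by the utility function held now.
    current = utility_in_force(())
    return sum(current(history[: j + 1]) for j in range(len(history)))


def best_first_action(value: Callable[[History], float], horizon: int = 3) -> str:
    # Only the first action is a real choice; later steps are forced to "wait".
    candidates = {a: (a,) + ("wait",) * (horizon - 1) for a in ("accept", "decline")}
    return max(candidates, key=lambda a: value(candidates[a]))


print(best_first_action(agent2_value), best_first_action(agent4_value))
```

With these assumed numbers, the Agent 2 style sum scores accepting at 2.0 versus 1.5 for declining, while the Agent 4 style sum scores accepting at 0.0 versus 1.5, so only the former accepts.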

Comment author: Anja 19 November 2012 04:26:07AM * 3 points

I am starting to see what you mean. Let's stick with utility functions over histories of length m_k (whole sequences) like you proposed and denote them with a capital U to distinguish them from the prefix utilities. I think your Agent 4 runs into the following problem: modeled_action(n,m) actually depends on the actions and observations yx_{k:m-1} and needs to be calculated for each combination, so y_m is actually

[...]

which clutters up the notation so much that I don't want to write it down anymore.

We also get into trouble with taking the expectation, the observations x_{k+1:n} are only considered in modeling the actions of the future agents, but not now. What is M(yx_<k,yx_k:n) even supposed to mean, where do the x's come from?

So let's torture some indices:

[...]

where n>=k and

[...]

This is not really AIXI anymore and I am not sure what to do with it, but I like it.

Comment author: AlexMennen 19 November 2012 05:03:36AM 1 point

so y_m is actually [...] which clutters up the notation so much that I don't want to write it down anymore.

Yes.

We also get into trouble with taking the expectation, the observations x_{k+1:n} are only considered in modeling the actions of the future agents, but not now. What is M(yx_<k,yx_k:n) even supposed to mean, where do the x's come from?

Oops, you are right. The sum should have been over x_{k:n}, not just over x_k.
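Applying that correction to the prefix-utility form of Agent 4 would give something like the following. This is a transcription sketch only: the inner sum over j abbreviates the bracketed terms, and the index j is shorthand introduced here, not notation from the thread. It still leaves implicit that each y_m depends on the intervening actions and observations.

```latex
\[
\text{modeled\_action}(n,k)
  = \arg\max_{y_k} \sum_{x_{k:n}}
    \Big[ \sum_{j=k}^{n} u_k\big(yx_{<k},\, yx_{k:j}\big) \Big]
    \, M\big(yx_{<k},\, yx_{k:n}\big),
\qquad y_m = \text{modeled\_action}(n,m) \ \text{for } m > k .
\]
```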

So let's torture some indices: [...]

Yes, that is a cleaner and actually correct version of what I was trying to describe. Thanks.