Manfred comments on Universal agents and utility functions - Less Wrong

Post author: Anja 14 November 2012 04:05AM



Comment author: Manfred 14 November 2012 07:25:49AM

Could you explain more about why you're down on agent 1, and think agent 2 won't wirehead?

My first impression is that agent 1 will take its expected changes into account when trying to maximize the time-summed (current) utility function, and so it won't just purchase options it will never use, or do similar "dumb stuff." On the other topic, the only way agent 2 can avoid wireheading is if there's no possible way for it to influence its likely future utility functions - otherwise it'll act to increase the probability that it chooses big, easy utility functions, and then it will choose those big, easy utility functions, and then it's wireheaded.

Comment author: Anja 15 November 2012 01:30:01AM

I am pretty sure that Agent 2 will wirehead on the Simpleton Gambit, though this depends heavily on the number of time cycles to follow, the comparative advantage that can be gained from wireheading, and the negative utility the current utility function assigns to the change.

Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later, as described in AIXI and existential despair. So basically the two futures look very similar to the agent, except for the part where the screen says something different, and then it all comes down to whether the utility function has preferences over that particular fact.

Comment author: Manfred 16 November 2012 01:18:26AM

Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later,

Ah, right, that abstraction thing. I'm still fairly confused by it. Maybe a simple game will help see what's going on.

The simple game can be something like a two-step choice. At time T1, the agent can send either A or B. Then at time T2, the agent can send A or B again, but its utility function might have changed in between.

For the original utility function, our payoff matrix looks like AA: 10, AB: -1, BA: 0, BB: 1. So if the utility function didn't change, the agent would just send A at time T1 and A at time T2, and get a reward of 10.

But suppose in between T1 and T2, a program predictably changes the agent's payoff matrix, as stored in memory, to AA: -1, AB: 10, BA: 0, BB: 1. Now if the agent sent A at time T1, it will send B at time T2, to claim the new payoff for AB of 10 units, even though AB is lowest in the preference ordering of the agent at T1. So if our agent is clever, it sends B at time T1 rather than A, knowing that the future program will also pick B, leading to an outcome (BB, for a reward of 1) that the agent at T1 prefers to AB.
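The backward induction in this game is simple enough to sketch in a few lines of Python (the function names here are my own, purely for illustration). The "clever" agent predicts its T2 self's choice using the *changed* utility function, but scores the resulting outcome with its *original* one:

```python
# Payoffs indexed by the pair of moves (move at T1, move at T2).
U_original = {("A", "A"): 10, ("A", "B"): -1, ("B", "A"): 0, ("B", "B"): 1}
U_changed  = {("A", "A"): -1, ("A", "B"): 10, ("B", "A"): 0, ("B", "B"): 1}

def second_move(first, u_at_t2):
    """The agent at T2 maximizes whatever utility function it holds at T2."""
    return max(["A", "B"], key=lambda s: u_at_t2[(first, s)])

def naive_first_move():
    # A naive agent assumes its future self still shares its current utility.
    return max(["A", "B"],
               key=lambda f: U_original[(f, second_move(f, U_original))])

def clever_first_move():
    # A clever agent predicts its future self will act on the changed utility,
    # but evaluates the resulting outcome with its original utility.
    return max(["A", "B"],
               key=lambda f: U_original[(f, second_move(f, U_changed))])

print(naive_first_move())   # A: expects AA worth 10, but future self plays B,
                            #    so the realized outcome is AB, worth -1 to T1
print(clever_first_move())  # B: future self also plays B, locking in BB, worth 1
```

The naive agent's failure is exactly the dynamic inconsistency at issue: its plan (AA) is never executed, because the T2 self no longer shares its preferences.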

So, is our AIXI Agent 1 clever enough to do that?

Comment author: Anja 16 November 2012 08:46:36AM

I would assume that it is not smart enough to foresee its own future actions, and is therefore dynamically inconsistent. The original AIXI does not allow for the agent to be part of the environment. If we tried to relax the dualism, then your question depends strongly on the approximation to AIXI we would use to make it computable. If this approximation can be scaled down in a way such that it is still a good estimator for the agent's future actions, then maybe an environment containing a scaled-down, more abstract AIXI model will, after a lot of observations, become one of the consistent programs with lowest complexity. Maybe. That is about the only way I can imagine right now that we would not run into this problem.

Comment author: Manfred 16 November 2012 07:17:53PM

Thanks, that helps.

Comment author: timtyler 16 November 2012 12:15:06AM

Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later, as described in AIXI and existential despair.

Be warned that that post made practically no sense, and surely isn't a good reference.