SilasBarta comments on The Absent-Minded Driver - Less Wrong

27 Post author: Wei_Dai 16 September 2009 12:51AM


Comments (139)


Comment author: SilasBarta 16 September 2009 08:17:26PM 0 points [-]

Thanks! :-) But I still don't understand what made you express the payoff as a function of p. Was it just something you thought of when applying UDT (perhaps after knowing that's how someone else approached the problem), or is there something about UDT that required you to do that?

Comment author: Vladimir_Nesov 16 September 2009 08:29:00PM *  0 points [-]

What do you mean? p is the only control parameter... You consider a set of "global" mixed strategies, indexed by p, and pick one that leads to the best outcome, without worrying about where your mind that does this calculation is currently located and under what conditions you are thinking this thought.

Comment author: SilasBarta 16 September 2009 08:36:02PM 0 points [-]

What do you mean? p is the only control parameter...

Perhaps, but it's an innovation to think of the problem in terms of "solving for the random fraction of times I'm going to do them". That is, even considering that you should add randomness in between your decision and what you do, is an insight. What focused your attention on optimizing with respect to p?

Comment author: Vladimir_Nesov 16 September 2009 08:42:53PM *  1 point [-]

Mixed strategy is a standard concept, so here we are considering a set S of all (global) mixed strategies available for the game. When you are searching for the best strategy, you are maximizing the payoff over S: you are searching for the mixed strategy that gives the best payoff. What UDT tells you is that you should just do that, even if you are considering what to do in a situation where some of the options have run out, and, as here, even if you have no idea where you are. "The best strategy" quite literally means s* = argmax_{s in S} EU(s), where EU(s) is the expected utility of playing s.

The only parameter for a given strategy is the probability of turning, so it's natural to index the strategies by that probability. This indexing is a mapping t:[0,1]->S that places a mixed strategy in correspondence with a value of turning probability. Now, we can rewrite the expected utility maximization in terms of probability: p* = argmax_{p in [0,1]} EU(t(p)).

For a strategy corresponding to turning probability p, it's easy to express corresponding expected utility:

We now can find the optimal strategy as s* = t(argmax_{p in [0,1]} EU(t(p))).
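To make the maximization concrete, here is a worked sketch. It assumes the usual payoffs for this problem, which are not stated in this thread: 0 for turning at the first intersection, 4 for turning at the second, and 1 for never turning. Under that assumption, with p the probability of turning at any intersection:

```latex
\begin{align*}
EU(t(p)) &= 0 \cdot p + 4\,(1-p)\,p + 1 \cdot (1-p)^2 \\
         &= 1 + 2p - 3p^2, \\
\frac{d}{dp}\,EU(t(p)) &= 2 - 6p = 0
  \quad\Longrightarrow\quad p^* = \tfrac{1}{3}, \\
EU(t(p^*)) &= 1 + \tfrac{2}{3} - \tfrac{1}{3} = \tfrac{4}{3}.
\end{align*}
```

The second derivative is -6, so this is a maximum, and the endpoints give EU(t(0)) = 1 and EU(t(1)) = 0, both below 4/3. The resulting turning probability of 1/3 agrees with the figure quoted elsewhere in the thread.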

Comment author: SilasBarta 16 September 2009 10:11:47PM 1 point [-]

Okay, that's making more sense -- the part where you get to parameterizing p as a real is what I was interested in.

But do you do the same thing when applying UDT to Newcomb's problem? Do you consider it a necessary part of UDT that you take p (with 0<=p<=1) as a continuous parameter to maximize over, where p is the probability of one-boxing?

Comment author: Vladimir_Nesov 17 September 2009 02:40:49AM *  1 point [-]

Fundamentally, this depends on the setting -- you might not be given a random number generator (randomness is defined with respect to the game), and so the strategies that depend on a random value won't be available in the set of strategies to choose from. In Newcomb's problem, the usual setting is that you have to be fairly deterministic or Omega punishes you (so that a small probability of two-boxing may even be preferable to pure one-boxing, or not, depending on Omega's strategy), or Omega may be placed so that your strategy is always deterministic for it (effectively, taking mixed strategies out of the set of allowed ones).

Comment author: Wei_Dai 16 September 2009 08:25:13PM 0 points [-]

S() is supposed to be an implementation of UDT. By looking at the world program P, it should determine that among all possible input-output mappings, those that return "EXIT" for 1/3 of all inputs (doesn't matter which ones) maximize average payoff. What made me express the payoff as a function of p was stepping through what S is supposed to do as an implementation of UDT.

Does that make sense?

Comment author: SilasBarta 16 September 2009 08:33:00PM 0 points [-]

I'm still confused. Your response seems to just say, "I did it because it works." -- which is a great reason! But I want to know if UDT gave you more guidance than that.

Does UDT require that you look at the consequences of doing something p% of the time (irrespective of which ones), on all problems?

Basically, I'm in the position of that guy/gal that everyone here probably helped out in high school:

"How do you do the proof in problem 29?" "Oh, just use identities 3 and 5, solve for t, and plug it back into the original equation." "But how did you know to do that?"

Comment author: Wei_Dai 16 September 2009 08:46:05PM 0 points [-]

Does UDT require that you look at the consequences of doing something p% of the time (irrespective of which ones), on all problems?

No, UDT (at least in my formulation) requires that you look at all possible input-output mappings, and choose the one that is optimal. In this case it so happens that any function that returns "EXIT" for 1/3 of inputs is optimal.
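The claim about input-output mappings can be checked by brute force on a toy model. The sketch below is an illustration only, not the world program P from the original post: it assumes three possible inputs, assumes the world feeds the mapping one independently and uniformly chosen input at each intersection, and assumes the usual payoffs (0 for exiting at the first intersection, 4 at the second, 1 for never exiting). The names `INPUTS`, `average_payoff`, and `best_mappings` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical input set: the world hands the strategy one of these
# at each intersection, chosen uniformly and independently.
INPUTS = range(3)


def average_payoff(mapping):
    """Average payoff of a deterministic input -> action mapping,
    taken over every pair (i, j) of inputs seen at the two intersections.
    Payoffs 0 / 4 / 1 are assumed, not taken from this thread."""
    total = Fraction(0)
    for i, j in product(INPUTS, repeat=2):
        if mapping[i] == "EXIT":        # exits at the first intersection
            total += 0
        elif mapping[j] == "EXIT":      # exits at the second intersection
            total += 4
        else:                           # drives past both
            total += 1
    return total / (len(INPUTS) ** 2)


def best_mappings():
    """Enumerate all input-output mappings and return the best average
    payoff together with every mapping that achieves it."""
    all_mappings = list(product(["EXIT", "CONTINUE"], repeat=len(INPUTS)))
    best = max(average_payoff(m) for m in all_mappings)
    return best, [m for m in all_mappings if average_payoff(m) == best]


best_value, winners = best_mappings()
print(best_value)
for m in winners:
    print(m)
```

Under these assumptions the winning mappings are exactly the three that return "EXIT" for one input out of three, regardless of which input it is, and no probability p is ever introduced explicitly: the 1/3 falls out of the search over mappings.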