taw comments on The Absent-Minded Driver - Less Wrong

Post author: Wei_Dai 16 September 2009 12:51AM 27 points


Comments (139)


Comment author: taw 16 September 2009 02:49:44AM 39 points

You cannot assume some α and choose p based on it, because α depends on p. You have just introduced a time loop into your example.

Comment author: pengvado 16 September 2009 04:40:07AM 48 points

Indeed, though it doesn't have to be a time loop, just a logical dependency. Your expected payoff is α[p^2+4(1-p)p] + (1-α)[p+4(1-p)]. Since you will make the same decision both times, the only coherent state is α=1/(p+1). Thus the expected payoff is (8p-6p^2)/(p+1), whose maximum is at about p=0.53. What went wrong this time? Well, while this is what you should use to answer bets about your payoff (assuming such bets are offered independently at every intersection), it is not the quantity you should maximize: it double counts the path where you visit both X and Y, which involves two instances of the decision but pays off only once.
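A quick numerical check of pengvado's figures, as a minimal sketch in Python. It assumes the standard payoffs for this problem (exiting at X pays 0, exiting at Y pays 4, continuing past Y pays 1), which is what the bracketed terms encode; the function names are illustrative, not taken from the thread.

```python
import math

def payoff_from_x(p: float) -> float:
    # Starting at X: continue then exit at Y pays 4; continue twice pays 1.
    return p * p + 4 * (1 - p) * p

def payoff_from_y(p: float) -> float:
    # Starting at Y: exiting pays 4; continuing pays 1.
    return p + 4 * (1 - p)

def alpha_weighted(p: float) -> float:
    # pengvado's expression with the "coherent" alpha = 1/(p+1) substituted.
    alpha = 1 / (p + 1)
    return alpha * payoff_from_x(p) + (1 - alpha) * payoff_from_y(p)

# Setting the derivative of (8p - 6p^2)/(p + 1) to zero gives 3p^2 + 6p - 4 = 0,
# whose root in [0, 1] is (-3 + sqrt(21)) / 3.
p_star = (-3 + math.sqrt(21)) / 3

# Brute-force check on a fine grid over [0, 1].
grid = [i / 100000 for i in range(100001)]
p_grid = max(grid, key=alpha_weighted)

print(round(p_star, 4), round(p_grid, 4))
```

Both the closed-form root and the grid search land near 0.5275, matching the "about p=0.53" in the comment.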

Comment author: Eliezer_Yudkowsky 16 September 2009 08:12:13AM 7 points

Mod parents WAY up! I should've tried to solve this problem on my own, but I wasn't expecting it to be solved in the comments like that!

Awesome. I'm steadily upgrading my expected utilities of handing decision-theory problems to Less Wrong.

EDIT 2016: Wei Dai below is correct; this was my first time encountering this problem, and I misunderstood the point Wei Dai was trying to make.

Comment author: ciphergoth 19 September 2009 03:18:18PM 22 points

You make it sound as if you expect to expect a higher utility in the future than you currently expect...

Comment author: Wei_Dai 18 September 2009 10:07:56AM 7 points

The parents that you referred to are now at 17 and 22 points, which seems a bit mad to me. Spotting the errors in P&R's reasoning isn't really the problem. The problem is to come up with a general decision algorithm that both works (in the sense of making the right decisions) and (if possible) makes epistemic sense.

So far, we know that UDT works but it doesn't compute or make use of "probability of being at X" so epistemically it doesn't seem very satisfying. Does TDT give the right answer when applied to this problem? If so, how? (It's not specified formally enough that I can just apply it mechanically.) Does this problem suggest any improvements or alternative algorithms?

"Awesome. I'm steadily upgrading my expected utilities of handing decision-theory problems to Less Wrong."

Again, that seems to imply that the problem is solved, and I don't quite see how the parent comments have done that.

Comment author: SilasBarta 18 September 2009 03:53:12PM 1 point

"The problem is to come up with a general decision algorithm that both works (in the sense of making the right decisions) and (if possible) makes epistemic sense."

I presented a solution in a comment here which I think satisfies these: It gives the right answer and consistently handles the case of "partial knowledge" about one's intersection, and correctly characterizes your epistemic condition in the absent-minded case.

Comment author: casebash 23 April 2016 10:27:32AM 0 points

One way to describe this is to note that choosing the action that maximises the expectation of value is not the same as choosing the action that can be expected to produce the most value. So choosing p=0.53 maximises our expectations, not the value our choice can be expected to produce.

Comment author: Antisuji 16 September 2009 06:32:41PM 0 points

"Since you will make the same decision both times, the only coherent state is α=1/(p+1)."

I'm curious how you arrived at this. Shouldn't it be α = (1/2)p + (1 - p) = 1 - p/2? (The other implies that you are a thirder in the Sleeping Beauty Problem -- didn't Nick Bostrom have the last word on that one?) The payoff becomes α[p^2+4p(1-p)] + (1-α)[p+4(1-p)] = (1 - p/2)(4p - 3p^2) + (p/2)(4 - 3p) = 6p - (13/2)p^2 + (3/2)p^3, which has a (local) maximum around p = 0.577.
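A sketch checking Antisuji's algebra, assuming the same payoffs as in the rest of the thread (0 for exiting at X, 4 for exiting at Y, 1 for going past Y). It confirms that the factored and expanded forms agree and locates the local maximum analytically; the function names are hypothetical.

```python
import math

def halfer_payoff(p: float) -> float:
    # Antisuji's alternative: alpha = (1/2)p + (1 - p) = 1 - p/2.
    alpha = 1 - p / 2
    from_x = 4 * p - 3 * p ** 2   # p^2 + 4p(1-p), simplified
    from_y = 4 - 3 * p            # p + 4(1-p), simplified
    return alpha * from_x + (1 - alpha) * from_y

def expanded(p: float) -> float:
    # The polynomial form given in the comment.
    return 6 * p - 6.5 * p ** 2 + 1.5 * p ** 3

# The derivative is 6 - 13p + 4.5p^2; doubling it gives 9p^2 - 26p + 12 = 0.
# The smaller root, (13 - sqrt(61)) / 9, is the local maximum in [0, 1].
p_local_max = (13 - math.sqrt(61)) / 9

print(round(p_local_max, 3))
```

The second derivative, -13 + 9p, is negative at this root, so it is indeed a maximum rather than a minimum.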

The conclusion remains the same, of course.

Comment author: PhilGoetz 20 September 2009 07:52:00AM 2 points

alpha = 1/(p+1) because the driver is at Y p times for every one time the driver is at X, so (times the driver is at X) / (times the driver is at X or Y) = 1/(p+1).
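The frequency argument can be checked with a small Monte Carlo simulation, sketched here in Python under the assumption that the driver continues at X with probability p on each drive. Every drive visits X once, and a drive visits Y only if it continued at X; the function name and trial count are hypothetical choices.

```python
import random

def fraction_of_visits_at_x(p: float, trials: int = 200_000, seed: int = 0) -> float:
    """Simulate many drives; return (visits to X) / (visits to X or Y)."""
    rng = random.Random(seed)
    at_x = 0
    total = 0
    for _ in range(trials):
        at_x += 1              # every drive reaches X
        total += 1
        if rng.random() < p:   # continue at X with probability p...
            total += 1         # ...so this drive also reaches Y
    return at_x / total

for p in (0.25, 0.5, 2 / 3):
    print(p, round(fraction_of_visits_at_x(p), 3), round(1 / (p + 1), 3))
```

For each p the simulated fraction should sit close to 1/(p+1). What happens at Y does not affect how many intersections are visited, so the simulation stops there.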

The problem with pengvado's calculation isn't double counting. It purports to give an expected payoff when made at X, which doesn't count the expected payoff at Y. The problem is that it doesn't really give an expected payoff. alpha purports to be the probability that you are at X; yet the calculation must be made at X, not at Y (where alpha will clearly be wrong). This means we can't speak of a "probability of being at X"; alpha simply is 1 if we use this equation and believe it gives us an expected value.

Or look at it this way: Before you introduce alpha into the equation, you can solve it and get the actual optimal value for p. Once you introduce alpha into the equation, you guarantee that the driver will have false beliefs some of the time, because alpha = 1/(p+1), and so the driver can't have the correct alpha both at X and at Y. You have added a source of error, and will not find the optimal solution.
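For comparison, here is the alpha-free calculation as a sketch, assuming the usual payoffs (0 for exiting at X, 4 for exiting at Y, 1 for going past Y). Evaluated before the drive, the payoff of continuing with probability p is p^2 + 4p(1-p) = 4p - 3p^2, whose derivative 4 - 6p vanishes at p = 2/3. The function name is hypothetical.

```python
from fractions import Fraction

def planning_payoff(p):
    # Expected payoff, evaluated before the drive starts, of the policy
    # "continue with probability p at every intersection".
    return p * p + 4 * p * (1 - p)

p_opt = Fraction(2, 3)                   # root of 4 - 6p = 0
print(p_opt, planning_payoff(p_opt))     # prints: 2/3 4/3

# The alpha-weighted optimum of about 0.53 does worse by this measure.
print(planning_payoff(0.53) < planning_payoff(p_opt))  # prints: True
```

Exact fractions are used so the optimal payoff of 4/3 prints without rounding noise.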

Comment author: casebash 23 April 2016 10:44:49AM 0 points

If you want to find the value of p that leads to the optimal decision, you need to look at the impact on expected value of choosing one p or another, not just consider expected value at the end. As it stands, the calculation maximises expectations rather than value created, because situations where you pass through both X and Y are double counted.
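One way to see the double counting, as an illustrative sketch: enumerate the three possible drives exactly and count each drive's payoff either once or once per intersection visited. The payoffs (0 for exiting at X, 4 at Y, 1 past Y) are the standard ones for this problem, and the per-visit count models pengvado's assumption of bets settled independently at every intersection; the helper names are hypothetical.

```python
from fractions import Fraction

def paths(p):
    """Yield (probability, intersections visited, final payoff) for each drive."""
    yield (1 - p, ("X",), 0)            # exit at X
    yield (p * (1 - p), ("X", "Y"), 4)  # continue at X, exit at Y
    yield (p * p, ("X", "Y"), 1)        # continue at both

def counted_once(p):
    # Each drive's payoff counted once: what the driver actually receives.
    return sum(prob * pay for prob, _, pay in paths(p))

def counted_per_visit(p):
    # Each drive's payoff counted once per intersection visited, as if a
    # separate bet on the final payoff were settled at every intersection.
    return sum(prob * len(visits) * pay for prob, visits, pay in paths(p))

def expected_visits(p):
    return sum(prob * len(visits) for prob, visits, _ in paths(p))

p = Fraction(1, 2)
print(counted_once(p), counted_per_visit(p) / expected_visits(p))  # prints: 5/4 5/3
```

The per-visit total works out to 8p - 6p^2 and the expected number of visits to 1 + p, so their ratio is pengvado's (8p-6p^2)/(p+1). The counted-once value is 4p - 3p^2, in which the drives through both X and Y are counted only once.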

Comment author: pengvado 17 September 2009 12:46:27AM 2 points

I'm a "who's offering the bet"er on Sleeping Beauty (which Bostrom has said is consistent with, though not identical to, his own model). And in this case I specified bets offered and paid separately at each intersection, which corresponds to the thirder conclusion.

Comment author: Jonathan_Graehl 17 September 2009 12:05:50AM 0 points

The paper covered that, but good point.