ThrustVectoring comments on The Absent-Minded Driver - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (139)
It's kind of an old thread, but I know people browse the recently posted list and I have a good enough understanding of what exactly the decision theorists are doing wrong that I can explain it in plain English.
First of all, alpha can only consistently be one number: 1/(1+p). And once you substitute that into α[p2+4(1-p)p] + (1-α)[p+4(1-p)], you get a peculiar quantity: (2/1+p) * [p2 + 4(-1p)p]. Where does the 2/1+p come from? Well, every time you go through the first node, you add up the expected result from the first node and the second node, and you also add up the expected result when you visit the second node. This is a double counting. The 2/1+p is the number of times you visit a node and make a choice, per time you enter the whole thing.
If we assume we're limited to a certain number of experimental runs and not a certain number of continue/exit choices, we need to divide out the 2/(1+p) to compensate for the fact that you make N*(2/1+p) choices in N games.