Vladimir_Nesov comments on Towards a New Decision Theory - Less Wrong
Reflecting further, it looks to me like there may be an argument that forces Wei Dai's "updateless" decision theory, very much akin to the argument I originally used to pin down my timeless decision theory: if you expect to face Counterfactual Muggings, this is the reflectively consistent behavior. A simple-seeming algorithm has been presented which generates it, so unless an even simpler algorithm can be found, we may have to accept it.
The face-value interpretation of this algorithm is a huge bullet to bite, even by my standards: it amounts, depending on your viewpoint, to accepting the Self-Indication Assumption or to rejecting anthropic reasoning entirely. Suppose a coin is flipped; on tails you will wake in a red room, while on heads a googolplex copies of you will be created in green rooms and one copy in a red room. Waking to find yourself in a red room, you would assign (behave as if you assigned) a 50% posterior probability that the coin had come up tails. In fact, it's not yet clear to me how to interpret the behavior of this algorithm in any epistemic terms.
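To make that concrete, here is the calculation worked out (my own illustration, not part of the original comment; `G` stands in for a googolplex, since only the limiting behavior matters). An ordinary within-world anthropic update drives the posterior for tails toward 1, while SIA's observer-count weighting, like declining to update on the room at all, leaves it at 1/2, which is what the algorithm's behavior corresponds to:

```python
from fractions import Fraction

G = 10**9  # stand-in for a googolplex; only the limit matters

prior_heads = prior_tails = Fraction(1, 2)

# Likelihood of waking in a red room, for a random copy in each world:
like_red_heads = Fraction(1, G + 1)  # 1 red copy out of G + 1 on heads
like_red_tails = Fraction(1)         # the sole copy is in a red room

# Ordinary (SSA-style) anthropic update on "I see a red room":
p_tails_update = (prior_tails * like_red_tails) / (
    prior_tails * like_red_tails + prior_heads * like_red_heads)

# SIA: weight each world by its number of observers before conditioning:
w_heads = prior_heads * (G + 1)
w_tails = prior_tails * 1
p_tails_sia = (w_tails * like_red_tails) / (
    w_tails * like_red_tails + w_heads * like_red_heads)

print(float(p_tails_update))  # ~1.0: updating says "almost surely tails"
print(float(p_tails_sia))     # 0.5: SIA, like not updating at all
```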
To give credit where it's due: I'd only been talking with Nick Bostrom about this dilemma as it arises from altruistic timeless decision theorists caring about copies of themselves; the idea of applying the same line of reasoning to all probability updates, including updates over impossible worlds, and of using it to solve Drescher's(?) Counterfactual Mugging, had not occurred to me at all.
Wei Dai, you may have solved one of the open problems I named, with consequences that currently seem highly startling. Congratulations again.
Hmm... I've been talking about the no-updating approach to decision-making for months, and Counterfactual Mugging was constructed specifically to show a case where it applies well, in a way that sounds, on the surface, opposite to "play to win".
The idea itself doesn't seem like anything new: it is just standard expected utility maximization, applied not to individual decisions but to the choice of a strategy as a whole, that is, to the agent's source code.
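As a concrete sketch (mine, using the customary $100 / $10,000 stakes of Counterfactual Mugging rather than figures from this thread): Omega flips a fair coin; on tails it asks you for $100; on heads it pays you $10,000 if and only if your source code would have paid on tails. Comparing whole strategies instead of individual decisions:

```python
PAY_COST, REWARD = 100, 10_000  # customary stakes; an assumption here
P_HEADS = 0.5

def expected_utility(policy_pays: bool) -> float:
    # Heads: Omega rewards you iff your source code would pay on tails.
    u_heads = REWARD if policy_pays else 0
    # Tails: Omega asks for the money; a paying policy loses PAY_COST.
    u_tails = -PAY_COST if policy_pays else 0
    return P_HEADS * u_heads + (1 - P_HEADS) * u_tails

for policy_pays in (True, False):
    print(policy_pays, expected_utility(policy_pays))
# True  4950.0  <- the paying strategy wins in expectation,
# False 0.0        even though, after seeing tails, paying looks like a loss
```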
From the agent's point of view, everything it can ever come to know results from computations it runs with its own source code, computations that take into account its interaction with the environment. If the choice of strategy doesn't depend on particular observations, on context-specific knowledge about the environment, then the only uncertainty that remains is uncertainty about what the agent itself is going to do (compute) according to the selected strategy. In simple situations, this uncertainty disappears altogether. In more realistic situations, it arises because there is a huge number of possible contexts in which the agent could operate: when the agent calculates its action in one such context, it can't know for sure what it's going to calculate in the other contexts, yet that information is required for the expected utility calculation. That is logical uncertainty.
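A toy sketch of the two regimes (my construction, purely illustrative): with a small context space the agent simply enumerates whole strategies, and the uncertainty about what it computes elsewhere disappears; with a huge one, the agent computing its action in one context has to treat its own outputs in the other contexts as unknowns.

```python
import itertools

ACTIONS = (0, 1)

def payoff(strategy):
    # Hypothetical coupled payoff: strategy[i] is the action computed in
    # context i, and the agent is rewarded for matching actions across
    # contexts, so each action's value depends on all the others.
    return sum(int(a == b) for a, b in itertools.combinations(strategy, 2))

# Simple situation: few contexts, so optimize over whole strategies;
# no residual uncertainty about "what I compute in the other contexts".
best = max(itertools.product(ACTIONS, repeat=3), key=payoff)
print(best, payoff(best))  # (0, 0, 0) 3

# With a realistically huge context space this enumeration is infeasible:
# an agent calculating its action in context i can only estimate what its
# own (fully known) source code outputs in contexts j != i. That residual
# uncertainty is the logical uncertainty described above.
```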