Vaniver comments on Open Thread, November 16–30, 2012 - Less Wrong

Post author: VincentYu 18 November 2012 01:59PM


Comment author: aaronde 25 November 2012 05:12:58AM (0 points)

What I am saying is that I don't assume that I maximize expected utility. I take the five-and-ten problem as a proof that an agent cannot be certain, while it is choosing, that it will make the optimal choice, because that certainty leads to a contradiction. But this doesn't mean that I can't use the evidence that a choice would represent while choosing. In this case, I can tell that U($10) > U($5) directly, so conditioning on A=$10 or A=$5 is redundant. The point is that it doesn't cause the algorithm to blow up, as long as I don't think my probability of maximizing utility is 0 or 1.
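
Here's a minimal Python sketch of what I mean (the utilities, the prior `p_optimal`, and the function names are all just illustrative assumptions, nothing canonical): as long as every action gets probability strictly between 0 and 1, conditioning on my own action is well-defined and nothing blows up.

```python
# Toy model of the five-and-ten choice: I condition on my own action,
# but I never assign that action probability 0 or 1.

U = {"$5": 5.0, "$10": 10.0}   # utility of each option
p_optimal = 0.9                # prior P(I pick the better option), strictly inside (0, 1)

def p_action(a):
    """P(A = a) given my prior confidence that I act optimally."""
    best = max(U, key=U.get)
    return p_optimal if a == best else (1 - p_optimal) / (len(U) - 1)

def expected_utility_given(a):
    """E[U | A = a]. Here U is known directly, so conditioning is redundant;
    the point is just that it is well-defined whenever 0 < P(A = a) < 1."""
    assert 0 < p_action(a) < 1
    return U[a]

print(max(U, key=expected_utility_given))  # -> $10
```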

It's true that A=$5 could be stronger evidence for U($5)>U($10) than A=$10 is for U($10)>U($5). But there's no particular reason to think it would be. And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference. As long as making a choice is evidence that it is the optimal choice only insofar as I am confident that I make the optimal choice in general, the evidence will be equally strong for every choice and will cancel itself out. But in cases where a particular choice is evidence of good things for other reasons (as in Newcomb's problem), taking this evidence into consideration can affect my decision.
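
To make the cancellation concrete, here's a toy Bayesian calculation (my own made-up numbers; the likelihood ratio k is assumed to be action-independent, which is exactly the condition stated above):

```python
# If observing "I chose a" supports "a is optimal" with the same
# likelihood ratio k for every action a, then conditioning rescales
# the odds of every "a is optimal" hypothesis identically, so the
# prior ranking (here, P(U($10) > U($5)) being large) is preserved.

priors = {"$5": 0.2, "$10": 0.8}   # prior P(a is the optimal choice)
k = 9.0                            # P(A=a | a optimal) / P(A=a | a suboptimal), same for all a

def posterior_optimal(a):
    """P(a optimal | A = a), by Bayes with the shared ratio k."""
    odds = priors[a] / (1 - priors[a]) * k
    return odds / (1 + odds)

for a in priors:
    print(a, round(posterior_optimal(a), 3))
# $5 0.692, $10 0.973: both sets of odds scale by the same k,
# so the choice the prior favored is still favored.
```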

So why can't I just use the knowledge that I'll go through this line of reasoning to prove that I will choose $10, thereby yielding a contradiction? Because I can't prove that I'll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion. Now, there may be a shortcut I could use to prove what my choice will be, but the very fact that such a proof would yield a contradiction means that no such proof exists in a consistent formal system.
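
Here's a crude illustration of the regress (my own toy construction; `simulate` just stands in for a perfect self-model):

```python
import sys

def decide():
    # To use "what I will choose" as evidence, I first have to compute it,
    # and a faithful computation of my choice is just running decide() again.
    predicted = simulate(decide)
    return predicted               # never reached

def simulate(agent):
    return agent()                 # a perfect simulation simply runs the agent

sys.setrecursionlimit(100)
try:
    decide()
except RecursionError:
    print("the self-simulation never bottoms out")
```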

(BTW, I agree that CDT is the only decision theory that works in practice as-is. I'm only addressing one issue with the various timeless decision theories.)

Comment author: Vaniver 25 November 2012 06:57:13AM (1 point)

"And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference."

Well, then why even update? (Or, more specifically, why assume that this is harmless normally, but an ace up your sleeve for a particular class of problems? You need to be able to reliably distinguish when this helps you and when this hurts you from the inside, which seems difficult.)

"Because I can't prove that I'll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion."

I'm not sure that I understand this; I'm under the impression that many TDT applications require that they be able to simulate themselves (and other TDT reasoners) this way.

Comment author: aaronde 25 November 2012 11:44:51PM (0 points)

Good questions. I don't know the answers. But as you say, UDT especially is basically defined circularly: the agent's decision is a function of itself. Making this coherent is still an unsolved problem. So I was wondering if we could get around some of the paradoxes by giving up on certainty.