manticangel - LessWrong

"Verbal arguments for one-boxing are easy to come by, what's hard is developing a good decision theory that one-boxes"

First, the problem needs a couple of ambiguities resolved, so we'll use three assumptions:

A) You are making this decision based on a deterministic, rational philosophy (no randomization, external factors, etc. can be used to make your decision on the box).

B) Omega is in fact infallible.

C) Getting more money is the goal (i.e. we are excluding decision-makers which would prefer to get less money, and other such absurdities).

Changing any of these results in a different game: either one that depends on how Omega handles random strategies, or one that depends on how often Omega is wrong. We lack information on either.

Second, I'm going to reframe the problem a bit: Omega comes to you and has you write a decision-making function. He will evaluate the function, and populate Box B according to his conclusions on what the function will result in. The function can be self-modifying, but must complete in finite time. You are bound to the decision made by the actual execution of this function.

I can't think of any argument as to why this reframing would produce different results, given that Assumptions A and B both hold. I feel this is a valid reframing because, if we assume Omega is in fact infallible, I don't see this as being any different from him evaluating the "actual" decision-making function that you would use in the situation. Certainly, you're making a decision that can be expressed logically, and presumably you have the ability to think about the problem and modify your decision based on that contemplation (i.e. you have a decision-making function, and it can be self-modifying). If your decision function is somehow impossible to render mathematically, then I'd argue that Assumption A has been violated and we are, once again, playing a different game. If your decision function doesn't halt in finite time, then your payoff is guaranteed to be $0, since you will never actually take either box >.>
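To make the reframing concrete, here's a minimal Python sketch of the game as I've described it. Everything in it is my own illustration: the names (`omega_fills_box_b`, `play`, and so on) are made up, and the dollar amounts are the ones usually quoted for Newcomb's problem ($1,000 in Box A, $1,000,000 in Box B), which I'm assuming rather than taking from anything above. Omega's "infallibility" is modelled the cheap way: he just runs your deterministic function ahead of time.

```python
from typing import Callable

ONE_BOX = "one-box"
TWO_BOX = "two-box"

# Assumed amounts (the commonly quoted ones, not stated in this post).
BOX_A = 1_000
BOX_B = 1_000_000

DecisionFn = Callable[[], str]


def omega_fills_box_b(decision_fn: DecisionFn) -> int:
    """Omega evaluates the function and fills Box B based on the result."""
    predicted = decision_fn()  # deterministic (Assumption A), so this is exact
    return BOX_B if predicted == ONE_BOX else 0


def play(decision_fn: DecisionFn) -> int:
    """Omega predicts first; then you are bound to the actual execution."""
    box_b = omega_fills_box_b(decision_fn)
    choice = decision_fn()
    return box_b if choice == ONE_BOX else BOX_A + box_b


def always_one_box() -> str:
    return ONE_BOX


def always_two_box() -> str:
    return TWO_BOX


print(play(always_one_box))  # 1000000
print(play(always_two_box))  # 1000
```

Because the function is deterministic and Omega runs the very same function, the prediction and the actual choice can never come apart, which is all Assumption B is asking for in this toy version.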

Given this situation, the AI simply needs to do two things: Identify that the problem is Newcombian and then identify some function X that produces the maximum expected payoff.

Identifying the problem as Newcombian should be trivial, since "awareness that this is a Newcombian problem" is a requirement of it being a Newcombian problem (if Omega didn't tell you how the boxes get filled, it would be a different game, neh?).

Identifying the function X is well beyond my programming ability, but I will assert definitively that there is no function that produces a higher expected payoff than f(Always One-Box). If I am proven wrong, I dare say the person writing that proof will probably be able to cash in on a rather significant payoff :)
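The reason I'm comfortable asserting this can be sketched in a few lines of Python. This is only an illustration under Assumptions A and B, again using the commonly quoted amounts ($1,000 and $1,000,000) as an assumption of mine. It lists every (prediction, action) pair, throws out the ones an infallible Omega rules out, and takes the best of what is left.

```python
from itertools import product

ACTIONS = ("one-box", "two-box")

# Assumed amounts (the commonly quoted ones, not stated in this post).
BOX_A, BOX_B = 1_000, 1_000_000


def payoff(prediction: str, action: str) -> int:
    """Payoff when Omega predicted `prediction` and you then take `action`."""
    box_b = BOX_B if prediction == "one-box" else 0
    return box_b if action == "one-box" else BOX_A + box_b


# All four cells of the payoff table.
all_outcomes = {(p, a): payoff(p, a) for p, a in product(ACTIONS, repeat=2)}

# Assumption B: Omega is infallible, so only prediction == action can occur.
reachable = {pa: v for pa, v in all_outcomes.items() if pa[0] == pa[1]}

best = max(reachable, key=reachable.get)
print(best, reachable[best])       # ('one-box', 'one-box') 1000000
print(max(all_outcomes.values()))  # 1001000, but that cell is unreachable
```

The tempting $1,001,000 cell requires Omega to predict one-boxing while you two-box, which is exactly what infallibility forbids. Among the outcomes that can actually happen, one-boxing wins.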

Keep in mind that the decision function can self-modify, but Omega can also predict this. The function "commit to One-Box until Omega leaves, then switch to Two-Box because it'll produce a higher gain now that Omega has made his prediction" would, obviously, have Omega conclude you'll be Two-Boxing and leave Box B empty, so the switch gains you nothing.
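Here's a rough Python sketch of that "switch once Omega leaves" strategy, to show why self-modification doesn't help. The class names and the `omega_leaves` hook are hypothetical, invented purely for this illustration, and the amounts are the commonly quoted ones, which I'm assuming. Omega predicts by simulating the whole execution, including whatever the function does to itself after he has gone.

```python
# Assumed amounts (the commonly quoted ones, not stated in this post).
BOX_A, BOX_B = 1_000, 1_000_000


class Committed:
    """Always one-boxes; Omega leaving changes nothing."""

    def __init__(self) -> None:
        self.policy = "one-box"

    def omega_leaves(self) -> None:
        pass  # no self-modification

    def decide(self) -> str:
        return self.policy


class Switcher(Committed):
    """Commits to one-boxing, then rewrites itself once Omega leaves."""

    def omega_leaves(self) -> None:
        self.policy = "two-box"  # the self-modification step


def omega_predict(agent_cls: type) -> str:
    """Omega simulates the full run, including the part after he leaves."""
    sim = agent_cls()
    sim.omega_leaves()
    return sim.decide()


def play(agent_cls: type) -> int:
    box_b = BOX_B if omega_predict(agent_cls) == "one-box" else 0
    agent = agent_cls()
    agent.omega_leaves()
    action = agent.decide()
    return box_b if action == "one-box" else BOX_A + box_b


print(play(Committed))  # 1000000
print(play(Switcher))   # 1000
```

In this toy version the switcher finds Box B empty and walks away with only the assumed Box A amount, since the switch was part of what Omega evaluated in the first place.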

I honestly cannot find anything about this that would be overly difficult to program, assuming you already had an AI that could handle game theory problems (I'm assuming said AI is very, very difficult, and is certainly beyond my ability).

Given this reframing, f(Always One-Box) seems like a fairly trivial solution, and neither paradoxical nor terribly difficult to represent mathematically... I'm going to assume I'm missing something, since this doesn't seem to be the consensus conclusion at all, but since neither my friend nor I can figure out any faults, I'll go ahead and make this my first post on LessWrong and hope that it's not buried in obscurity due to this being a two-year-old thread :)
