if I predict that the Predictor has only put money in one box, I will only take box B, since this will result in the accurate prediction that I will take one box, which will mean that both boxes contain money and allow me to take both boxes.
I think you gravely misunderstand the original Newcomb's setup, as well as various instantiations of it. The original setup focuses on what you will do, not how you arrive at that decision. What you are suggesting is being an exception to the problem: someone so powerful that the Predictor fails at predicting your actions. This is the essence of CDT two-boxing and of various other ways to fight the hypothetical, the hypothetical being that you are predictable. If you posit that you can foil the Predictor, the whole setup melts away.
Of course we care about the outcomes. This isn't necessarily about having perfect predictive power or outplaying the Predictor, it's about winning Newcomb's problem. 3-Conditional Marion, when presented with Newcomb's problem, runs the first two conditionals, which are essentially a check on how adversarial she can get away with being. If she predicted that she would be able to outgame the Predictor at some point, she would take two boxes. However, the Predictor is essentially perfect at its job, so the most she predicts being able to do is cause a non-halting recursion in her own decision tree, and that's no good. That cuts off the option of trying to get $1,001,000 out of the Predictor, and Marion settles for her second-best outcome, which is to conclude she should just take Box B. The Predictor correctly predicts that Marion will employ this algorithm and only take Box B, and thus fills Box B. Marion can't then decide to two-box; she's already reasoned out that there's no way for her to game the Predictor.
3-Conditional Marion is interesting in part because something like adversarial play emerges from her decision algorithm simply from the fact that she's trying to model the other agent and conditionally respond to her predictions of them. The other agent of course wants to satisfy its own values and block Marion from adversarially going too far, so she wants to calculate exactly how extortionary she can be before the other party defects. She can't get more than $1,000,000 out of the Predictor without losing that $1,000,000 for being too greedy and failing to cooperate. The same thing happens in Parfit's Hitchhiker.
The part about two Predictors playing against each other reminded me of Robust Cooperation in the Prisoner's Dilemma, where two agents with the algorithm "If I find a proof that the other player cooperates with me, cooperate, otherwise defect" are able to mutually prove cooperation and cooperate.
If we use that framework, Marion plays "If I find a proof that the Predictor fills both boxes, two-box, else one-box" and the Predictor plays "If I find a proof that Marion one-boxes, fill both, else only fill box A". I don't understand the math very well, but I think in this case neither agent finds a proof, and the Predictor fills only box A while Marion takes only box B - the worst possible outcome for Marion.
Marion's third conditional might correspond to Marion only searching for proofs in PA, while the Predictor searches for proofs in PA+1, in which case Marion will not find a proof, the Predictor will, and then the Predictor fills both boxes and Marion takes only box B. But in this case clearly Marion has abandoned the ability to predict the Predictor and has given the Predictor epistemic vantage over her.
"If I find a proof that the other player cooperates with me, cooperate, otherwise defect"
This would cooperate with CooperateBot (algorithm that unconditionally says "Cooperate").
Yes. The one I described is the one the paper calls FairBot. It also defines PrudentBot, which looks for a proof that the other player cooperates with PrudentBot and a proof, in the slightly stronger system PA + Con(PA), that it defects against DefectBot. PrudentBot defects against CooperateBot.
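If it helps to make that concrete, here's a rough toy sketch of how I understand these bots, in Python. This is my own re-encoding, not the paper's formalism: "provable in PA" is stood in for by "true at every lower level of a Kripke-style hierarchy," "provable in PA + Con(PA)" by "true at every lower level above level 0," and I just evaluate at a fixed depth rather than proving anything stabilizes.

```python
COOPERATE, DEFECT = True, False

def cooperate_bot(opponent, level):
    return COOPERATE

def defect_bot(opponent, level):
    return DEFECT

def fair_bot(opponent, level):
    # "Provable that the opponent cooperates with me" ~ true at every lower level.
    return all(play(opponent, fair_bot, j) for j in range(level))

def prudent_bot(opponent, level):
    # "PA proves the opponent cooperates with me" and
    # "PA + Con(PA) proves the opponent defects against DefectBot"
    # (the stronger system is modeled by skipping level 0).
    coops_with_me = all(play(opponent, prudent_bot, j) for j in range(level))
    punishes_defection = all(play(opponent, defect_bot, j) == DEFECT
                             for j in range(1, level))
    return coops_with_me and punishes_defection

def play(agent, opponent, level):
    """What `agent` does against `opponent`, evaluated at a given level."""
    return agent(opponent, level)

def outcome(agent, opponent, depth=10):
    # These particular bots stabilize after a couple of levels, so a modest
    # fixed depth stands in for the true fixed point.
    return "C" if play(agent, opponent, depth) else "D"

matchups = [(fair_bot, fair_bot), (fair_bot, defect_bot), (fair_bot, cooperate_bot),
            (prudent_bot, cooperate_bot), (prudent_bot, fair_bot)]
for a, b in matchups:
    print(f"{a.__name__} vs {b.__name__}: {outcome(a, b)} / {outcome(b, a)}")
```

Run as written, this gives mutual cooperation for FairBot vs FairBot and PrudentBot vs FairBot, mutual defection between FairBot and DefectBot, FairBot cooperating with CooperateBot, and PrudentBot defecting against CooperateBot, which matches the behavior described in the paper.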
Yeah, after the first two conditionals return as non-halting, Marion effectively abandons trying to further predict the Predictor. After iterating the non-halting stack, Marion will conclude that she's better served by giving in to the partial blackmail and taking the million dollars than she is by trying to game the last $1,000 out of the Predictor, based on the fact that her ideal state is gated behind an infinitely recursed function.
I will execute this algorithm unless doing so produces non-halting states, in which case I will only take Box B since this creates a Schelling point at my next most preferred worldstate.
At least two issues there:
Note: I dug around quite a bit and haven't found anyone who has previously discussed this, but if it's been talked about somewhere before, links would be appreciated.
Epistemic Status: Interesting Conjecture; I think my logic is all sound, but please check me.
I think I discovered a toy decision theory agent that generates non-halting states under certain conditions and is capable of breaking various game theory problems in some novel ways.
Imagine an idealized agent employing what I'm informally calling conditional decision theory. Let's name her Marion. Marion goes into Newcomb's Problem with the following algorithm:
"If I Predict that the Predictor has put money in both boxes, I will take both boxes, because that would result in gaining $1,001,000, however, if I predict that the Predictor has only put money in one box, I will only take box B, since this will result in the accurate prediction that I will take one box, which will mean that both boxes contain money and allow me to take both boxes."
The Predictor wants to fill Box B conditional on its prediction that Marion doesn't also take Box A. Marion wants to take Box A as well, conditional on her prediction that Box B is filled.
If Marion takes both boxes, the Predictor will have predicted that she takes both boxes, causing it to not fill Box B. Marion knows that the Predictor will predict this, so she only takes Box B. The Predictor knows Marion will only take Box B, therefore Box B is filled. Marion knows this and thus takes both boxes. The Predictor knows this, so it leaves Box B empty. Marion knows Box B will be empty and thus only takes Box B. The Predictor knows this, so it fills Box B. Marion knows this, so she takes both boxes.
This is, as far as I can tell, non-halting under the premise that both agents have identical predictive power. Marion can't win using this algorithm, but she can keep the Predictor from winning. If Marion and the Predictor have identical predictive and reasoning capability, then the answer to "What is the outcome?" is indeterminate: the retrocausal inferences recurse to infinity and neither one of them can accurately predict the other.
This only works assuming you're basically playing two Predictors against one another; in the real world the predictive errors stack up with each successive recursion until one of them falls out of equilibrium and fails to accurately predict the other.
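To make the regress concrete, here's a minimal toy sketch (my own framing, nothing formal): each side is modeled as a function that decides by recursively simulating the other, with an explicit depth counter just so the non-termination shows up as an exception instead of hanging.

```python
# Toy model of the two-conditional regress. Each side decides by simulating
# the other; the simulation never bottoms out on its own, so the depth counter
# exists only to surface that fact as an error.

def predictor_fills_b(depth):
    """Predictor fills Box B iff it predicts Marion takes only Box B."""
    if depth == 0:
        raise RecursionError("prediction regress never bottoms out")
    return marion_one_boxes(depth - 1)

def marion_one_boxes(depth):
    """Two-conditional Marion: two-box if she predicts Box B is full, else one-box."""
    if depth == 0:
        raise RecursionError("prediction regress never bottoms out")
    return not predictor_fills_b(depth - 1)

try:
    marion_one_boxes(depth=50)
except RecursionError as err:
    print("non-halting:", err)
```

No depth is ever enough; each level just asks the same question one layer down, which is the loop described above.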
If you add a third conditional clause to Marion you can patch this out:
""If I Predict that the Predictor has put money in both boxes, I will take both boxes, because that would result in gaining $1,001,000, however, if I predict that the Predictor has only put money in one box, I will only take box B, since this will result in the accurate prediction that I will take one box, which will mean that both boxes contain money and allow me to take both boxes. I will execute this algorithm unless doing so produces non-halting states, in which case I will only take Box B since this creates a schelling point at my next most preferred worldstate."
Let's try out Marion in Solomon's Problem:
"If I predict that chewing gum reduces throat lesions, I will chew gum, if I predict that chewing gum increases throat lesions, I will not chew gum."
Huh, this is unusually straightforward and basically cashes out to the Litany of Tarski. Wonder what's up with that? Let's try her in Parfit's Hitchhiker:
"If I predict that Paul Ekman will give me a ride out of the desert, I will not pay him $100, however, if predict that he will not give me a ride out of the desert unless I pay him $100, I will pay him $100."
Paul Ekman accurately predicts that Marion won't pay him $100, so he doesn't agree to give her a ride, so she will pay him $100, so he will give her a ride, so she won't pay him $100, so he won't give her a ride. This is, again, non-halting, and can be patched the same way we patched Newcomb's problem. We just say "If this produces non-halting states, move on to the best outcome that doesn't." The next best outcome for Marion, after getting out of the desert for free, is getting out of the desert at all, so Marion gets a ride out of the desert and pays Paul.
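The same toy pattern transplants directly (again, a sketch of my own, not anything rigorous): Ekman gives the ride iff he predicts Marion pays, two-conditional Marion pays iff she predicts there's no free ride, and the third conditional catches the regress and settles for paying the $100.

```python
def ekman_gives_ride(depth):
    """Ekman gives the ride iff he predicts Marion will pay the $100."""
    if depth == 0:
        raise RecursionError("prediction regress never bottoms out")
    return marion_pays(depth - 1)

def marion_pays(depth):
    """Two-conditional Marion: pay only if she predicts there's no free ride."""
    if depth == 0:
        raise RecursionError("prediction regress never bottoms out")
    return not ekman_gives_ride(depth - 1)

def marion_hitchhiker(budget=50):
    try:
        return "pay $100" if marion_pays(budget) else "don't pay"
    except RecursionError:          # non-halting: next-best halting outcome,
        return "pay $100"           # i.e. getting out of the desert at all

print(marion_hitchhiker())  # -> "pay $100"
```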
Alright, last one: let's test Marion against Ziz's Bear Attack.
"If Evil Emperor Paul Ekman believes me when I tell him that I will bare my neck and let myself be mauled to death, then I will let myself be mauled to death. However, if Emperor Ekman doesn't believe me when I tell him that I will bare my neck, than I will run from the bear and attempt to escape."
Parable of the Dagger, throw her to the bears at once!

Sure, in a real-world scenario, you can't usually one-up your captors with semantic games like this. However, if we're treating this as a purely Newcomblike hypothetical, then this breaks Paul's ability to predict Marion while also optimizing for what he wants (entertainment via bear maulings). If Emperor Ekman believes Marion, then he doesn't throw her to the bears, because he's not entertained by it. If he doesn't believe Marion, he gets entertained, so he should want to disbelieve her. However, this would require him to predict her inaccurately, and Paul Ekman is famously accurate in his predictions. In the real world he can just toss that out and Parable-of-the-Dagger her to the bears anyway with a "well, let's test it and see," but if he's purely operating on his predictions, then he accurately predicts that Marion will just stand there, and he doesn't throw her to the bears. Okay, that is halting. However, that's also not the scenario.
In the original scenario, logical time has already ticked forward to a place after Paul Ekman has made his predictions. In the original scenario, he already doesn't believe Marion. A typical timeless decision theory agent would not have a conditional attached to their timeless algorithm specifying that they should run if Paul Ekman doesn't believe them. This agent stands there lamenting that they wish they had a different decision theory and lets themselves be mauled to death by a bear in order to delete the counterfactual timeline.
Wait...what? That can't be right, counterfactuals are by definition not real. If you think you're living in a counterfactual timeline, you are incorrect. There is only one flow of time, you can't go back and anthropic principle a different version of yourself into existence at the expense of your current world, there is just the world.
In this scenario, Marion knows that Paul Ekman has already incorrectly predicted her, so the non-halting state created by the indeterminability of the conditional has already collapsed and her best option is to try and run from the bear. What is the difference between this scenario and Newcomb's problem?
In Newcomb's Problem, the non-halting state is collapsed by Marion concluding that taking Box B alone is the best option, which means that the Predictor has filled Box B. In Ziz's Bear Attack, the non-halting state is collapsed by Paul Ekman predicting either correctly or incorrectly that Marion will bare her neck to the bear. In both scenarios, the act of deciding collapses the non-halting state; however, in Newcomb's the decider is Marion, and in Ziz's it's Paul Ekman, who essentially forces Marion's conditional computation into one state or another based on his prediction of her. The timeless agent is forced to iterate their decision tree in this scenario as if the non-halting state has not already collapsed. This forced entanglement means the timeless agent believes they should let themselves be eaten in order to maximize their chances of survival, despite having strong evidence that they are not in a world where this would help, and despite this being a completely incoherent position.
The Ziz's Bear Attack scenario is interesting because it presents what is, to a naive reasoner, clearly a bad move and calls it actually the rational thing to do, in a very similar way to how causal decision theorists will insist that two-boxing Newcomb's problem is actually the rational choice. Well, my goal isn't to do the rational thing, it's to win! Getting eaten by a bear to avoid a counterfactual that I am already in the midst of isn't winning, it's fractally engineered losing.
So Marion with the third conditional chews gum and one-boxes, but still runs from bears released by evil emperors, and gets rescued from the desert in exchange for paying $100. This seems really solid. I don't think conditional decision theory Marion is anything like an ideal rational agent in the sense of always winning, but she seems pretty good at getting the right outcomes on these games, or else breaking them in interesting ways.
I don't know what you can do with all this, and really I just wanted to start a discussion around the topic and see if it sparked any interesting insights, or if I'm wrong about something in how I'm thinking about these scenarios. I will close with this postulate: a truly ideal decision theory should let you actually win Newcomb's problem. That is, I don't believe it is logically impossible to create a decision theory which lets you reliably win $1,001,000 on Newcomb's problem.