Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

UDT takes bet 2.

Can you put your flavor of EDT in clear conflict with UDT? Or are they equivalent?

If you need a rigorous formulation of proof-based UDT, this old post of mine might be helpful. Feel free to ask if anything isn't clear.

Thanks for the link! What I don't understand is how this works in the context of empirical and logical uncertainty. Also, it's unclear to me how this approach relates to Bayesian conditioning. E.g., if the sentence "if a holds, then o holds" is true, doesn't this also mean that P(o|a)=1? In that sense, proof-based UDT would just be an elaborate specification of how to assign these conditional probabilities "from the viewpoint of the original position", i.e. with updatelessness, and in the context of full logical inference and knowledge of the world, including knowledge about one's own decision algorithm. I see how this is useful, but I don't understand how it would at any point contradict normal Bayesian conditioning.

As to your first question: if we ignore problems that involve updatelessness (or if we just stipulate that EDT always had the opportunity to precommit), I haven't been able to find any formally specified problems where EDT and UDT diverge.

I think Caspar Oesterheld's and my flavor of EDT would be ordinary EDT with some version of updatelessness. I'm not sure whether this works, but if it turns out to be identical to UDT, then I'm not sure which of the two is better specified or easier to formalize. In the language of Arbital's LDT article, my EDT would differ from UDT only insofar as we use ordinary Bayesian conditioning instead of logical conditioning. So (staying in the Arbital framework), it could look something like this (P stands for whatever prior probability distribution you care about):
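(A rough sketch, with U the utility function over outcomes o, A the set of available actions, and the conditioning done on the proposition that my own decision algorithm outputs a, evaluated under the prior P rather than any updated posterior:)

$$a^* \;=\; \underset{a \in A}{\arg\max} \;\sum_{o} U(o)\, P\big(o \mid \text{my algorithm outputs } a\big)$$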

Also, it's unclear to me how this approach relates to Bayesian conditioning.

To me, proof-based UDT is a simple framework that includes probabilistic/Bayesian reasoning as a special case. For example, if the world is deterministic except for a single coinflip, you specify a preference ordering on pairs of outcomes from the two deterministic worlds (the heads-world and the tails-world). Fairness or non-fairness of the coinflip will be encoded into the ordering, so the decision can be based on completely deterministic reasoning. All probabilistic situations can be recast in this way. That's what UDT folks mean by "probability as caring".
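One way to make that concrete (a sketch of the construction, not necessarily the exact one used in the UDT write-ups): if o_H and o_T are the outcomes obtained in the heads-world and the tails-world respectively, rank pairs of outcomes by

$$V(o_H, o_T) \;=\; \tfrac{1}{2}\,u(o_H) \;+\; \tfrac{1}{2}\,u(o_T),$$

so the fairness of the coin shows up only as the equal weights in the preference ordering, and the reasoning about each individual branch remains fully deterministic.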

It's really cool that UDT lets you encode any setup with probability, prediction, precommitment etc. into a few (complicated and self-referential) sentences in PA or GL that are guaranteed to have truth values. And since GL is decidable, you can even write a program that will solve all such problems for you.
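The usual schema (roughly; the exact formulation varies between write-ups of proof-based UDT) is that the agent's source code A() enumerates proofs and outputs the first action a for which it finds a proof of a sentence of the form

$$A() = a \;\rightarrow\; U() = u$$

with u as high as possible; the self-reference comes from A quoting its own source code inside the sentences it searches proofs of.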

I'll quote the thought experiment for reference:

Betting on the Past: In my pocket (says Bob) I have a slip of paper on which is written a proposition P. You must choose between two bets. Bet 1 is a bet on P at 10:1 for a stake of one dollar. Bet 2 is a bet on P at 1:10 for a stake of ten dollars. So your pay-offs are as follows: Bet 1, P is true: 10; Bet 1, P is false: -1; Bet 2, P is true: 1; Bet 2, P is false: -10. Before you choose whether to take Bet 1 or Bet 2 I should tell you what P is. It is the proposition that the past state of the world was such as to cause you now to take Bet 2. [Ahmed 2014, p. 120]
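For easier scanning: Bet 1 pays +10 if P is true and −1 if P is false; Bet 2 pays +1 if P is true and −10 if P is false. State by state, Bet 1 dominates Bet 2; the only reason to take Bet 2 is that the act of taking it is exactly what makes P true.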

Some comments on your post:

Alice is betting on a past state of the world. She can’t causally influence the past, and she’s uncertain whether the proposition is true or not.

More precisely, Alice is betting on implications of the past state of the world: on what it means about the future, or perhaps on what it causes the future to be. Specifically, she is betting on her own action, which is itself an implication of the past state of the world. If we say that Alice can causally influence her own action, it's fair to say that she can causally influence the truth of the proposition, even though she can't causally influence the state of the past. So she can't influence the state of the past, but she can influence implications of that state, such as her own action. Similarly, a decision algorithm can't influence its own code, but it can influence the result it computes. (So I'm not even sure what CDT is supposed to do here, since it's not clear that the bet is really on the past state of the world and not on the truth of a proposition about the future state of the world.)

Perhaps if the bet was about the state of the world yesterday, LDT would still take Bet 2. Clearly, LDT’s algorithm already existed yesterday, and it can influence this algorithm’s output; so if it chooses Bet 2, it can change yesterday’s world and make the proposition true.

It's better to avoid the idea of "change" in this context. Change always compares alternatives, but for UDT there is no default state of the world before the decision is made; there are only alternative states of the world following the alternative decisions. So a decision doesn't change things from the way they were before it's made to the way they are after it's made; at most, you can compare how things are after one possible decision to how things are after the other possible decision.

Given that, I don't see what role "LDT’s algorithm already existed yesterday" plays here, and I think it's misleading to state that "it can change yesterday’s world and make the proposition true". Instead, it can make the proposition true without changing yesterday’s world, by ensuring that yesterday’s world was always such that the proposition is true. There is no change: yesterday’s world was never different, and the proposition was never false. What changed (in our observation of the decision-making process) is the state of knowledge about yesterday’s world, from uncertainty about the truth of the proposition to knowledge that it's true.

If we choose a more distant point in the past as a reference for Alice’s bet – maybe as far back as the birth of our universe – she’ll eventually be unable to exert any possible influence via logical counterfactuals.

Following from the preceding point, it doesn't matter when the past state of the world is, since we are not trying to influence it; we are instead trying to influence its consequences, which are in the future. There is something unusual about influencing the consequences of a construction without influencing the construction itself, but it helps to recall that this is exactly what any program does when it influences its actions without influencing its code. It's what a human emulation in a computer does, by making decisions without changing the initial image of the brain from which the running emulation was loaded. And it's also what a regular human running inside physics, without any emulation, does.

Thanks a lot for your elaborate reply!

(So I'm not even sure what CDT is supposed to do here, since it's not clear that the bet is really on the past state of the world and not on the truth of a proposition about the future state of the world.)

Hmm, good point. The truth of the proposition is evaluated on the basis of Alice's action, which she can causally influence. But we could think of a Newcomblike scenario in which someone made a perfect prediction 100 years ago and put down a note about what state the world was in at that time. Now instead of checking Alice's action, we just check this note to evaluate whether the proposition is true. I think then it's clear that CDT would "two-box".
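To spell out the "two-boxing" with a toy calculation (my own sketch; the 0.5 prior and the 0.99 predictor accuracy below are arbitrary illustrative numbers): CDT weighs the bets by its unconditional credence that the note says P is true, while EDT conditions on its own choice of bet.

```python
# Toy comparison of CDT-style and EDT-style expected utilities for the
# Betting-on-the-Past payoffs. Illustrative sketch only; the prior on P and
# the predictor accuracy are made-up numbers.

PAYOFF = {
    ("bet1", True): 10, ("bet1", False): -1,
    ("bet2", True): 1,  ("bet2", False): -10,
}

def cdt_eu(bet, prior_p_true=0.5):
    """CDT treats the century-old note as causally independent of the current
    choice, so it weighs the payoffs by its unconditional credence in P."""
    return (prior_p_true * PAYOFF[(bet, True)]
            + (1 - prior_p_true) * PAYOFF[(bet, False)])

def edt_eu(bet, predictor_accuracy=0.99):
    """EDT conditions on its own choice: taking Bet 2 is strong evidence that
    the past state (and hence the note) makes P true, and vice versa."""
    p_true = predictor_accuracy if bet == "bet2" else 1 - predictor_accuracy
    return (p_true * PAYOFF[(bet, True)]
            + (1 - p_true) * PAYOFF[(bet, False)])

for bet in ("bet1", "bet2"):
    print(bet, "CDT:", cdt_eu(bet), "EDT:", round(edt_eu(bet), 2))
# CDT: bet1 -> 4.5,  bet2 -> -4.5   (CDT takes Bet 1, i.e. "two-boxes")
# EDT: bet1 -> -0.89, bet2 -> 0.89  (EDT takes Bet 2)
```

With a literally perfect predictor the contrast only sharpens, since conditioning on the bet pins down the note completely.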

Given that, I don't see what role "LDT’s algorithm already existed yesterday" plays here, and I think it's misleading to state that "it can change yesterday’s world and make the proposition true". Instead, it can make the proposition true without changing yesterday’s world, by ensuring that yesterday’s world was always such that the proposition is true. There is no change: yesterday’s world was never different, and the proposition was never false.

Sorry for the fuzzy wording! I agree that "change" is not good terminology. I was thinking about TDT and a causal graph. In that context, it might have made sense to say that TDT can "determine the output" of the decision nodes, but not that of the nature nodes that have a causal influence on the decision nodes?

Following from the preceding point, it doesn't matter when the past state of the world is, since we are not trying to influence it; we are instead trying to influence its consequences, which are in the future.

OK, if I interpret that correctly, you would say that our proposition is also a program that references Alice's decision algorithm, and hence we can just determine that program's output the same way we can determine our own decision. I am totally fine with that. If we can expand this principle to all the programs that somehow reference our decision algorithms, I would be curious whether there are still differences left between this and evidential counterfactuals.

Take the thought experiment in this post, for instance: Imagine you're an agent that always chooses the action "take the red box". Now there is a program that checks whether there will be cosmic rays, and if so, then it changes your decision algorithm to one that outputs "take the green box". Of course, you can still "influence" your output like all regular humans, and you can thus in some sense also influence the output of the program that changed you. By extension, you can even influence whether or not the output of the program "outer space" is "gamma rays" or "no gamma rays". If I understand your answers to my Coin Flip Creation post correctly, this formulation would make the problem into a kind of anthropic problem again, where the algorithm would at one point "choose to output red" in order to be instantiated into the world without gamma rays. Would you agree with this, or did I get something wrong?

Is there a better formulation for this? Because I don't see how this is a "problem".

Assuming Bob is truthful, Alice faces no bets. She can choose one of two courses of action and each of them has a predetermined outcome known to her. There is no uncertainty involved.