
Comment author: Stuart_Armstrong 23 September 2017 08:28:36PM *  2 points [-]

Have you seen the "XOR Blackmail" in the Death in Damascus paper? That's a much better problem with EDT than the smoking lesion problem, in my view. And it's simple to describe:

An agent has been alerted to a rumor that her house has a terrible termite infestation, which would cost her $1,000,000 in damages. She doesn’t know whether this rumor is true. A greedy and accurate predictor with a strong reputation for honesty has learned whether or not it’s true, and drafts a letter:

I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.

The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?
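
For concreteness, here is a minimal sketch of the conditional expected values an ordinary EDT agent computes upon receiving the letter (the dollar figures are the ones from the quote; the predictor is treated as perfectly accurate, so the letter's XOR claim is taken as certain):

```python
# Evidential expected value upon receiving the letter, taking the
# letter's claim at face value (the predictor is assumed accurate).
DAMAGES = 1_000_000  # cost of the termite infestation
PAYMENT = 1_000      # payment demanded by the letter

# Given the letter: "pay" is only consistent with case (i) (no termites),
# and "don't pay" is only consistent with case (ii) (termites).
ev_pay      = -PAYMENT   # no termites, but you pay $1,000
ev_dont_pay = -DAMAGES   # termites, and you keep your $1,000

print(ev_pay, ev_dont_pay)  # -1000 -1000000  -> ordinary EDT pays
```

Paying does nothing about the termites, which is why the case is taken as a problem for EDT.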

Comment author: Johannes_Treutlein 26 September 2017 10:38:31AM *  2 points [-]

EDT doesn't pay if it is given the choice to commit to not paying ex-ante (before receiving the letter). So the thought experiment might be an argument against ordinary EDT, but not against updateless EDT. If one takes the possibility of anthropic uncertainty into account, then even ordinary EDT might not pay the blackmailer. See also Abram Demski's post about the Smoking Lesion. Ahmed and Price defend EDT along similar lines in a response to a related thought experiment by Frank Arntzenius.
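
A sketch of the ex-ante comparison this refers to, with a hypothetical prior p on the rumor being true and the predictor again treated as accurate:

```python
# Ex-ante (before any letter arrives) value of committing to a policy,
# for a hypothetical prior p that the termite rumor is true.
DAMAGES, PAYMENT = 1_000_000, 1_000

def ex_ante_value(pay_policy: bool, p: float) -> float:
    # Policy "pay": a letter is sent only in the no-termites case (i),
    # so you additionally lose the $1,000 payment there.
    # Policy "don't pay": a letter is sent only in the termites case (ii),
    # and you lose nothing on top of the damages.
    if pay_policy:
        return p * (-DAMAGES) + (1 - p) * (-PAYMENT)
    return p * (-DAMAGES)

p = 0.5  # hypothetical prior
print(ex_ante_value(True, p), ex_ante_value(False, p))
# -500500.0 -500000.0  -> committing to "don't pay" is better for any p < 1
```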

Comment author: Johannes_Treutlein 06 June 2017 07:31:44AM *  1 point [-]

Imagine that Omega tells you that it threw its coin a million years ago, and would have turned the sky green if it had landed the other way. Back in 2010, I wrote a post arguing that in this sort of situation, since you've always seen the sky being blue, and every other human being has also always seen the sky being blue, everyone has always had enough information to conclude that there's no benefit from paying up in this particular counterfactual mugging, and so there hasn't ever been any incentive to self-modify into an agent that would pay up ... and so you shouldn't.

I think this sort of reasoning doesn't work if you have also precommitted regarding logical facts. Then you know the sky is blue, but you don't know what that implies. When Omega informs you about the logical connection between sky color, your actions, and your payoff, you won't update on this logical fact: it is one implication away from the logical prior you precommitted to. And the best policy given this prior, which contains information about sky color but not about this blackmail, is not to pay. Not paying just changes, a priori, the situation in which you will be blackmailed (and hence what a blue sky means), not the probability of a positive intelligence explosion in the first place. Knowing or not knowing the color of the sky makes no difference as long as we don't know what it implies.

(HT Lauro Langosco for pointing this out to me.)

[Link] Anthropic uncertainty in the Evidential Blackmail problem

4 Johannes_Treutlein 14 May 2017 04:43PM
Comment author: Vladimir_Nesov 07 February 2017 03:18:18AM *  1 point [-]

A way around this would be if you’re not completely updateless, but if you instead have already updated on the fact that you do exist.

It's not a given that you can easily observe your existence. From an updateless point of view, all possible worlds, or theories of worlds, or maybe finite fragments of reasoning about them, in principle "exist" to some degree, in the sense of being data potentially relevant for estimating the value of everything, which is something to be done for the strategies under the agent's consideration. So in the case of worlds, or instances of the agent in worlds, the useful sense of "existence" is relevance for estimating the value of everything (or of the change in value depending on the agent's strategy, which is the sense in which worlds that couldn't contain or think about the agent don't exist). Since in this case we are talking about possible worlds, they do or don't exist in the sense of having or lacking measure (probability) in the updateless prior (to the extent that it makes sense to talk about the decision algorithm using a prior). In this sense, observing one's existence means observing an argument about the a priori probability of the world you inhabit. In a world that has relatively tiny a priori probability, you should be able to observe your own (or rather the world's) non-existence, in the same sense.

This also follows the principle of reducing concepts like existence or probability (where they make sense) to components of the decision algorithm, and abandoning them in sufficiently unusual thought experiments (where they may fail to make sense, but where it's still possible to talk about decisions). See also this post of Vadim's and the idea of cognitive reductions (looking for the role a concept plays in your thinking, not just for what it could match in the world).

Comment author: Johannes_Treutlein 25 February 2017 09:13:10PM *  0 points [-]

Thanks for the reply and all the useful links!

It's not a given that you can easily observe your existence.

It took me a while to understand this. Would you say that, for example in the Evidential Blackmail, you can never tell whether your decision algorithm is just being simulated or whether you're actually in the world where you received the letter, because in both cases the decision algorithm receives exactly the same evidence? So in this sense, after updating on receiving the letter, both worlds are still equally likely, and only via your decision do you find out which of those worlds are the simulated ones and which are the real ones. One can probably generalize this principle: you can never differentiate between different instantiations of your decision algorithm that have the same evidence. So when you decide what action to output conditional on receiving some sense data, you always have to decide based on your prior probabilities. Normally, this works exactly as if you first updated on this sense data and then decided. But sometimes, e.g. when your actions in one world make a difference to the other world via a simulation, the two come apart. Maybe if you assign anthropic probabilities to either being a "logical zombie" or the real you, then the result would be like UDT even with updating?

What I still don't understand is how this motivates updatelessness with regard to anthropic probabilities (e.g. if I know that I have a low index number, or in Psy Kosh's problem, if I already know I'm the decider). I totally get how it makes sense to precommit yourself and how one should talk about decision problems instead of probabilities, how you should reason as if you're all instantiations of your decision algorithm at once, etc. Also, intuitively I agree with sticking with the priors. But somehow I can't get my head around what exactly is wrong about the update. Why is it wrong to assign more "caring energy" to the world in which some kind of observation that I make would have been more probable? Is it somehow wrong that it "would have been more probable"? Did I choose the wrong reference classes? Is it because in these problems, too, the worlds influence each other, so that you have to consider the impact that your decision would have on the other world as well?

Edit: Never mind, I think http://lesswrong.com/lw/jpr/sudt_a_toy_decision_theory_for_updateless/ kind of answers my question :)

Comment author: ProofOfLogic 02 February 2017 10:52:28PM 1 point [-]

I find this and the smoker's lesion to have the same flaw, namely: it does not make sense to me to suppose both that the agent is using EDT and that there are biases in the agent's decision-making. We can perhaps suppose that (in both cases) it is the agent's preferences that are affected (by the genes, or by the physics). But then, shouldn't the agent be able to observe this (the "tickle defense"), at least indirectly through behavior? And won't this make it act as CDT would act?

But: I find the blackmail letter to be a totally compelling case against EDT.

Comment author: Johannes_Treutlein 24 February 2017 10:08:57AM *  1 point [-]

I agree with all of this, and I can't understand why the Smoking Lesion is still seen as the standard counterexample to EDT.

Regarding the blackmail letter: I think that in principle, it should be possible to use a version of EDT that also chooses policies based on a prior instead of actions based on your current probability distribution. That would be "updateless EDT", and I think it wouldn't give in to Evidential Blackmail. So I think rather than an argument against EDT, it's an argument in favor of updatelessness.

Comment author: cousin_it 09 February 2017 07:27:37PM *  2 points [-]

UDT takes bet 2.

Can you put your flavor of EDT in clear conflict with UDT? Or are they equivalent?

If you need a rigorous formulation of proof-based UDT, this old post of mine might be helpful. Feel free to ask if anything isn't clear.

Comment author: Johannes_Treutlein 24 February 2017 09:44:16AM *  1 point [-]

Thanks for the link! What I don't understand is how this works in the context of empirical and logical uncertainty. It's also unclear to me how this approach relates to Bayesian conditioning. E.g. if the sentence "if a holds, then o holds" is true, doesn't this also mean that P(o|a)=1? In that sense, proof-based UDT would just be an elaborate specification of how to assign these conditional probabilities "from the viewpoint of the original position", that is, with updatelessness, and in the context of full logical inference and knowledge of the world, including knowledge about one's own decision algorithm. I see how this is useful, but I don't understand how it would at any point contradict normal Bayesian conditioning.
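
(As a quick check of that step, assuming the prior assigns no mass to a ∧ ¬o, i.e. P(a → o) = 1, and that P(a) > 0: P(o|a) = P(o ∧ a)/P(a) = P(a)/P(a) = 1.)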

As to your first question: if we ignore problems that involve updatelessness (or if we just stipulate that EDT always had the opportunity to precommit), I haven't been able to find any formally specified problems where EDT and UDT diverge.

I think Caspar Oesterheld's and my flavor of EDT would be ordinary EDT with some version of updatelessness. I'm not sure whether this works, but if it turns out to be identical to UDT, then I'm not sure which of the two is better specified or easier to formalize. According to the language in Arbital's LDT article, my EDT would differ from UDT only insofar as we use ordinary Bayesian conditioning instead of some form of logical conditioning. So (staying in the Arbital framework), it could look something like this (P stands for whatever prior probability distribution you care about):
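
A sketch of what such a rule might look like (my reconstruction, not necessarily the notation the comment had in mind), with Π the set of policies, O the set of outcomes, and U the utility function:

argmax_{π ∈ Π} Σ_{o ∈ O} U(o) · P(o | "my decision algorithm outputs π")

i.e. the usual argmax-over-policies schema, but with ordinary Bayesian conditioning on the proposition that one's algorithm outputs π in place of a logical counterfactual.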

Comment author: Vladimir_Nesov 08 February 2017 10:00:04AM *  2 points [-]

I'll cite the thought experiment for the reference:

Betting on the Past: In my pocket (says Bob) I have a slip of paper on which is written a proposition P. You must choose between two bets. Bet 1 is a bet on P at 10:1 for a stake of one dollar. Bet 2 is a bet on P at 1:10 for a stake of ten dollars. So your pay-offs are as follows: Bet 1, P is true: 10; Bet 1, P is false: -1; Bet 2, P is true: 1; Bet 2, P is false: -10. Before you choose whether to take Bet 1 or Bet 2 I should tell you what P is. It is the proposition that the past state of the world was such as to cause you now to take Bet 2. [Ahmed 2014, p. 120]
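
For reference, a minimal sketch of the evidential expected values these payoffs yield, assuming (as in Ahmed's deterministic setup) that the agent's actual choice settles whether P is true:

```python
# Payoffs from Ahmed's "Betting on the Past" (Ahmed 2014, p. 120).
payoff = {("bet1", True): 10, ("bet1", False): -1,
          ("bet2", True): 1,  ("bet2", False): -10}

# Conditioning on her own choice: taking Bet 2 is conclusive evidence
# that P is true, taking Bet 1 that P is false (under determinism).
ev_bet1 = payoff[("bet1", False)]  # P false given Bet 1 -> -1
ev_bet2 = payoff[("bet2", True)]   # P true  given Bet 2 -> +1

print(ev_bet1, ev_bet2)  # -1 1  -> the evidential (and UDT) verdict: take Bet 2
```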

Some comments on your post:

Alice is betting on a past state of the world. She can’t causally influence the past, and she’s uncertain whether the proposition is true or not.

More precisely, Alice is betting on implications of the past state of the world, on what it means about the future, or perhaps on what it causes the future to be. Specifically, on Alice's action, which is itself an implication of the past state of the world. If we say that Alice can causally influence her own action, it's fair to say that Alice can causally influence the truth of the proposition, even if she can't causally influence the state of the past. So she can't influence the state of the past, but can influence implications of the state of the past, such as her own action. Similarly, a decision algorithm can't influence its own code, but can influence the result it computes. (So I'm not even sure what CDT is supposed to do here, since it's not clear that the bet is really on the past state of the world and not on the truth of a proposition about the future state of the world.)

Perhaps if the bet was about the state of the world yesterday, LDT would still take Bet 2. Clearly, LDT’s algorithm already existed yesterday, and it can influence this algorithm’s output; so if it chooses Bet 2, it can change yesterday’s world and make the proposition true.

It's better to avoid the idea of "change" in this context. Change always compares alternatives, but for UDT there is no default state of the world before the decision is made; there are only alternative states of the world following the alternative decisions. So a decision doesn't change things from the way they were before it was made to the way they are after it was made; at most you can compare how things are after one possible decision to how things are after the other possible decision.

Given that, I don't see what role "LDT’s algorithm already existed yesterday" plays here, and I think it's misleading to state that "it can change yesterday’s world and make the proposition true". Instead it can make the proposition true without changing yesterday’s world, by ensuring that yesterday’s world was always such that the proposition is true. There is no change, yesterday’s world was never different and the proposition was never false. What changed (in our observation of the decision making process) is the state of knowledge about yesterday’s world, from uncertainty about the truth of the proposition to knowledge that it's true.

If we choose a more distant point in the past as a reference for Alice’s bet – maybe as far back as the birth of our universe – she’ll eventually be unable to exert any possible influence via logical counterfactuals.

Following from the preceding point, it doesn't matter when the past state of the world is, since we are not trying to influence it, we are instead trying to influence its consequences, which are in the future. There is something unusual about influencing consequences of a construction without influencing the construction itself, but it helps to recall that it's exactly what any program does, when it influences its actions without influencing its code. It's what a human emulation in a computer does, by making decisions without changing the initial image of their brain from which the running emulation was loaded. And it's also what a regular human running inside physics without any emulation does.

Comment author: Johannes_Treutlein 22 February 2017 01:50:30PM *  1 point [-]

Thanks a lot for your elaborate reply!

(So I'm not even sure what CDT is supposed to do here, since it's not clear that the bet is really on the past state of the world and not on truth of a proposition about the future state of the world.)

Hmm, good point. The truth of the proposition is evaluated on the basis of Alice's action, which she can causally influence. But we could think of a Newcomblike scenario in which someone made a perfect prediction 100 years ago and put down a note about what state the world was in at that time. Now, instead of checking Alice's action, we just check this note to evaluate whether the proposition is true. I think it's then clear that CDT would "two-box".

Given that, I don't see what role "LDT’s algorithm already existed yesterday" plays here, and I think it's misleading to state that "it can change yesterday’s world and make the proposition true". Instead it can make the proposition true without changing yesterday’s world, by ensuring that yesterday’s world was always such that the proposition is true. There is no change, yesterday’s world was never different and the proposition was never false.

Sorry for the fuzzy wording! I agree that "change" is not a good terminology. I was thinking about TDT and a causal graph. In that context, it might have made sense to say that TDT can "determine the output" of the decision nodes, but not that of the nature nodes that have a causal influence on the decision nodes?

Following from the preceding point, it doesn't matter when the past state of the world is, since we are not trying to influence it, we are instead trying to influence its consequences, which are in the future.

OK, if I interpret that correctly, you would say that our proposition is also a program that references Alice's decision algorithm, and hence we can just determine that program's output the same way we can determine our own decision. I am totally fine with that. If we can expand this principle to all the programs that somehow reference our decision algorithms, I would be curious whether there are still differences left between this and evidential counterfactuals.

Take the thought experiment in this post, for instance: imagine you're an agent that always chooses the action "take the red box". Now there is a program that checks whether there will be cosmic rays, and if so, changes your decision algorithm to one that outputs "take the green box". Of course, you can still "influence" your output like all regular humans, and you can thus in some sense also influence the output of the program that changed you. By extension, you can even influence whether the output of the program "outer space" is "gamma rays" or "no gamma rays". If I understand your answers to my Coin Flip Creation post correctly, this formulation would make the problem into a kind of anthropic problem again, where the algorithm would at one point "choose to output red" in order to be instantiated into the world without gamma rays. Would you agree with this, or did I get something wrong?

[Link] “Betting on the Past” – a decision problem by Arif Ahmed

2 Johannes_Treutlein 07 February 2017 09:14PM
Comment author: Johannes_Treutlein 03 February 2017 10:53:46AM 0 points [-]

CDT, TDT, and UDT would not give away the money because there is no causal (or acausal) influence on the number of universes.

I'm not so sure about UDT's response. From what I've heard, depending on the exact formal implementation of the problem, UDT might also pay the money? If your thought experiment works via a correlation between the type of universe you live in and the decision theory you employ, then it might be a similar problem to the Coin Flip Creation. I introduced the latter decision problem in an attempt to make a less ambiguous version of the Smoking Lesion. In a comment in response to my post, cousin_it writes:

Here's why I think egoistic UDT would one-box. From the problem setup it's provable that one-boxing implies finding money in box A. That's exactly the information that UDT requires for decision making ("logical counterfactual"). It doesn't need to deduce unconditionally that there's money in box A or that it will one-box.

One possible confounder in your thought experiment is the agent’s altruism. The agent doesn’t care about which world he lives in, but only about which worlds exist. If you reason from an “updateless”, outside perspective (like Anthropic Decision Theory), it then becomes irrelevant what you choose. This is because if you act in a way that’s only logically compatible with world A, you know you just wouldn’t have existed in the other world. A way around this would be if you’re not completely updateless, but if you instead have already updated on the fact that you do exist. In this case you’d have more power with your decision. “One-boxing” might also make sense if you're just a copy-egoist and prefer to live in world A.

Comment author: cousin_it 26 January 2017 05:39:30PM 1 point [-]

I can only give a clear-cut answer if you reformulate the smoking lesion problem in terms of Omega and specify the UDT agent's egoism or altruism :-)

Comment author: Johannes_Treutlein 30 January 2017 01:50:44PM 0 points [-]

That's what I was trying to do with the Coin Flip Creation :) My guess: once you specify the Smoking Lesion and make it unambiguous, it ceases to be an argument against EDT.
