
Simulation argument meets decision theory

14 pallas 24 September 2014 10:47AM

Person X stands in front of a sophisticated computer playing the decision game Y, which offers two options: press the button "sim" or press "not sim". If she presses "sim", the computer will simulate X*_1, X*_2, ..., X*_1000, a thousand identical copies of X. All of them will face the game Y*, which - from the standpoint of each X* - is indistinguishable from Y. But the simulated computers in the games Y* don't run simulations. Additionally, we know that if X presses "sim" she receives a utility of 1, whereas "not sim" would only lead to 0.9. If X*_i (for i = 1, 2, ..., 1000) presses "sim" she receives 0.2, and with "not sim" only 0.1. For each agent it is true that she gains nothing from the utility of another agent, despite the fact that she and the other agents are identical! Since all the agents are identical egoists facing what appears to them to be the same situation, all of them will take the same action.

Now the game starts. We face a computer and know all of the above, but we don't know whether we are X or one of the X*'s. Should we press "sim" or "not sim"?
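One way to make the comparison concrete is a minimal sketch under an assumption that goes beyond the post: besides all copies taking the same action (which the post stipulates), you weight yourself uniformly over every agent who actually exists given that common action. This self-sampling-style weighting is an assumption, not the only way to set the problem up.

```python
# A minimal sketch of the expected-utility comparison, assuming (1) all
# copies take the same action and (2) you treat yourself as equally likely
# to be any agent who actually exists under that common action.

N_SIMS = 1000

# Payoffs from the post: X gets 1 for "sim" and 0.9 for "not sim";
# each simulated X* gets 0.2 for "sim" and 0.1 for "not sim".
def expected_utility(action: str) -> float:
    if action == "sim":
        # X presses "sim", so the 1000 simulations run and also press "sim".
        payoffs = [1.0] + [0.2] * N_SIMS
    else:
        # X presses "not sim", so no simulations are ever run.
        payoffs = [0.9]
    return sum(payoffs) / len(payoffs)

for action in ("sim", "not sim"):
    print(action, expected_utility(action))
# "sim"     -> (1 + 1000 * 0.2) / 1001, about 0.2008
# "not sim" -> 0.9
```

Under that uniform weighting "not sim" comes out ahead, while an agent who is certain she is the original X and reasons causally prefers "sim" (1 > 0.9); the tension between those two answers is the point of the question.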

 

EDIT: It seems to me that "identical" agents with "independent" utility functions were a clumsy setup for the above question, especially since one can interpret it as a contradiction. Hence, it might be better to switch to identical egoists, where each agent only cares about the money she herself receives (linear monetary value function). If X presses "sim" she will be given $10 (else $9) at the end of the game; each X* who presses "sim" receives $2 (else $1). Each agent in the game wants to maximize the expected amount of money she herself will hold in her own hand after the game. So, intrinsically, they don't care how much money the other copies make.
To spice things up: What if the simulation only happens a year later? Are we then able to "choose" which year it is?

Real-world Newcomb-like Problems

14 SilasBarta 25 March 2011 08:44PM

Elaboration of: A point I’ve made before.

 

Summary: I phrase a variety of realistic dilemmas so as to show how they’re similar to Newcomb’s problem.

 

Problem: Many LW readers don't understand why we bother talking about obviously-unrealistic situations like Counterfactual Mugging or Newcomb's problem.  Here I'm going to put them in the context of realistic dilemmas, identifying the common thread, so that the parallels are clear and you can see how Counterfactual Mugging et al. are actually highlighting relevant aspects of real-world problems -- even though they may do it unrealistically.

 

The common thread across all the Newcomblike problems I will list is this: "You would not be in a position to enjoy a larger benefit unless you would cause [1] a harm to yourself within particular outcome branches (including bad ones)."  Keep in mind that a “benefit” can include probabilistic ones (so that you don’t always get the benefit by having this propensity).  Also, many of the relationships listed exist because your decisions are correlated with others’.
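As a hedged illustration of that common thread, here is a sketch using Counterfactual Mugging with made-up stakes (the fair coin and the 100/10,000 figures are hypothetical choices, not taken from this post): the agent who would accept the harm in the losing branch is the one who ends up positioned for the larger benefit.

```python
# Counterfactual Mugging sketch with hypothetical stakes: Omega flips a
# fair coin; on heads it pays 10,000 only to agents who would have paid
# 100 on tails.  The numbers are illustrative, not from the post.

REWARD = 10_000   # paid on heads, but only if you would pay on tails
COST = 100        # handed over on tails if your policy is "pay"
P_HEADS = 0.5

def expected_value(would_pay_on_tails: bool) -> float:
    heads_payoff = REWARD if would_pay_on_tails else 0
    tails_payoff = -COST if would_pay_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff

print(expected_value(True))   # 4950.0: accepts the harm in the losing branch
print(expected_value(False))  # 0.0:    never positioned for the larger benefit
```

Within the tails branch, paying is a pure loss; the benefit only shows up across branches, which is the pattern each item below shares.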

 

Without further ado, here is a list of both real and theoretical situations, in rough order from most to least "real-world"ish:

 

Natural selection: You would not exist as an evolution-constructed mind unless you would be willing to cause the spreading of your genes at the expense of your life and leisure. (I elaborate here.)

 

Expensive punishment: You would not be in the position of enjoying a crime level this low unless you would cause a net loss to yourself to punish crimes when they do happen.  (My recent comments on the matter.)

 

"Mutually assured destruction" tactics: You would not be in the position of having a peaceful enemy unless you would cause destruction of both yourself and the enemy in those cases where the enemy attacks.

 

Voting: You would not be in a polity where humans (rather than "lizards") rule over you unless you would cause yourself to endure the costs of voting despite the slim chance of influencing the outcome.

 

Lying: You would not be in the position where your statements influence others’ beliefs unless you would be willing to state true things even when it is sub-optimal for you that others believe them. (Kant/Categorical Imperative name-check)

 

Cheating on tests: You would not be in the position to reap the (larger) gains of being able to communicate your ability unless you would forgo the benefits of an artificially-high score.  (Kant/Categorical Imperative name-check)

 

Shoplifting: You would not be in the position where merchants offer goods of this quality, with this low a markup and this level of security lenience, unless you would pass up the opportunity to shoplift even when you could get away with it, or at least have incorrect beliefs about the success probability that lead you to act this way.  (Controversial -- see previous discussion.)

 

Hazing/abuse cycles: You would not be in the position to be unhazed/unabused (as often) by earlier generations unless you would forgo the satisfaction of abusing later generations when you had been abused.

 

Akrasia/addiction: You would not be addiction- and bad habit-free unless you would cause the pain of not feeding the habit during the existence-moments when you do have addictions and bad habits.

 

Absent-Minded Driver: You would not ever have the opportunity to take the correct exit unless you would sometimes drive past it.

 

Parfit's Hitchhiker: You would not be in the position of surviving the desert unless you would cause the loss of money to pay the rescuer.

 

Newcomb's problem: You would not be in the position of Box #2 being filled unless you would forgo the contents of Box #1.

 

Newcomb's problem with transparent boxes: Ditto, except that Box #2 isn't always filled.

 

Prisoner's Dilemma: You would not be in the position of having a cooperating partner unless you would cause the diminished "expected prison avoidance" by cooperating yourself.

 

Counterfactual Mugging: You would not ever be in the position of receiving lots of free money unless you would cause yourself to lose less money in those cases where you lose the coin flip.

 

[1] “Cause” is used here in the technical sense, which requires the effect to be either in the future or, in timeless formalisms, a descendant of the minimal set (in a Bayesian network) that screens off knowledge about the effect.  In the parlance of Newcomb’s problem, it may feel intuitive to say that “one-boxing causes Box #2 to be filled”, but this is not correct in the technical sense.

Hazing as Counterfactual Mugging?

3 SilasBarta 11 October 2010 02:17PM

In the interest of making decision theory problems more relevant, I thought I'd propose a real-life version of counterfactual mugging.  This is discussed in Drescher's Good and Real, and many places before.  I will call it the Hazing Problem by comparison to this practice (possibly NSFW – this is hazing, folks, not Disneyland).

 

The problem involves a timewise sequence of agents who each decide whether to "haze" (abuse) the next agent.  (They cannot impose any penalty on the previous agent.)  For each agent n, here is the preference ranking:

 

1) not be hazed by n-1

2) be hazed by n-1, and haze n+1

3) be hazed by n-1, do NOT haze n+1

 

or, less formally:

 

1) not be hazed

2) haze and be hazed

3) be hazed, but stop the practice

 

The problem is: you have been hazed by n-1.  Should you haze n+1?

 

Like in counterfactual mugging, the average agent ends up with lower utility when agents condition their choice on already having been hazed, no matter how big the utility difference between 2) and 3) is.  Also, it involves you having to make a choice from within a "losing" part of the "branching", a choice which has implications for the other branches.
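A minimal sketch of that comparison, with assumptions of my own rather than the post's: a chain of 20 agents, agent 0 hazed by an earlier generation regardless of policy, and illustrative utility numbers standing in for the ranking 1) > 2) > 3).

```python
# Average utility across a chain of agents under two policies.
# Assumptions (mine, not from the post): N agents, agent 0 is hazed by an
# earlier generation no matter what, and the utility values below are
# illustrative stand-ins for preferences 1) > 2) > 3).

N = 20
U_NOT_HAZED = 0.0        # outcome 1): not hazed
U_HAZED_AND_HAZE = -1.0  # outcome 2): hazed, and hazes the next agent
U_HAZED_NO_HAZE = -2.0   # outcome 3): hazed, but stops the practice

def average_utility(haze_if_hazed: bool) -> float:
    utilities = []
    hazed = True  # agent 0 starts out hazed
    for _ in range(N):
        if not hazed:
            utilities.append(U_NOT_HAZED)
            next_hazed = False
        elif haze_if_hazed:
            utilities.append(U_HAZED_AND_HAZE)
            next_hazed = True
        else:
            utilities.append(U_HAZED_NO_HAZE)
            next_hazed = False
        hazed = next_hazed
    return sum(utilities) / N

print(average_utility(True))   # -1.0: everyone down the chain gets hazed
print(average_utility(False))  # -0.1: only agent 0 pays the cost of stopping
```

For a long enough chain, "never haze" beats "haze if hazed" on average, since the one-time cost of stopping is spread over all the later agents who go unhazed; yet within the branch where you have already been hazed, hazing still looks better (outcome 2 over outcome 3), which is the CM-like tension.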

 

You might object that the choice of whether to haze is not random, as Omega’s coin flip is in CM; however, there are deterministic phrasings of CM, and your own epistemic limits blur the distinction.

 

UDT sees optimality in returning not-haze unconditionally.  CDT reasons that its having been hazed is fixed, and so hazes.  I *think* EDT would choose to haze, because it would prefer to learn that, having been hazed, it went on to haze n+1, but I'm not sure about that.

 

I also think that TDT chooses not-haze, although this is questionable since I'm claiming this is isomorphic to CM.  I would think TDT reasons that, "If agents in position n regarded it as optimal not to haze despite having been hazed, then I would not be in a position of having been hazed, so I zero out the disutility of choosing not-haze."

 

Thoughts on the similarity and usefulness of the comparison?