## Simplified Anthropic Doomsday

1 02 September 2017 08:37PM

Here is a simplified version of the Doomsday argument in Anthropic decision theory, to make the intuitions easier to grasp.

Assume a single agent A exists, an average utilitarian, with utility linear in money. Their species survives with 50% probability; denote this event by S. If the species survives, there will be 100 people total; otherwise the average utilitarian is the only one of its kind. An independent coin lands heads with 50% probability; denote this event by H.

Agent A must price a coupon CS that pays out €1 on S, and a coupon CH that pays out €1 on H. The coupon CS pays out only on S, so the reward only exists in a world where there are a hundred people; thus, if S happens, the coupon CS is worth (€1)/100 to the average utilitarian A. Hence its expected worth is (€1)/200=(€2)/400.

But H is independent of S, so (H,S) and (H,¬S) both have probability 25%. In (H,S), there are a hundred people, so CH is worth (€1)/100. In (H,¬S), there is one person, so CH is worth (€1)/1=€1. Thus the expected value of CH is (€1)/4+(€1)/400 = (€101)/400. This is more than 50 times the value of CS.

Note that C¬S, the coupon that pays out on doom, has an even higher expected value of (€1)/2=(€200)/400.
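The coupon arithmetic above can be checked mechanically. This is a minimal sketch, assuming (as the setup does) that A values €1 as €1 divided by the population of the world in which it is paid; `coupon_value` and `POP` are illustrative names, not from the post. Exact fractions avoid any rounding.

```python
from fractions import Fraction

# Setup from the text: S (survival, 100 people) and not-S (A alone) each have
# probability 1/2; H is an independent fair coin.
P_S = Fraction(1, 2)
P_H = Fraction(1, 2)
POP = {True: 100, False: 1}  # population given survival (True) or doom (False)


def coupon_value(pays_out):
    """Expected worth to an average utilitarian of a coupon paying €1 when
    pays_out(h, s) is true; €1 is worth €1/population in that world."""
    total = Fraction(0)
    for h in (True, False):
        for s in (True, False):
            prob = (P_H if h else 1 - P_H) * (P_S if s else 1 - P_S)
            if pays_out(h, s):
                total += prob * Fraction(1, POP[s])
    return total


c_s = coupon_value(lambda h, s: s)          # pays on survival
c_h = coupon_value(lambda h, s: h)          # pays on heads
c_not_s = coupon_value(lambda h, s: not s)  # pays on doom
print(c_s, c_h, c_not_s)  # 1/200 101/400 1/2
print(c_h / c_s)          # 101/2, i.e. 50.5 times
```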

So, H and S have identical probability, but A assigns CS and CH different expected utilities, with a higher value to CH, simply because S is correlated with survival and H is independent of it (and A assigns an even higher value to C¬S, which is anti-correlated with survival). This is a phrasing of the Doomsday Argument in ADT.

## The Doomsday argument in anthropic decision theory

5 31 August 2017 01:44PM

EDIT: added a simplified version here.

Crossposted at the intelligent agents forum.

In Anthropic Decision Theory (ADT), behaviours that resemble the Self Sampling Assumption (SSA) derive from average utilitarian preferences (and from certain specific selfish preferences).

However, SSA implies the doomsday argument, and, until now, I hadn't found a good way to express the doomsday argument within ADT.

This post will fill that hole, by showing that there is a natural doomsday-like behaviour for average utilitarian agents within ADT.

## A Comment on Expected Utility Theory

0 05 June 2017 03:26AM

Expected utility theory (or expected value decision making, as the case may be) is quite interesting, I guess. In times past (for a few months at most, as I am a neophyte to rationality), I habitually trusted the answers expected utility theory provided without bothering to test them or to ponder for myself why they would be advisable. I first learned the concept of expected value in statistics, and when we studied it in Operations Research, as part of an introduction to decision theory, it just seemed to make sense. However, after the recent experiment I did (the decision problem between a guaranteed \$250,000 and a 10% chance to get \$10,000,000), I began to doubt the appropriateness of Expected Utility Theory. Over 85% of subjects chose the first option, despite the latter having an expected value 4 times higher than the former. I myself realised that the only scenario in which I would choose the \$10,000,000 was one in which \$250,000 was an amount I could pass up.

Now, I am fully cognisant of expected utility theory, and my decision to pick the first option did not seem to be prey to any bias, so a suspicion about the efficacy of expected utility theory began to develop in my mind. I took my experiment to http://www.reddit.com/r/lesswrong, a community whom I expected to be more rational decision makers; they only confirmed the decision making of the first group. I realised then that something was wrong: my map didn’t reflect the territory. If expected utility theory were truly so sound, then a community of rationalists should have adhered to its dictates. I filed this information at the back of my mind, where my brain kept working on it, and today, while I was reading “Thinking, Fast and Slow” by Daniel Kahneman, it delivered an answer to me.

I do not consider myself a slave to rationality; it is naught but a tool for me to achieve my goals. A tool to help me “win”, and to do so consistently. If any ritual of cognition causes me to lose, then I abandon it. There is no sentimentality on the road to victory, and above all I endeavour to be efficient—ruthlessly so if needed. As such, I am willing to abandon any theory of decision making, when I determine it would cause me to lose. Nevertheless, as a rationalist I had to wonder; if expected utility theory was so feeble a stratagem, why had it stuck around for so long? I decided to explore the theory from its roots; to derive it for myself so to speak; to figure out where the discrepancy had come from.

Expected Utility Theory aims to maximise the Expected Utility of a decision, which is nothing but the average utility of that decision: the average payoff.

Average payoff is given by the formula:

$$E_j = \sum_i Pr_i \, G_{ij} \tag{1}$$

where $E_j$ is the expected value of decision $j$, $Pr_i$ is the probability of scenario $i$, and $G_{ij}$ is the payoff of decision $j$ under scenario $i$.

What caught my interest when I decided to investigate expected utility theory from its roots was the use of probability in the formula.

Now the definition of probability is:

$$Pr_i = \lim_{n \to \infty} \frac{f_i}{n} \tag{2}$$

where $f_i$ is the frequency of scenario $i$, i.e. the number of times it occurs in $n$ trials.

If I keep this definition in mind, I find something interesting: Expected Utility Theory maximises my payoff in the long run. For decision problems which are iterated, in which I play the game several times, Expected Utility Theory is my best bet. The closer the number of iterations is to infinity, the closer the observed ratio $f_i/n$ is to the probability.

Substituting $(2)$ into $(1)$ for a finite number of trials $n$, we get:

$$E_j = \sum_i \frac{f_i}{n} \, G_{ij} \tag{3}$$
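To make the link between equations (1) and (3) concrete, here is a small Python sketch (an illustration, not part of the original argument). It computes the expected value of the \$10,000,000 gamble exactly, then estimates it from simulated frequencies $f_i/n$ for growing $n$, where $f_i$ is the number of times scenario $i$ occurs. The scenario labels and helper names are hypothetical; the seed only makes the run reproducible.

```python
import random
from fractions import Fraction

# Illustrative gamble from the experiment: "win" (Pr = 1/10) pays $10,000,000,
# "lose" (Pr = 9/10) pays $0. Scenario names are hypothetical labels.
PROBS = {"win": Fraction(1, 10), "lose": Fraction(9, 10)}
PAYOFF = {"win": 10_000_000, "lose": 0}


def expected_value(probs, payoff):
    """Equation (1): sum over scenarios i of Pr_i * G_ij."""
    return sum(probs[i] * payoff[i] for i in probs)


def frequency_estimate(n, rng):
    """Equation (3) for a finite n: use f_i / n from n simulated plays instead of Pr_i."""
    counts = {i: 0 for i in PROBS}
    for _ in range(n):
        scenario = "win" if rng.random() < float(PROBS["win"]) else "lose"
        counts[scenario] += 1
    return sum(Fraction(counts[i], n) * PAYOFF[i] for i in PROBS)


rng = random.Random(0)  # fixed seed so runs are reproducible
print("exact:", expected_value(PROBS, PAYOFF))  # exact: 1000000
for n in (1, 10, 1_000, 100_000):
    print(n, float(frequency_estimate(n, rng)))
```

With $n = 1$ the estimate can only be 0 or 10,000,000, which is the single-play situation the argument is about. For large $n$ it settles near 1,000,000.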

What Expected Utility Theory tells us is to choose the decision with the highest $E_j$; this is only guaranteed to be the optimal decision when $(1)$ = $(3)$, i.e. when:

1. The decision problem has a (sufficiently) large number of iterations.
2. The decision problem involves a (sufficiently) large number of scenarios.

What exactly constitutes “large” is left to the reader’s discretion. However, 2 is definitely not large. Relying on expected utility theory in a non-iterated game with only two scenarios can easily lead to fatuous decision making. In problems like the one I posited in my “experiment”, a sensible decision-making procedure is the maximum likelihood method: pick the decision that gives the highest payoff in the most likely scenario. However, even that heuristic may not be advisable: what if scenario $i$ has a probability of $0.5 - \epsilon$ and the second scenario $k$ has a probability of $0.5 + \epsilon$? Merely relying on the maximum likelihood heuristic is unwise there. Here $\epsilon$ stands for a small number; the definition of small is left to the user’s discretion.

After much deliberation, I reached a conclusion: in any non-iterated game in which a single scenario has an overwhelmingly high probability $Pr = 1 - \epsilon$, the maximum likelihood approach is the rational decision-making approach. Personally, I believe $\epsilon$ should be $\ge 0.005$, and I set mine at around $0.1$.
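The rule above can be sketched as a small decision procedure. This is a hypothetical helper, not the author's code. It applies maximum likelihood when one scenario has probability at least $1 - \epsilon$. The text does not say what to do otherwise, so falling back to the highest expected value in that case is an assumption of this sketch.

```python
from fractions import Fraction


def choose(decisions, probs, epsilon):
    """Hypothetical helper. decisions maps a decision to {scenario: payoff}.
    If one scenario has Pr >= 1 - epsilon, use maximum likelihood; otherwise
    (an assumption, not stated in the text) pick the highest expected value."""
    top = max(probs, key=probs.get)  # most likely scenario
    if probs[top] >= 1 - epsilon:
        return max(decisions, key=lambda d: decisions[d][top])
    return max(decisions, key=lambda d: sum(probs[i] * decisions[d][i] for i in probs))


probs = {"lose": Fraction(9, 10), "win": Fraction(1, 10)}
decisions = {
    "sure thing": {"lose": 250_000, "win": 250_000},
    "gamble": {"lose": 0, "win": 10_000_000},
}
print(choose(decisions, probs, epsilon=Fraction(1, 10)))   # sure thing
print(choose(decisions, probs, epsilon=Fraction(1, 200)))  # gamble
```

With the author's $\epsilon \approx 0.1$, the 90% scenario counts as overwhelming, so the guaranteed \$250,000 wins. With $\epsilon = 0.005$ it does not, and the fallback picks the gamble.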

I may in future revisit this writeup and add a mathematical argument for the application of the maximum likelihood approach over the Expected Utility approach, but for now I shall posit a simpler argument:

The Expected Utility approach is sensible only in that it maximises winnings in the long run; by its very design, it is intended for games that are iterated and/or in which there is a large number of scenarios. In games where this is not true, with few scenarios and a single instance, there is enough variation in which event occurs that the actual payoff can deviate significantly from the expected payoff. To ignore this deviation is oversimplification and, I’ll argue, irrational. In the experiment I listed above, the actual payoff for the second decision was \$0 or \$10,000,000, the former with a likelihood of 90% and the latter with 10%. The expected value is \$1,000,000, but the standard deviation of the payoffs from the expected value (in this case \$3,000,000) is 300% of the mean. In such cases, I conclude that the expected utility approach is simply unreliable, and expectably so: it was never designed for such problems in the first place (pun intended).
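The mean and standard deviation quoted above follow directly from the two outcomes. Here is a short Python check, using exact fractions for the probabilities.

```python
from fractions import Fraction
from math import sqrt

# The gamble: $0 with probability 0.9, $10,000,000 with probability 0.1.
outcomes = [(Fraction(9, 10), 0), (Fraction(1, 10), 10_000_000)]

mean = sum(p * x for p, x in outcomes)
variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
sd = sqrt(variance)

print(mean)               # 1000000
print(sd)                 # 3000000.0
print(sd / float(mean))   # 3.0, i.e. 300% of the mean
```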

## [Link] Decision Theories in Real Life

2 13 May 2017 01:47AM

7 09 April 2017 03:42AM

## Making equilibrium CDT into FDT in one+ easy step

6 21 March 2017 02:42PM

In this post, I'll argue that Joyce's equilibrium CDT (eCDT) can be made into FDT (functional decision theory) with the addition of an intermediate step - a step that should have no causal consequences. This would show that eCDT is unstable under causally irrelevant changes, and is in fact a partial version of FDT.

Joyce's principle is:

Full Information. You should act on your time-t utility assessments only if those assessments are based on beliefs that incorporate all the evidence that is both freely available to you at t and relevant to the question about what your acts are likely to cause.

When confronted by a problem with a predictor (such as Death in Damascus or the Newcomb problem), this allows eCDT to recursively update its probabilities of the behaviour of the predictor, based on its own estimates of its own actions, until this process reaches equilibrium. This allows it to behave like FDT/UDT/TDT on some (but not all) problems. I'll argue that you can modify the setup to make eCDT into a full FDT.

## Death in Damascus

In this problem, Death has predicted whether the agent will stay in Damascus (S) tomorrow, or flee to Aleppo (F). And Death has promised to be in the same city as the agent (D or A), to kill them. Having made its prediction, Death then travels to that city to wait for the agent. Death is known to be a perfect predictor, and the agent values survival at \$1000, while fleeing costs \$1.

Then eCDT flees to Aleppo with probability 999/2000. To check this, let x be the probability of fleeing to Aleppo (F), and y the probability of Death being there (A). The expected utility is then

$$1000\big(x(1-y)+(1-x)y\big)-x \tag{1}$$

Differentiating this with respect to x gives 999-2000y, which is zero for y=999/2000. Since Death is a perfect predictor, y=x and eCDT's expected utility is 499.5.

The true expected utility, however, is -999/2000, since Death will get the agent anyway, and the only cost is the trip to Aleppo.
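The equilibrium calculation above can be reproduced with exact arithmetic. This is a minimal sketch; `eu` and `d_eu_dx` are illustrative names. Following the text, it differentiates with y held fixed, then sets y = x.

```python
from fractions import Fraction


def eu(x, y):
    """Equation (1): flee with probability x, Death in Aleppo with probability y."""
    return 1000 * (x * (1 - y) + (1 - x) * y) - x


def d_eu_dx(y):
    """Partial derivative of (1) in x with y held fixed: 1000(1 - 2y) - 1 = 999 - 2000y."""
    return 999 - 2000 * y


y_star = Fraction(999, 2000)   # root of 999 - 2000y
assert d_eu_dx(y_star) == 0
x = y_star                     # perfect predictor, so eCDT sets y = x
print(float(eu(x, x)))         # eCDT's own estimate: 499.5
print(float(-x))               # actual value against a perfect Death: -0.4995
```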

## Delegating randomness

The eCDT decision process seems rather peculiar. It seems to allow updating of the value of y dependent on the value of x - hence allowing acausal factors to be considered - but only in a narrow way. Specifically, it requires that the probabilities of F and A be equal, but that those two events remain independent. And it then differentiates utility according to the probability of F only, leaving that of A fixed. So, in a sense, x correlates with y, but small changes in x don't correlate with small changes in y.

That's somewhat unsatisfactory, so consider the problem now with an extra step. The eCDT agent no longer considers whether to stay or flee; instead, it outputs X, a value between 0 and 1. There is a uniform random process Z, also valued between 0 and 1. If Z<X, then the agent flees to Aleppo; if not, it stays in Damascus.

This seems identical to the original setup, for the agent. Instead of outputting a decision as to whether to flee or stay, it outputs the probability of fleeing. This has moved the randomness in the agent's decision from inside the agent to outside it, but this shouldn't make any causal difference, because the agent knows the distribution of Z.

Death remains a perfect predictor, which means that it can predict X and Z, and will move to Aleppo if and only if Z<X.

Now let the eCDT agent consider outputting X=x for some x. In that case, it updates its opinion of Death's behaviour, expecting that Death will be in Aleppo if and only if Z<x. Then it can calculate the expected utility of setting X=x, which is simply 0 (Death will always find the agent) minus x (the expected cost of fleeing to Aleppo), hence -x. Among the "pure" strategies, X=0 is clearly the best.

Now let's consider mixed strategies, where the eCDT agent can consider a distribution PX over values of X (this is a sort of second order randomness, since X and Z already give randomness over the decision to move to Aleppo). If we wanted the agent to remain consistent with the previous version, the agent then models Death as sampling from PX, independently of the agent. The probability of fleeing is just the expectation of PX; but the higher the variance of PX, the harder it is for Death to predict where the agent will go. The best option is as before: PX will set X=0 with probability 1001/2000, and X=1 with probability 999/2000.
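A Monte Carlo sketch can show the gap between eCDT's model and the real Death for this mixed strategy (X=0 with probability 1001/2000, X=1 with probability 999/2000). It assumes the delegated-randomness setup above: flee if and only if Z < X. In the first case, a perfect Death uses the same X and Z. In the second, Death draws its own X and Z independently, which is how eCDT models it here. All function names are hypothetical.

```python
import random

P_FLEE = 999 / 2000  # PX puts weight 999/2000 on X=1 and 1001/2000 on X=0


def sample_x(rng):
    """Draw X from PX."""
    return 1.0 if rng.random() < P_FLEE else 0.0


def utility(flee, death_in_aleppo):
    survived = flee != death_in_aleppo      # safe only if not in Death's city
    return 1000 * survived - (1 if flee else 0)


def average_utility(death_is_perfect, n=100_000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        x, z = sample_x(rng), rng.random()
        flee = z < x                        # delegated randomness: flee iff Z < X
        if death_is_perfect:
            death_in_aleppo = z < x         # Death predicts both X and Z
        else:
            death_in_aleppo = rng.random() < sample_x(rng)  # independent draw from PX
        total += utility(flee, death_in_aleppo)
    return total / n


print(average_utility(death_is_perfect=True))   # close to -0.4995
print(average_utility(death_is_perfect=False))  # close to 499.5
```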

But is this a fair way of estimating mixed strategies?

## Average Death in Aleppo

Consider a weaker form of Death, Average Death. Average Death cannot predict X, but can predict PX, and will use that to determine its location, sampling independently from it. Then, from eCDT's perspective, the mixed-strategy behaviour described above is the correct way of dealing with Average Death.

But that means that the agent above is incapable of distinguishing between Death and Average Death. Joyce argues strongly for considering all the relevant information, and the distinction between Death and Average Death is relevant. Thus it seems when considering mixed strategies, the eCDT agent must instead look at the pure strategies, compute their value (-x in this case) and then look at the distribution over them.

One might object that this is no longer causal, but the whole equilibrium approach undermines the strictly causal aspect anyway. It feels daft to be allowed to update on Average Death predicting PX, but not on Death predicting X. Especially since moving from PX to X is simply some random process Z' that samples from the distribution PX. So Death is allowed to predict PX (which depends on the agent's reasoning) but not Z'. It's worse than that, in fact: Death can predict PX and Z', and the agent can know this, but the agent isn't allowed to make use of this knowledge.

Given all that, it seems that in this situation, the eCDT agent must be able to compute the mixed strategies correctly and realise (like FDT) that staying in Damascus (X=0 with certainty) is the right decision.

## Let's recurse again, like we did last summer

This deals with Death, but not with Average Death. Ironically, the "X=0 with probability 1001/2000..." solution is not the correct solution for Average Death. To get that, we need to take equation (1), set x=y first, and then differentiate with respect to x. This gives x=1999/4000, so setting "X=0 with probability 2001/4000 and X=1 with probability 1999/4000" is actually the FDT solution for Average Death.
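Setting y = x before differentiating, as done here, can be checked with exact arithmetic. This is a sketch with illustrative names. On the diagonal, (1) becomes 2000x - 2000x² - x, whose derivative 1999 - 4000x vanishes at x = 1999/4000.

```python
from fractions import Fraction


def eu_on_diagonal(x):
    """Equation (1) after setting y = x first: 1000 * 2x(1 - x) - x."""
    return 1000 * (x * (1 - x) + (1 - x) * x) - x


x_fdt = Fraction(1999, 4000)   # root of 1999 - 4000x
x_ecdt = Fraction(999, 2000)   # the eCDT mixed strategy, for comparison

print(float(eu_on_diagonal(x_fdt)))
print(float(eu_on_diagonal(x_ecdt)))   # 499.5
```

The two values differ only slightly, but against Average Death the FDT probability is the better one.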

And we can make the eCDT agent reach that. Simply recurse to the next level, and have the agent choose PX directly, via a distribution PPX over possible PX.

But these towers of recursion are clunky and unnecessary. It's simpler to state that eCDT is unstable under recursion, and that it's a partial version of FDT.

## [Stub] Newcomb problem as a prisoners' dilemma/anti-coordination game

2 21 March 2017 10:34AM

You should always cooperate with an identical copy of yourself in the prisoner's dilemma. This is obvious, because you and the copy will reach the same decision.

That justification implicitly assumes that you and your copy are (somewhat) antagonistic: that you have opposite aims. But the conclusion doesn't require that at all. Suppose that you and your copy were instead trying to ensure that one of you got maximal reward (it doesn't matter which). Then you should still jointly cooperate because (C,C) is possible, while (C,D) and (D,C) are not (I'm ignoring randomising strategies for the moment).

Now look at the Newcomb problem. Your decision enters twice: once when you decide how many boxes to take, and once when Omega is simulating or estimating you to decide how much money to put in box B. You would dearly like your two "copies" (one of which may just be an estimate) to be out of sync - for the estimate to 1-box while the real you two-boxes. But without any way of distinguishing between the two, you're stuck with taking the same action - (1-box,1-box). Or, seeing it another way, (C,C).

This also makes the Newcomb problem into an anti-coordination game, where you and your copy/estimate try to pick different options. But, since this is not possible, you have to stick to the diagonal. This is why the Newcomb problem can be seen both as an anti-coordination game and a prisoners' dilemma - the differences only occur in the off-diagonal terms that can't be reached.
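A toy table can show the "only the diagonal is reachable" point. This is a sketch, and it assumes the standard Newcomb amounts, which this post does not state: box B holds \$1,000,000 if and only if the estimate one-boxes, and the other box always holds \$1,000.

```python
# Standard Newcomb amounts (an assumption; the post gives none).
def payoff(you, estimate):
    box_b = 1_000_000 if estimate == "1-box" else 0
    return box_b + (1_000 if you == "2-box" else 0)


options = ("1-box", "2-box")
full_table = {(y, e): payoff(y, e) for y in options for e in options}
# You and your copy/estimate cannot come apart, so only the diagonal is reachable.
reachable = {cell: v for cell, v in full_table.items() if cell[0] == cell[1]}

print(max(full_table, key=full_table.get))  # ('2-box', '1-box'): the unreachable dream
print(max(reachable, key=reachable.get))    # ('1-box', '1-box'), i.e. (C,C)
```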

## [Error]: Statistical Death in Damascus

3 20 March 2017 07:17PM

Note: This post is in error, I've put up a corrected version of it here. I'm leaving the text in place, as historical record. The source of the error is that I set Pa(S)=Pe(D) and then differentiated with respect to Pa(S), while I should have differentiated first and then set the two values to be the same.

Nate Soares and Ben Levinstein have a new paper out on "Functional Decision theory", the most recent development of UDT and TDT.

This post is about further analysing the "Death in Damascus" problem, and to show that Joyce's "equilibrium" version of CDT (causal decision theory) is in a certain sense intermediate between CDT and FDT. If eCDT is this equilibrium theory, then it can deal with a certain class of predictors, which I'll call distribution predictors.

## Death in Damascus

In the original Death in Damascus problem, Death is a perfect predictor. It finds you in Damascus, and says that it's already planned its trip for tomorrow - and it'll be in the same place you will be.

You value surviving at \$1000, and can flee to Aleppo for \$1.

Classical CDT will put some prior P over Death being in Damascus (D) or Aleppo (A) tomorrow. And then, if P(A)>999/2000, you should stay (S) in Damascus, while if P(A)<999/2000, you should flee (F) to Aleppo.

FDT estimates that Death will be wherever you will be, and thus there's no point in F, as that will just cost you \$1 for no reason. But it's interesting what eCDT produces. This decision theory requires that Pe (the equilibrium probability of A and D) be consistent with the action distribution that eCDT computes. Let Pa(S) be the action probability of S. Since Death knows what you will do, Pa(S)=Pe(D). The expected utility is 1000.Pa(S)Pe(A)+1000.Pa(F)Pe(D)-Pa(F). At equilibrium, this is 2000.Pe(A)(1-Pe(A))-Pe(A). And that quantity is maximised when Pe(A)=1999/4000 (and thus the probability of you fleeing is also 1999/4000). This is still the wrong decision, as paying the extra \$1 is pointless, even if it's not a certainty to do so.

So far, nothing interesting: both CDT and eCDT fail. But consider the next example, on which eCDT does not fail.

## Statistical Death in Damascus

Let's assume now that Death has an assistant, Statistical Death, that is not a perfect predictor, but is a perfect distribution predictor. It can predict the distribution of your actions, but not your actual decision. Essentially, you have access to a source of true randomness that it cannot predict.

It informs you that its probability over whether to be in Damascus or Aleppo will follow exactly the same distribution as yours.

Classical CDT follows the same reasoning as before. As does eCDT, since Pa(S)=Pe(D) as before, because Statistical Death follows the same distribution as you do.

But what about FDT? Well, note that FDT will reach the same conclusion as eCDT. This is because 1000.Pa(S)Pe(A)+1000.Pa(F)Pe(D)-Pa(F) is the correct expected utility, the Pa(S)=Pe(D) assumption is correct for Statistical Death, and (S,F) is independent of (A,D) once the action probabilities have been fixed.

So on the Statistical Death problem, eCDT and FDT say the same thing.

## Factored joint distribution versus full joint distributions

What's happening is that there is a joint distribution over (S,F) (your actions) and (D,A) (Death's actions). FDT is capable of reasoning over all types of joint distributions, and fully assessing how its choice of Pa acausally affects Death's choice of Pe.

But eCDT is only capable of reasoning over ones where the joint distribution factors into a distribution over (S,F) times a distribution over (D,A). Within the confines of that limitation, it is capable of (acausally) changing Pe via its choice of Pa.

Death in Damascus does not factor into two distributions, so eCDT fails on it. Statistical Death in Damascus does so factor, so eCDT succeeds on it. Thus eCDT seems to be best conceived of as a version of FDT that is strangely limited in terms of which joint distributions it's allowed to consider.
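The difference between the two joint distributions can be made explicit. This is a sketch with hypothetical helper names, using the payoffs of this post (\$1000 for surviving, \$1 to flee). Death in Damascus puts all its mass on the diagonal (F,A) and (S,D). Statistical Death has the same marginals but is the product of them. The same expected-utility formula then gives -Pa(F) in the first case and 2000.Pe(A)(1-Pe(A))-Pe(A) in the second.

```python
from fractions import Fraction


def expected_utility(joint):
    """joint maps (your action, Death's location) to a probability.
    You survive when you are not where Death is; fleeing costs $1."""
    eu = Fraction(0)
    for (act, loc), p in joint.items():
        survive = (act == "F") != (loc == "A")
        eu += p * (1000 * survive - (act == "F"))
    return eu


def joint_perfect(p_flee):
    # Death in Damascus: all mass on the diagonal; does not factor.
    return {("F", "A"): p_flee, ("S", "D"): 1 - p_flee}


def joint_statistical(p_flee):
    # Statistical Death: same marginals, sampled independently; factors as a product.
    mine = {"F": p_flee, "S": 1 - p_flee}
    death = {"A": p_flee, "D": 1 - p_flee}
    return {(a, l): mine[a] * death[l] for a in mine for l in death}


p = Fraction(1999, 4000)
print(expected_utility(joint_perfect(p)))      # -1999/4000
print(expected_utility(joint_statistical(p)))  # equals 2000p(1-p) - p
```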

## [Link] Putanumonit: A spreadsheet helps debias decisions, like picking a girl to date

10 15 March 2017 03:19AM

1 04 March 2017 04:58PM

## [Link] Alien Implant: Newcomb's Smoking Lesion

2 03 March 2017 04:51AM

0 15 February 2017 06:51PM

6 07 February 2017 06:42PM

## Recent updates to gwern.net (2015-2016)

28 26 August 2016 07:22PM

Previously: 2011; 2012-2013; 2013-2014; 2014-2015

"When I was one-and-twenty / I heard a wise man say, / 'Give crowns and pounds and guineas / But not your heart away; / Give pearls away and rubies / But keep your fancy free.' / But I was one-and-twenty, / No use to talk to me."

My past year of completed writings, sorted by topic:

Genetics:

• Embryo selection for intelligence cost-benefit analysis
• meta-analysis of intelligence GCTAs, limits set by measurement error, current polygenic scores, possible gains with current IVF procedures, the benefits of selection on multiple complex traits, the possible annual value in the USA of selection & value of larger GWASes, societal consequences of various embryo selection scenarios, embryo count versus polygenic scores as limiting factors, comparison with iterated embryo selection, limits to total gains from iterated embryo selection etc.
• Wikipedia article on Genome-wide complex trait analysis (GCTA)

AI:

Biology:

Statistics:

Cryptography:

Misc:

gwern.net itself has remained largely stable (some CSS fixes and image size changes); I continue to use Patreon and send out my newsletters.

## In partially observable environments, stochastic policies can be optimal

5 19 July 2016 10:42AM

I always had the informal impression that the optimal policies were deterministic (choosing the best option, rather than some mix of options). Of course, this is not the case when facing other agents, but I had the impression this would hold when facing the environment rather than other players.

But stochastic policies can also be needed if the environment is partially observable, at least if the policy is Markov (memoryless). Consider the following POMDP (partially observable Markov decision process):

There are two states, 1a and 1b, and the agent cannot tell which one they're in. Action A in state 1a, or B in state 1b, gives a reward of -R and keeps the agent in the same place. Action B in state 1a, or A in state 1b, gives a reward of R and moves the agent to the other state.

The returns for the two deterministic policies - A and B - are -R every turn except maybe for the first, while the expected return for the stochastic policy of 0.5A + 0.5B is 0 per turn.

Of course, if the agent can observe the reward, the environment is no longer partially observable (though we can imagine the reward is delayed until later). And the general policy of "alternate A and B" is more effective than the 0.5A + 0.5B policy. Still, that stochastic policy is the best of the memoryless policies available in this POMDP.
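A quick simulation of this POMDP can confirm the per-turn returns, taking R = 1 and starting in state 1a. This is an illustrative sketch, and the function names are hypothetical. Each policy sees only the turn number and a random source, never the state, so "alternate" is the only policy here that is not memoryless.

```python
import random


def step(state, action, reward=1.0):
    """One transition of the two-state POMDP described above (reward = R)."""
    if (state, action) in (("1a", "A"), ("1b", "B")):
        return state, -reward                          # penalised, stays put
    return ("1b" if state == "1a" else "1a"), reward   # rewarded, switches state


def average_reward(policy, steps=100_000, start="1a", seed=0):
    """Average per-turn reward; the policy never observes the state."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for t in range(steps):
        state, r = step(state, policy(t, rng))
        total += r
    return total / steps


def always_a(t, rng):
    return "A"


def always_b(t, rng):
    return "B"


def coin_flip(t, rng):   # memoryless stochastic policy 0.5A + 0.5B
    return "A" if rng.random() < 0.5 else "B"


def alternate(t, rng):   # not memoryless: depends on the turn count
    return "A" if t % 2 == 0 else "B"


for name, policy in [("A", always_a), ("B", always_b),
                     ("0.5A + 0.5B", coin_flip), ("alternate", alternate)]:
    print(name, round(average_reward(policy), 3))
```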

## How did my baby die and what is the probability that my next one will?

22 19 January 2016 06:24AM

Summary: My son was stillborn and I don't know why. My wife and I would like to have another child, but would very much not like to try if the probability of this occurring again is above a certain threshold (on which we have already settled). All 3 doctors I have consulted were unable to give a definitive cause of death, nor were any willing to give a numerical estimate of the probability (whether for reasons of legal risk, or something else) that our next baby will be stillborn. I am likely too mind-killed to properly evaluate my situation and would very much appreciate an independent (from mine) probability estimate of what caused my son to die, and given that cause, what is the recurrence risk?

Background: V (L's and my only biologically related living son) had no complications during birth, nor has he shown any signs of poor health whatsoever. L has a cousin who has had two miscarriages, and I have an aunt who had several stillbirths followed by 3 live births of healthy children. We know of no other family members that have had similar misfortunes.

J (my deceased son) was the product of a 31 week gestation. L (my wife and J's mother) is 28 years old, gravida 2, para 1. L presented to the physician's office for routine prenatal care and noted that she had not felt any fetal movement for the last five to six days. No fetal heart tones were identified. It was determined that there was an intrauterine fetal demise. L was admitted on 11/05/2015 for induction and was delivered of a nonviable, normal appearing, male fetus at approximately 1:30 on 11/06/2015.

Pro-Con Reasoning: According to a leading obstetrics textbook [1], causes of stillbirth are commonly classified into 8 categories: obstetrical complications, placental abnormalities, fetal malformations, infection, umbilical cord abnormalities, hypertensive disorders, medical complications, and undetermined. Below, I'll list the percentage of stillbirths in each category (which may be used as prior probabilities) along with some reasons for or against.

Obstetrical complications (29%)

• Against: No abruption detected. No multifetal gestation. No ruptured preterm membranes at 20-24 weeks.

Placental abnormalities (24%)

• For: Excessive fibrin deposition (as concluded in the surgical pathology report). Early acute chorioamnionitis (as concluded in the surgical pathology report, but Dr. M claimed this was caused by the baby's death, not conversely). L has gene variants associated with deep vein thrombosis (AG on rs2227589 per 23andme raw data).
• Against: No factor V Leiden mutation (GG on rs6025 per 23andme raw data and confirmed via independent lab test). No prothrombin gene mutation (GG on l3002432 per 23andme raw data and confirmed via independent lab test). L was negative for prothrombin G20210A mutation (as determined by lab test). Anti-thrombin III activity results were within normal reference ranges (as determined by lab test). Protein C activity results were within normal reference ranges (as determined by lab test). Protein S activity results were within normal reference ranges (as determined by lab test). Protein S antigen (free and total) results were within normal reference ranges (as determined by lab test).

Infection (13%)

• For: During the last week of August, L visited the home of a nurse who works in a hospital that we now know had frequent cases of CMV infection. CMV antibody IgH, CMV IgG, and Parvovirus B-19 Antibody IgG values were outside of normal reference ranges.
• Against: Dr. M discounted the viral test results as the cause of death, since the levels suggested the infection had occurred years ago, and therefore could not have caused J's death. Dr. F confirmed Dr. M's assessment.

Fetal malformations (14%)

• Against: No major structural abnormalities. No genetic abnormalities detected (CombiSNP Array for Pregnancy Loss results showed a normal male micro array profile).

Umbilical cord abnormalities (10%)

• Against: No prolapse. No stricture. No thrombosis.

Hypertensive disorder (9%)

• Against: No preeclampsia. No chronic hypertension.

Medical complications (8%)

• For: L experienced 2 nights of very painful abdominal pains that could have been contractions on 10/28 and 10/29. L remembers waking up on her back a few nights between 10/20 and 11/05 (it is unclear if this belongs in this category or somewhere else).
• Against: No antiphospholipid antibody syndrome detected (determined via Beta-2 Glycoprotein I Antibodies [IgG, IgA, IgM] test). No maternal diabetes detected (determined via glucose test on 10/20).

Undetermined (24%)

What is the most likely cause of death? How likely is that cause? Given that cause, if we choose to have another child, then how likely is it to survive its birth? Are there any other ways I could reduce uncertainty (additional tests, etc...) that I haven't listed here? Are there any other forums where these questions are more likely to get good answers? Why won't doctors give probabilities? Help with any of these questions would be greatly appreciated. Thank you.

If your advice to me is to consult another expert (in addition to the 2 obstetricians and 1 high-risk obstetrician I already have consulted), please also provide concrete tactics as to how to find such an expert and validate their expertise.

Contact Information: If you would like to contact me, but don't want to create an account here, you can do so at deprimita.patro@gmail.com.

[1] Cunningham, F. (2014). Williams obstetrics. New York: McGraw-Hill Medical.

EDIT 1: Updated to make clear that both V and J are mine and L's biological sons.

EDIT 2: Updated to add information on family history.

EDIT 3: On

## Forecasting and recursive Inhibition within a decision cycle

1 [deleted] 20 December 2015 05:37AM

When we anticipate the future, we have the opportunity to inhibit our behaviours which we anticipate will lead to counterfactual outcomes. Those of us with sufficiently low latencies in our decision cycles may recursively anticipate the consequences of counterfactuating (neologism) interventions, and so recursively intervene against our interventions.

This may be difficult for some. Try modelling that decision cycle as a nano-scale approximation of time travel. One relevant paradox from popular culture is the farther future paradox described in the TV cartoon Family Guy.

Relating the satire back to our abstraction of the decision cycle, one may ponder:

What is a satisfactory stopping rule for the far anticipation of self-referential consequence?

That is:

(1) what are the inherent harmful implications of inhibiting actions in and of themselves: stress?

(2) what are their inherent merits: self-determination?

and (3) what are the favourable and disfavourable consequences at x points into the future, given y number of points of self-reference at points z, a, b and c?

I see no ready solution to this problem in terms of human rationality, and see no corresponding problem in artificial intelligence, where it would also apply. Given the relevance to MIRI (since CFAR doesn't seem to work on open problems in the same way)

I would like to also take this opportunity to open this as an experimental thread for the community to generate a list of “open problems” in human rationality that are otherwise scattered across the community blog and wiki.

## Omega's Idiot Brother, Epsilon

3 25 November 2015 07:57PM

Epsilon walks up to you with two boxes, A and B, labeled in rather childish-looking handwriting written in crayon.

"In box A," he intones, sounding like he's trying to be foreboding, which might work better when he hits puberty, "I may or may not have placed a million of your human dollars."  He pauses for a moment, then nods.  "Yes.  I may or may not have placed a million dollars in this box.  If I expect you to open Box B, the million dollars won't be there.  Box B will contain, regardless of what you do, one thousand dollars.  You may choose to take one box, or both; I will leave with any boxes you do not take."

You've been anticipating this.  He's appeared to around twelve thousand people so far.  Out of eight thousand people who accepted both boxes, eighty found the million dollars missing and walked away with $1,000; the other seven thousand nine hundred and twenty walked away with $1,001,000.  Out of the four thousand people who opened only box A, only four found it empty.

The agreement is unanimous: Epsilon is really quite bad at this.  So, do you one-box, or two-box?

There are some important differences here from the original problem.  First, Epsilon won't let you open either box until you've decided whether to open one or both, and he will leave with the other box.  Second, while Epsilon's false positive rate on identifying two-boxers is quite impressive (he makes mistakes about one-boxers only 0.1% of the time), his false negative rate is quite unimpressive: he catches only 1% of the people who two-box.  Whatever heuristic he's using, he clearly prefers letting two-boxers slide to accidentally punishing one-boxers.
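To make the track record concrete, here is a rough evidential-style calculation. It assumes, purely for illustration, that Epsilon's observed rates (4 of 4,000 one-boxers and 80 of 8,000 two-boxers finding box A empty) are the probabilities that apply to you given your choice; a causal decision theorist would dispute exactly that step. The function names are illustrative.

```python
from fractions import Fraction

# Observed record from the story, treated (as an assumption) as the
# conditional probability that box A is empty given your choice.
TWO_BOXERS, TWO_BOX_EMPTY = 8000, 80
ONE_BOXERS, ONE_BOX_EMPTY = 4000, 4

MILLION = 1_000_000
THOUSAND = 1_000


def ev_one_box():
    # Take box A only: $1,000,000 unless Epsilon wrongly left it empty.
    p_empty = Fraction(ONE_BOX_EMPTY, ONE_BOXERS)   # 0.1%
    return (1 - p_empty) * MILLION


def ev_two_box():
    # Take both boxes: $1,001,000 if Epsilon missed you, else just $1,000.
    p_empty = Fraction(TWO_BOX_EMPTY, TWO_BOXERS)   # 1%
    return (1 - p_empty) * (MILLION + THOUSAND) + p_empty * THOUSAND


print("one-box:", int(ev_one_box()))
print("two-box:", int(ev_two_box()))
```

Under that assumption one-boxing still comes out ahead, at $999,000 versus $991,000 in expectation, but by a much thinner margin than against a near-perfect predictor.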

I'm curious to know whether anybody would two-box in this scenario and why, and particularly curious about the reasoning of anybody whose answer is different between the original Newcomb problem and this one.

## Newcomb, Bostrom, Calvin: Credence and the strange path to a finite afterlife

7 02 November 2015 11:03PM

This is a bit rough, but I think that it is an interesting and potentially compelling idea. To keep this short, and accordingly increase the number of eyes over it, I have only sketched the bare bones of the idea.

1)      Empirically, people have varying intuitions and beliefs about causality, particularly in Newcomb-like problems (http://wiki.lesswrong.com/wiki/Newcomb's_problem, http://philpapers.org/surveys/results.pl, and https://en.wikipedia.org/wiki/Irresistible_grace).

2)      Also, as an empirical matter, some people believe in taking actions after the fact, such as one-boxing, or Calvinist “irresistible grace”, to try to ensure or conform with a seemingly already determined outcome. This might be out of a sense of retrocausality, performance, moral honesty, etc. What matters is that we know that they will act it out, despite it violating common sense causality. There has been some great work on decision theory on LW about trying to thread this needle well.

3)      The second disjunct of the simulation argument (http://wiki.lesswrong.com/wiki/Simulation_argument) shows that humanity's decision making is evidentially relevant to what our subjective credence should be that we are in a simulation. That is to say, if we are actively headed toward making simulations, we should increase our credence of being in a simulation; if we are actively headed away from making simulations, through either existential risk or law/policy against it, we should decrease our credence.

4)      Many, if not most, people would like for there to be a pleasant afterlife after death, especially if we could be reunited with loved ones.

5)      There is no reason to believe that simulations that are otherwise nearly identical copies of our world could not contain, after the simulated bodily death of the participants, an extremely long-duration, though finite, "heaven"-like afterlife shared by simulation participants.

6)      Our heading towards creating such simulations, especially if they were capable of nesting simulations, should increase our credence that we exist in such a simulation, and we should perhaps expect a heaven-like afterlife of long, though finite, duration.

7)      Those who believe in alternative causality, or retrocausality, in Newcomb-like situations should be especially excited about the opportunity to push the world towards surviving, allowing these types of simulations, and creating them, as this would suggest, by analogy, that if they work towards creating simulations with heaven-like afterlives, they might in some sense be “causing” such a heaven to exist for themselves, and even for friends and family who have already died. Such an idea of life after death, and especially of being reunited with loved ones, can be extremely compelling.

8)      I believe that people matching the above description, that is, people who both hold an intuition of alternative causality and find such a heaven-like afterlife compelling, exist. Further, the existence of such people, and their associated motivation to try to create such simulations, should increase even two-boxers' credence that we already live in such a world with a heaven-like afterlife. This is because knowledge of a motivated minority desiring simulations should increase credence in the likely success of simulations. This essentially shows that “this probably happened before, one level up” from the two-box perspective.

9)      As an empirical matter, I also think that there are people who would find the idea of creating simulations with heaven-like afterlives compelling from a simply altruistic perspective, even if they are not one-boxers: both because it is a nice thing to do for the future simulated people, who could, for example, probabilistically have a much better existence than biological children on earth can, and because it would increase the credence (and emotional comfort) of both one-boxers and two-boxers in our world that there might be a life after death.

10)   This creates the opportunity for a secular movement in which people work towards creating these simulations, and use this work and its potential success to derive comfort and meaning in their lives. For example, after a loved one’s death, people might donate to a think tank that creates or promotes such simulations, or that works to avoid existential threats, partly symbolically and partly in hope.

11)   There is at least some room for Pascalian considerations even for two-boxers who allow for some humility in their beliefs. Nozick believed one-boxers will become two-boxers if Box A is raised to $900,000, and two-boxers will become one-boxers if Box A is lowered to $1. Similarly, working towards these simulations, even if you do not find it altruistically compelling, and even if you think that the odds of alternative causality or retrocausality are infinitesimally small, might make sense because the reward could be extremely large, potentially including trillions of lifetimes' worth of time spent in an afterlife “heaven” with friends and family.

Finally, this idea might be worth filling in (I have been doing so in my private notes for over a year, but am a bit shy to debut all of that just yet; even working up the courage to post this was difficult), if only because it is interesting and could be used as a hook to get more people interested in existential risk, including the AI control problem. This is because existential catastrophe is probably the greatest enemy of credence in the future of such simulations, and accordingly of our reasonable credence that such a heaven awaits us after death. A short headline like “avoiding existential risk is key to afterlife” can get a conversation going. I can imagine Salon, etc. taking another swipe at it, and in doing so creating publicity that would help find more like-minded folks to get involved in the work of MIRI, FHI, CEA, etc.

There are also some really interesting ideas about acausal trade, and game theory between higher and lower worlds, as a form of “compulsion” in which they punish worlds for not creating heaven-containing simulations (thereby affecting their credence as observers of the simulation), in order to reach an equilibrium in which simulations with heaven-like afterlives are universal, or nearly universal. More on that later if this is received well.
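Point 3) leans on the simulation argument, which has a simple quantitative core. In Bostrom's paper, the fraction of all observers with human-type experiences who live in simulations is f_sim = f_P f_I N_I / (f_P f_I N_I + 1), where f_P is the fraction of human-level civilizations that reach a posthuman stage, f_I is the fraction of posthuman civilizations interested in running ancestor-simulations, and N_I is the average number of ancestor-simulations run by such an interested civilization; his indifference principle then sets our credence of being simulated equal to f_sim. The sketch below uses purely illustrative parameter values (not estimates) to show how evidence that f_I is not tiny, which is what "heading toward making simulations" would be, pushes that credence up.

```python
def f_sim(f_p, f_i, n_i):
    """Fraction of human-type observers who are simulated (Bostrom's formula).

    f_p: fraction of human-level civilizations reaching a posthuman stage
    f_i: fraction of posthuman civilizations interested in ancestor-simulations
    n_i: average number of ancestor-simulations run by an interested civilization
    """
    x = f_p * f_i * n_i
    return x / (x + 1)


# Illustrative values only: hold f_p and n_i fixed and vary interest f_i.
for f_i in (0.0, 1e-9, 1e-6, 1e-3):
    print(f"f_i={f_i:g}  f_sim={f_sim(0.5, f_i, 1e6):.6f}")
```

Because n_i can be astronomically large, even a small interested fraction f_i is enough to move f_sim close to 1, which is why the second disjunct carries so much weight in the argument.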
Also, if anyone would like to join me in researching, bull-sessioning, or writing about this stuff, please feel free to IM me. And if anyone has a really good, non-obvious pin with which to pop my balloon, preferably gently, it would be really appreciated: I am spending a lot of time and energy on this, so I would like to know if it is fundamentally flawed in some way. Thank you.

*******************************

November 11 Updates and Edits for Clarification

1) There seems to be confusion about what I mean by self-location and credence. A good way to think of this is the Sleeping Beauty Problem (https://wiki.lesswrong.com/wiki/Sleeping_Beauty_problem). If I imagine myself as Sleeping Beauty (and who doesn't?), and I am asked on Sunday what my credence is that the coin will land tails, I will say 1/2. If I am awakened during the experiment without being told which day it is and am asked what my credence is that the coin landed tails, I will say 2/3. If I am then told it is Monday, I will update my credence to 1/2. If I am told it is Tuesday, I will update my credence to 1. If someone asks me two days after the experiment about my credence that it was tails, and I somehow still do not know the day of the week, I will say 1/2. Credence changes with where you are and with what information you have. As we might be in a simulation, we are somewhere in the “experiment days”, and information can help orient our credence. As humanity potentially has some say in whether or not we are in a simulation, information about how humans make decisions about these types of things can and should affect our credence.

Imagine Sleeping Beauty is a LessWrong reader. If she is unfamiliar with the simulation argument and someone asks her about her credence of being in a simulation, she probably answers something like 0.0000000001% (all numbers for illustrative purposes only). If someone shows her the simulation argument, she increases it to 1%.
If she stumbles across this blog entry, she increases her credence to 2%, and adds some credence to the additional hypothesis that it may be a simulation with an afterlife. If she sees that a ton of people get really interested in this idea, and start raising funds to build simulations in the future and to lobby governments both for strong AI safeguards and for regulation of future simulations, she raises her credence to 4%. If she lives through the AI superintelligence explosion and simulations are being built, but not yet turned on, her credence increases to 20%. If humanity turns them on, it increases to 50%. If there are trillions of them, she increases her credence to 60%. If 99% of simulations survive their own run-ins with artificial superintelligence and produce their own simulations, she increases her credence to 95%.

2) This set of simulations does not need to recreate the current world or any specific people in it. That is a different idea that is not necessary to this argument. As written, the argument is premised on the idea of creating fully unique people. The point would be to increase our credence that we are functionally identical in type to the unique individuals in the simulation. This is done by creating ignorance or uncertainty in simulations, so that the majority of people similarly situated, in a world which may or may not be a simulation, are in fact in a simulation. This should, in our ignorance, increase our credence that we are in a simulation. The point is about how we self-locate, as discussed in the original article by Bostrom. It is a short 12-page read, and if you have not read it yet, I would encourage it: http://simulation-argument.com/simulation.html. The point I was making about past loved ones was to raise the possibility that the simulations could be designed to transfer people to a separate afterlife simulation where they could be reunited after dying in the first part of the simulation.
This was not about trying to create something for us to upload ourselves into, along with attempted replicas of dead loved ones. Staying within one simulation through two phases, a short life and a relatively long afterlife, also has the advantage of circumventing the teletransportation paradox, as “all of the person” can be moved into the afterlife part of the simulation.

## Min/max goal factoring and belief mapping exercise

-1 [deleted] 23 June 2015 05:30AM

Edit 3: Removed description of previous edits and added the following:

This thread used to contain the description of a rationality exercise. I have removed it and plan to rewrite it better. I will repost it here, or delete this thread and repost in the discussion. Thank you.

## Why isn't the following decision theory optimal?

5 16 April 2015 01:38AM

I've recently read the decision theory FAQ, as well as Eliezer's TDT paper. When reading the TDT paper, a simple decision procedure occurred to me which, as far as I can tell, gets the correct answer to every tricky decision problem I've seen. As discussed in the FAQ above, evidential decision theory gets the chewing gum problem wrong, causal decision theory gets Newcomb's problem wrong, and TDT gets counterfactual mugging wrong.

In the TDT paper, Eliezer postulates an agent named Gloria (page 29), who is defined as an agent who maximizes decision-determined problems. He describes how a CDT agent named Reena would want to transform herself into Gloria. Eliezer writes:

> By Gloria’s nature, she always already has the decision-type causal agents wish they had, without need of precommitment.

Eliezer then later goes on to develop TDT, which is supposed to construct Gloria as a byproduct:

> Gloria, as we have defined her, is defined only over completely decision-determined problems of which she has full knowledge. However, the agenda of this manuscript is to introduce a formal, general decision theory which reduces to Gloria as a special case.
Why can't we instead construct Gloria directly, using the idea of the thing that CDT agents wish they were? Obviously we can't just postulate a decision algorithm that we don't know how to execute, note that a CDT agent would wish they had it, and pretend we had solved the problem. We need to be able to describe the ideal decision algorithm at a level of detail that we could, in theory, program into an AI.

Consider this decision algorithm, which I'll temporarily call Nameless Decision Theory (NDT) until I get feedback about whether it deserves a name: you should always make the decision that a CDT agent would have wished he had precommitted to, if he had previously known he'd be in his current situation and had the opportunity to precommit to a decision.

In effect, you are making a general precommitment to behave as if you had made all the specific precommitments that would ever be advantageous to you.

NDT is so simple, and Eliezer comes so close to stating it in his discussion of Gloria, that I assume there is some flaw with it that I'm not seeing. Perhaps NDT does not count as a "real"/"well-defined" decision procedure, or can't be formalized for some reason? Even so, it does seem like it would be possible to program an AI to behave in this way.

Can someone give an example of a decision problem for which this decision procedure fails? Or one for which there are multiple possible precommitments that you would have wished you'd made, and it's not clear which one is best?

EDIT: I now think this definition of NDT better captures what I was trying to express: You should always make the decision that a CDT agent would have wished he had precommitted to, if he had previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.
## Linked decisions and a "nice" solution for the Fermi paradox

2 07 December 2014 02:58PM

One of the more speculative solutions of the Fermi paradox is that all civilizations decide to stay home, thereby meta-causing other civilizations to stay home too, and thus allowing the Fermi paradox to have a nice solution. (I remember reading this idea in Paul Almond’s writings about evidential decision theory, which unfortunately no longer seem to be available online.) The plausibility of this argument is definitely questionable. It requires a very high degree of goal convergence both within and among different civilizations. Let us grant this convergence and assume that, indeed, most civilizations arrive at the same decision and that they make their decision knowing this. One paradoxical implication then is: if a civilization decides to attempt space colonization, they are virtually guaranteed to face unexpected difficulties (for otherwise space would already be colonized, unless they are the first civilization in their neighborhood attempting space colonization). If, on the other hand, everyone decides to stay home, there is no reason for thinking that there would be any unexpected difficulties if one tried. Space colonization can either be easy, or you can try it, but not both.

Can the basic idea behind the argument be formalized? Consider the following game: there are $N \gg 1$ players. Each player, in turn, is offered the chance to push a button. Pushing the button yields a reward $R > 0$ with probability $p$ and a punishment $P < 0$ otherwise. ($R$ corresponds to successful space colonization, while $P$ corresponds to a failed colonization attempt.) Not pushing the button gives zero utility. If a player pushes the button and receives $R$, the game is immediately aborted, while the game continues if a player receives $P$. Players do not know how many other players were offered the button before them; they only know that no player before them received $R$. Players also don’t know $p$.
Instead, they have a probability distribution $u(p)$ over possible values of $p$, with $u(p) \ge 0$ and $\int_0^1 u(p)\,dp = 1$. We also assume that the decisions of the different players are perfectly linked.

Naively, it seems that players simply have an effective success probability

$$p_{\mathrm{eff},1} = \int_0^1 p\,u(p)\,dp$$

and should push the button iff $p_{\mathrm{eff},1} R + (1 - p_{\mathrm{eff},1}) P > 0$. Indeed, if players decide not to push the button, they should expect that pushing it would have given them $R$ with probability $p_{\mathrm{eff},1}$.

The situation becomes more complicated if a player decides to push the button. If a player pushes the button, they know that all players before them have also pushed the button and have received $P$. Before taking this knowledge into account, players are completely ignorant about the number $i$ of players who were offered the button before them, and have to assign each number $i$ from $0$ to $N-1$ the same probability $1/N$. Taking into account that all players before them have received $P$, the variables $i$ and $p$ become correlated: the larger $i$, the higher the probability of a small value of $p$. Formally, by Bayes’ theorem, the joint probability distribution of the two variables is

$$w(i,p) = c\,u(p)\,(1-p)^i,$$

where $c$ is a normalization constant. The marginal distribution is $w(p) = \sum_{i=0}^{N-1} w(i,p)$. Using $N \gg 1$, we find $w(p) = c\,u(p)/p$, so the normalization constant is

$$c = \left[\int_0^1 \frac{u(p)}{p}\,dp\right]^{-1}.$$

Finally, the effective success probability taking the linkage of decisions into account is

$$p_{\mathrm{eff},2} = \int_0^1 p\,w(p)\,dp = c = \left[\int_0^1 \frac{u(p)}{p}\,dp\right]^{-1}.$$

This is the expected chance of success if players decide to push the button. Players should push the button iff $p_{\mathrm{eff},2} R + (1 - p_{\mathrm{eff},2}) P > 0$. It follows from the convexity of the function $x \mapsto 1/x$ (for positive $x$) that $p_{\mathrm{eff},2} \le p_{\mathrm{eff},1}$.
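The derivation can be checked numerically. This sketch is an illustration under its own assumptions, not part of the original post: it picks two example priors, $u(p) = 6p(1-p)$ (a Beta(2,2) density, with $u(0) = 0$) and the uniform density (with $u(0) > 0$), and evaluates a finite-$N$ analogue of $p_{\mathrm{eff},2}$ by keeping the full sum $\sum_{i=0}^{N-1}(1-p)^i$ instead of its $N \gg 1$ limit $1/p$. The constant $1/N$ and the normalization $c$ cancel in the ratio. The helper names and the midpoint-rule integrator are illustrative choices.

```python
def midpoint(f, n=50_000):
    """Midpoint-rule integral of f over [0, 1]; never evaluates f at p = 0."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


def p_eff_naive(u):
    """p_eff,1: the prior mean of p."""
    return midpoint(lambda p: p * u(p))


def p_eff_linked(u, n_players):
    """Finite-N analogue of p_eff,2.

    Weights the prior u(p) by sum_{i=0}^{N-1} (1-p)^i = (1 - (1-p)^N) / p,
    the chance that all i earlier players received P, summed over the equally
    likely positions i. For N >> 1 this weight tends to 1/p.
    """
    def weight(p):
        return (1.0 - (1.0 - p) ** n_players) / p

    num = midpoint(lambda p: p * u(p) * weight(p))
    den = midpoint(lambda p: u(p) * weight(p))
    return num / den


def beta22(p):
    return 6.0 * p * (1.0 - p)   # u(0) = 0: the integral of u(p)/p converges


def uniform(p):
    return 1.0                   # u(0) > 0: the integral of u(p)/p diverges


print("p_eff,1:", round(p_eff_naive(beta22), 4), round(p_eff_naive(uniform), 4))
for n in (10, 100, 1000):
    print(f"N={n}: p_eff,2 =", round(p_eff_linked(beta22, n), 4),
          round(p_eff_linked(uniform, n), 4))
```

For the Beta(2,2) prior, $p_{\mathrm{eff},1} = 1/2$ while $p_{\mathrm{eff},2}$ approaches $1/3$ as $N$ grows. For the uniform prior the finite-$N$ value works out to $N/((N+1)H_N)$, with $H_N$ the $N$-th harmonic number, so it keeps shrinking (roughly like $1/\ln N$), consistent with the divergent-integral case discussed next.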
So by deciding to push the button, players decrease their expected success probability from $p_{\mathrm{eff},1}$ to $p_{\mathrm{eff},2}$; they cannot both push the button and have the unaltered success probability $p_{\mathrm{eff},1}$. Linked decisions can explain why no one pushes the button if $p_{\mathrm{eff},2} R + (1 - p_{\mathrm{eff},2}) P < 0$, even though we might have $p_{\mathrm{eff},1} R + (1 - p_{\mathrm{eff},1}) P > 0$ and pushing the button naively seems to have positive expected utility.

It is also worth noting that if $u(0) > 0$, the integral $\int_0^1 u(p)/p\,dp$ diverges, so that $p_{\mathrm{eff},2} = 0$. This means that, given perfectly linked decisions and a sufficiently large number of players $N \gg 1$, players should never push the button if their distribution $u(p)$ satisfies $u(0) > 0$, irrespective of the ratio of $R$ and $P$. This is due to an observer selection effect: if a player decides to push the button, then the fact that they are even offered the button is most likely due to $p$ being very small and thus a lot of players having been offered the button.

## Blackmail, continued: communal blackmail, uncoordinated responses

11 22 October 2014 05:53PM

The heuristic that one should always resist blackmail seems a good one (no matter how tricky blackmail is to define). And one should be public about this, too; then one is very unlikely to be blackmailed. Even if one speaks like an emperor.

But there's a subtlety: what if the blackmail is being used against a whole group, not just against one person? The US justice system is often seen to function like this: prosecutors pile on ridiculous numbers of charges, threatening uncounted millennia in jail, in order to get the accused to settle for a lesser charge and avoid the expense of a trial. But for this to work, they need to occasionally find someone who rejects the offer, put them on trial, and slap them with a ridiculous sentence. Therefore, by standing up to them (or proclaiming in advance that you will reject such offers), you are not actually making yourself immune to their threats.
You're setting yourself up to be the sacrificial one made an example of.

Of course, if everyone were a UDT agent, the correct decision would be for everyone to reject the threat. That would ensure that the threats are never made in the first place. But (and apologies if this shocks you) not everyone in the world is a perfect UDT agent. So the threats will get made, and those resisting them will get slammed to the maximum.

Of course, if everyone could read everyone's mind and was perfectly rational, then they would realise that making examples of UDT agents wouldn't affect the behaviour of non-UDT agents. In that case, UDT agents should resist the threats, and the perfectly rational prosecutor wouldn't bother threatening UDT agents. However (and sorry to shock your views of reality three times in one post), not everyone is perfectly rational. And not everyone can read everyone's minds.

So even a perfect UDT agent must, it seems, sometimes succumb to blackmail.

## Anthropic decision theory for selfish agents

8 21 October 2014 03:56PM

Consider Nick Bostrom's Incubator Gedankenexperiment, phrased as a decision problem. In my mind, this provides the purest and simplest example of a non-trivial anthropic decision problem. In an otherwise empty world, the Incubator flips a coin. If the coin comes up heads, it creates one human, while if the coin comes up tails, it creates two humans. Each created human is put into one of two indistinguishable cells, and there's no way for created humans to tell whether another human has been created or not. Each created human is offered the possibility to buy a lottery ticket which pays $1 if the coin has shown tails. What is the maximal price that you would pay for such a lottery ticket? (Utility is proportional to dollars.) The two traditional answers are $1/2 and $2/3.

We can try to answer this question for agents with different utility functions: total utilitarians, average utilitarians, and selfish agents. UDT's answer is that total utilitarians should pay up to $2/3, while average utilitarians should pay up to $1/2; see Stuart Armstrong's paper and Wei Dai's comment. There are some heuristic ways to arrive at UDT prescriptions, such as asking "What would I have precommitted to?" or arguing based on reflective consistency. For example, a CDT agent that expects to face Counterfactual Mugging-like situations in the future (with predictions also made in the future) will self-modify to become a UDT agent, i.e., one that pays the counterfactual mugger.

Now, these kinds of heuristics are not applicable to the Incubator case. It is meaningless to ask "What maximal price should I have precommitted to?" or "At what odds should I bet on coin flips of this kind in the future?", since the very point of the Gedankenexperiment is that the agent's existence is contingent upon the outcome of the coin flip. Can we come up with a different heuristic that leads to the correct answer? Imagine that the Incubator's subroutine that is responsible for creating the humans is completely benevolent towards them (let's call this the "Benevolent Creator"). (We assume here that the humans' goals are identical, such that the notion of benevolence towards all humans is completely unproblematic.) The Benevolent Creator has the power to program into the humans a certain maximal price they will pay for the lottery tickets. A moment's thought shows that this indeed leads to UDT's answers for average and total utilitarians. For example, consider the case of total utilitarians. If the humans pay $x for the lottery tickets, the expected utility is 1/2*(-x) + 1/2*2*(1-x). So indeed, the break-even price is reached for x=2/3.

But what about selfish agents? For them, the Benevolent Creator heuristic is no longer applicable. Since the humans' goals do not align, the Creator cannot share them. As Wei Dai writes, the notion of selfish values does not fit well with UDT. In Anthropic decision theory, Stuart Armstrong argues that selfish agents should pay up to $1/2 (Sec. 3.3.3). His argument is based on an alleged isomorphism between the average utilitarian and the selfish case. (For instance, donating $1 to each human increases utility by 1 for both average utilitarian and selfish agents, while it increases utility by 2 for total utilitarians in the tails world.) Here, I want to argue that this is incorrect and that selfish agents should pay up to $2/3 for the lottery tickets.
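The Benevolent Creator's calculation can be checked exactly with rational arithmetic. This sketch assumes, as the post does, a fair coin and utility linear in dollars, and (as its own simplification) scores each world only by the net ticket profit; the function names are illustrative. It recovers the total-utilitarian expression above and the corresponding average-utilitarian one.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # fair coin


def eu_total(x):
    # Heads: one human pays x for a losing ticket.
    # Tails: two humans each net (1 - x); total utility counts both.
    return HALF * (-x) + HALF * 2 * (1 - x)


def eu_average(x):
    # Average utility: heads-world average is -x, tails-world average is (1 - x).
    return HALF * (-x) + HALF * (1 - x)


def break_even(eu):
    # eu is affine in x, so its root follows from the values at x = 0 and x = 1.
    a, b = eu(Fraction(0)), eu(Fraction(1))
    return a / (a - b)


print("total:", break_even(eu_total))
print("average:", break_even(eu_average))
```

The break-even prices come out as 2/3 for total and 1/2 for average utilitarians, matching UDT.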

(Needless to say, all the bold statements I'm about to make are based on an "inside view". An "outside view" tells me that Stuart Armstrong has thought much more carefully about these issues than I have, and has discussed them with a lot of smart people, which I haven't, so chances are my arguments are flawed somehow.)

In order to make my argument, I want to introduce yet another heuristic, which I call the Submissive Gnome. Suppose each cell contains a gnome which is already present before the coin is flipped. As soon as it sees a human in its cell, it instantly adopts the human's goal. From the gnome's perspective, SIA odds are clearly correct: since a human is twice as likely to appear in the gnome's cell if the coin shows tails, Bayes' Theorem implies that the probability of tails is 2/3 from the gnome's perspective once it has seen a human. Therefore, the gnome would advise the selfish human to pay up to $2/3 for a lottery ticket that pays $1 in the tails world. I don't see any reason why the selfish agent shouldn't follow the gnome's advice. From the gnome's perspective, the problem is not even "anthropic" in any sense; there's just straightforward Bayesian updating.
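The gnome's update is P(tails | human) = (1/2 · 1) / (1/2 · 1 + 1/2 · 1/2) = 2/3. A quick Monte Carlo check of that number is sketched below; it assumes two cells and models the gnome's ignorance of which cell the heads-world human occupies as a 50% chance that this human lands in the gnome's own cell. The function name is illustrative.

```python
import random


def gnome_posterior_tails(trials=200_000, seed=0):
    """Estimate P(tails | a human appears in the gnome's cell) by simulation.

    Two cells, fair coin. On tails both cells receive a human; on heads only
    one does, landing in the gnome's cell with probability 1/2.
    """
    rng = random.Random(seed)
    saw_human = 0
    tails_and_saw = 0
    for _ in range(trials):
        tails = rng.random() < 0.5
        human_here = tails or rng.random() < 0.5
        if human_here:
            saw_human += 1
            if tails:
                tails_and_saw += 1
    return tails_and_saw / saw_human


print(round(gnome_posterior_tails(), 3))
```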

Suppose we want to use the Submissive Gnome heuristic to solve the problem for utilitarian agents. (ETA: Total/average utilitarianism includes the well-being and population of humans only, not of gnomes.) The gnome reasons as follows: "With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying $x for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to $2/3 and $4/5, respectively." The gnome's advice disagrees with UDT and with the solution based on the Benevolent Creator. Something has gone terribly wrong here, but what? The mistake in the gnome's reasoning here is in fact perfectly isomorphic to the mistake in the reasoning leading to the "yea" answer in Psy-Kosh's non-anthropic problem.

Things become clear if we look at the problem from the gnome's perspective before the coin is flipped. Assume, for simplicity, that there are only two cells and gnomes, 1 and 2. If the coin shows heads, the single human is placed in cell 1 and cell 2 is left empty. Since the humans don't know which cell they are in, neither should the gnomes. So from each gnome's perspective, there are four equiprobable "worlds": it can be in cell 1 or 2, and the coin flip can result in heads or tails. We assume, of course, that the two gnomes are, like the humans, sufficiently similar that their decisions are "linked". We can assume that the gnomes already know what utility functions the humans are going to have. If the humans will be (total/average) utilitarians, we can then even assume that the gnomes already are so, too, since the well-being of each human is as important as that of any other. Crucially, then, for both utilitarian utility functions, the question whether the gnome is in cell 1 or 2 is irrelevant. There is just one "gnome advice" that is given identically to all (one or two) humans. Whether this advice is given by one gnome or the other or both of them is irrelevant from both gnomes' perspective.
The alignment of the humans' goals leads to alignment of the gnomes' goals. The expected utility of some advice can simply be calculated by taking probability 1/2 for both heads and tails, and introducing a factor of 2 in the total utilitarian case, leading to the answers $1/2 and $2/3, in accordance with UDT and the Benevolent Creator.

The situation looks different if the humans are selfish. We can no longer assume that the gnomes already have a utility function. A gnome cannot yet care about the human in its cell, since with probability 1/4 (if the gnome is in cell 2 and the coin shows heads) there will not be a human to care for. (By contrast, it is already possible to care about the average utility of all humans there will be, which is where the alleged isomorphism between the two cases breaks down.) It is still true that there is just one "gnome advice" that is given identically to all (one or two) humans, but the method for calculating the optimal advice now differs. In three of the four equiprobable "worlds" the gnome can live in, a human will appear in its cell after the coin flip. Two of these three are tails worlds, so the gnome decides to advise paying up to $2/3 for the lottery ticket if a human appears in its cell.
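The four-world bookkeeping can be made explicit. This sketch follows the post's assumptions (two cells, the heads-world human in cell 1, a fair coin, utility linear in dollars) and, as its own simplification, scores outcomes by net ticket profit. It contrasts three calculations: the utilitarian gnome that averages over all four worlds, the flawed reasoning that conditions on seeing a human and then scores the whole population, and the selfish gnome that scores only the human in its own cell. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

# One gnome's four equiprobable worlds: (its cell, coin). As in the post,
# on heads the single human is placed in cell 1 and cell 2 stays empty.
WORLDS = list(product((1, 2), ("H", "T")))
P_WORLD = Fraction(1, 4)


def human_in_cell(cell, coin):
    return coin == "T" or cell == 1


def group_utility(coin, x, total):
    # Net ticket payoff of the whole population, summed (total) or averaged.
    if coin == "H":
        return -x                          # one human, losing ticket
    return (2 if total else 1) * (1 - x)   # two humans, winning tickets


def eu_linked_utilitarian(x, total):
    # Advice is shared and the gnome's cell is irrelevant: use all four worlds.
    return sum(P_WORLD * group_utility(coin, x, total) for _, coin in WORLDS)


def eu_naive_utilitarian(x, total):
    # Flawed: condition on "a human is in my cell" (the 2/3 update), then score
    # the whole population. Left unnormalized; scaling doesn't move the root.
    return sum(P_WORLD * group_utility(coin, x, total)
               for cell, coin in WORLDS if human_in_cell(cell, coin))


def eu_selfish(x):
    # Selfish: only the human (if any) in the gnome's own cell counts.
    return sum(P_WORLD * (-x if coin == "H" else 1 - x)
               for cell, coin in WORLDS if human_in_cell(cell, coin))


def break_even(eu):
    a, b = eu(Fraction(0)), eu(Fraction(1))   # eu is affine in x
    return a / (a - b)


for total in (False, True):
    label = "total" if total else "average"
    print(label,
          break_even(lambda x: eu_linked_utilitarian(x, total)),
          break_even(lambda x: eu_naive_utilitarian(x, total)))
print("selfish", break_even(eu_selfish))
```

The linked utilitarian calculation gives 1/2 (average) and 2/3 (total), the flawed conditioning reproduces the gnome's 2/3 and 4/5, and the selfish gnome lands on 2/3.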

There is a way to restore the equivalence between the average utilitarian and the selfish case. If the humans will be selfish, we can say that the gnome cares about the average well-being of the three humans which will appear in its cell with equal likelihood: the human created after heads, the first human created after tails, and the second human created after tails. The gnome expects to adopt each of these three humans' selfish utility function with probability 1/4. It thus makes sense to say that the gnome cares about the average well-being of these three humans. This is the correct correspondence between selfish and average utilitarian values, and it leads, again, to the conclusion that the correct advice is to pay up to $2/3 for the lottery ticket.

In Anthropic Bias, Nick Bostrom argues that each human should assign probability 1/2 to the coin having shown tails ("SSA odds"). He also introduces the possible answer 2/3 ("SSA+SIA", nowadays usually simply called "SIA") and refutes it. SIA odds have been defended by Olum. The main argument against SIA is the Presumptuous Philosopher. The main arguments for SIA and against SSA odds are that SIA avoids the Doomsday Argument[1], which most people feel has to be wrong; that SSA odds depend on whom you consider to be part of your "reference class"; and furthermore, as pointed out by Bostrom himself, that SSA odds allow for acausal superpowers.

The consensus view on LW seems to be that much of the SSA vs. SIA debate is confused and due to discussing probabilities detached from decision problems of agents with specific utility functions. (ETA: At least this was the impression I got. Two commenters have expressed scepticism about whether this is really the consensus view.) I think that "What are the odds at which a selfish agent should bet on tails?" is the most sensible translation of "What is the probability that the coin has shown tails?" into a decision problem.
Since I've argued that selfish agents should take bets following SIA odds, one can employ the Presumptuous Philosopher argument against my conclusion: it seems to imply that selfish agents, like total but unlike average utilitarians, should bet at extreme odds on living in an extremely large universe, even if there's no empirical evidence in favor of this. I don't think this counterargument is very strong. However, since this post is already quite lengthy, I'll elaborate more on this if I get encouraging feedback for this post.

[1] At least its standard version. SIA comes with its own Doomsday conclusions, cf. Katja Grace's thesis Anthropic Reasoning in the Great Filter.

## Overcoming Decision Anxiety

14 11 September 2014 04:22AM

I get pretty anxious about open-ended decisions. I often spend an unacceptable amount of time agonizing over things like what design options to get on a custom suit, or what kind of job I want to pursue, or what apartment I want to live in. Some of these decisions are obviously important ones, with implications for my future happiness. However, in general my sense of anxiety is poorly calibrated with the importance of the decision. This makes life harder than it has to be, and lowers my productivity.

I moved apartments recently, and I decided that this would be a good time to address my anxiety about open-ended decisions. My hope is to present some ideas that will be helpful for others with similar anxieties, or to stimulate helpful discussion.

### Solutions

#### Exposure therapy

One promising way of dealing with decision anxiety is to practice making decisions without worrying about them quite so much. Match your clothes together in a new way, even if you're not 100% sure that you like the resulting outfit. Buy a new set of headphones, even if it isn't the “perfect choice.” Aim for good enough. Remind yourself that life will be okay if your clothes are slightly mismatched for one day.
This is basically exposure therapy – exposing oneself to a slightly aversive stimulus while remaining calm about it. Doing something you're (mildly) afraid to do can have a tremendously positive impact when you try it and realize that it wasn't all that bad. Of course, you can always start small and build up to bolder activities as your anxieties diminish.

For the past several months, I had been practicing this with small decisions. With the move approaching in July, I needed some more tricks for dealing with a bigger, more important decision.

#### Reasoning with yourself

It helps to think up reasons why your anxieties aren't justified. As in actual, honest-to-goodness reasons that you think are true. Check out this conversation between my System 1 and System 2 that happened just after my roommates and I made a decision on an apartment:

System 1: Oh man, this neighborhood [the old neighborhood] is such a great place to go for walks. It's so scenic and calm. I'm going to miss that. The new neighborhood isn't as pretty.

System 2: Well that's true, but how many walks did we actually take in five years living in the old neighborhood? If I recall correctly, we didn't even take two per year.

System 1: Well, yeah... but...

System 2: So maybe “how good the neighborhood is for taking walks” isn't actually that important to us. At least not to the extent that you're feeling. There were things that we really liked about our old living situation, but taking walks really wasn't one of them.

System 1: Yeah, you may be right...

Of course, this “conversation” took place after the decision had already been made. But making a difficult decision often entails second-guessing oneself, and this too can be a source of great anxiety. As in the above, I find that poking holes in my own anxieties really makes me feel better.
I do this by being a good skeptic and turning on my critical thinking skills – only instead of, say, debunking an article on pseudoscience, I'm debunking my own worries about how bad things are going to be. This helps me remain calm.

#### Re-calibration

The last piece of this process is something that should help when making future decisions. I reasoned that if my System 1 feels anxiety about things that aren't very important – if it is, as I said, poorly calibrated – then perhaps I can re-calibrate it.

Before moving apartments, I decided to make predictions about what aspects of the new living situation would affect my happiness. “How good the neighborhood is for walks” may not be important to me, but surely there are some factors that are important. So I wrote down things that I thought would be good and bad about the new place. I also rated them on how good or bad I thought they would be. In several months, I plan to go back over that list and compare my predicted feelings to my actual feelings. What was I right about? This will hopefully give my System 1 a strong impetus to re-calibrate, and only feel anxious about aspects of a decision that are strongly correlated with my future happiness.

### Future Benefits

I think we each carry in our heads a model of what is possible for us to achieve, and anxiety about the choices we make limits how bold we can be in trying new things. As a result, I think that my attempts to feel less anxiety about decisions will be very valuable to me, and allow me to do things that I couldn't do before. At the same time, I expect that making decisions of all kinds will be a quicker and more pleasant process, which is a great outcome in and of itself.

## Decision Theory: Value in Time

2 27 July 2014 10:01AM

Summary: Is there demand for posts about this aspect of decision-making? And, of course, is there supply? I didn't see any post about it.

Topics I intended to cover include:

• How much is $100 worth in a few years? Why? Why is it useful?
• Risk-return relationship.
• How is it useful in life outside finance?

And a topic I would like to cover, but am not sure whether I should:

• How can we apply it to death? (In the sense of: should I live a happy life or struggle to live endlessly?)

I found this missing from discussions of decision analysis, and I think it is a very important thing to know, since we don't always choose between "I take A" or "I take B", but also between "I take A" or "I take B in two years", or "should I give A to gain B every year for the next 100 years?"
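As a rough illustration of the kind of comparison described above, here is a minimal sketch of standard discounting. It assumes a single constant annual discount rate (the 5% is a hypothetical number) and payments at the end of each year. The helper names are also hypothetical.

```python
def present_value(amount, rate, years):
    """Value today of `amount` received after `years` years at annual discount `rate`."""
    return amount / (1 + rate) ** years

def annuity_present_value(payment, rate, years):
    """Value today of receiving `payment` at the end of each year for `years` years."""
    if rate == 0:
        return payment * years
    # Closed form of the geometric sum payment / (1 + rate) ** t for t = 1..years.
    return payment * (1 - (1 + rate) ** -years) / rate

RATE = 0.05  # hypothetical 5% annual discount rate

print(round(present_value(100, RATE, 2), 2))          # 90.7
print(round(annuity_present_value(1, RATE, 100), 2))  # 19.85
```

Under these assumptions, "give A now to gain B every year for the next 100 years" is worthwhile roughly when `A < annuity_present_value(B, RATE, 100)`. How risk should change `RATE` is what the risk-return topic would address.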

Why not simply redirect to some other source?

Well, that can be done either way, but I thought clear basics would do no harm and would be useful to people who want to invest less time in it.

## Quantum Decisions

1 12 May 2014 09:49PM

CFAR sometimes plays a Monday / Tuesday game (invented by palladias). Copying from the URL:

On Monday, your proposition is true. On Tuesday, your proposition is false. Tell me a story about each of the days so I can see how they are different. Don't just list the differences (because you're already not doing that well). Start with "I wake up" so you start concrete and move on in that vein, naming the parts of your day that are identical as well as those that are different.

So my question is (edited on 2014/05/13):

On Monday, I make my decisions by rolling a normal die. Example: should I eat vanilla or chocolate ice-cream? I then decide that if I roll 4 or higher on a 6-sided die, I'll pick vanilla. I roll the die, get a 3, and so proceed to eat chocolate ice-cream.

On Tuesday, I use the same procedure, but use a quantum random number generator instead. (For the purpose of this discussion, let's assume that I can actually find a true/reliable generator. Maybe I'm shooting a photon through a half-silvered mirror.)

What's the difference? (Relevant discussion pointed out by Pfft.)

## Mutual Worth without default point (but with potential threats)

6 31 July 2013 09:52AM

Though I planned to avoid posting anything more until well after baby, I found this refinement to MWBS yesterday, so I'm posting it while Miriam sleeps during a pause in contractions.

The mutual worth bargaining solution was built from the idea that the true value of a trade is having your utility function access the decision points of the other player. This gave the idea of utopia points: what happens when you are granted complete control over the other person's decisions. This gave a natural 1 to normalise your utility function. But the 0 point is chosen according to a default point. This is arbitrary, and breaks the symmetry between the top and bottom point of the normalisation.

We'd also want normalisations that function well when players have no idea what their opponents will be. This includes not knowing what their utility functions will be. Can we model what a 'generic' opposing utility function would be?

It's tricky, in general, to know what 'value' to put on an opponent's utility function. It's unclear what kind of utilities you would like to see them have. That's because game theory comes into play, with Nash equilibria, multiple solution concepts, bargaining and threats: there is no universal default to the result of a game between two agents. There are two situations, however, that are respectively better and worse than all others: the situation where your opponent shares your exact utility function, and the situation where they have the negative of that (they're essentially your 'anti-agent').

If your opponent shares your utility function, then there is a clear ideal outcome: act as if you and the opponent were the same person, acting to maximise your joint utility. This is the utopia point for MWBS, which can be standardised to take value 1.

If your opponent has the negative of your utility, then the game is zero-sum: any gain to you is a loss to your opponent, and there is no possibility for mutually pleasing compromise. But zero-sum games also have a single canonical outcome! For zero-sum games, the concepts of Nash equilibrium, minimax, and maximin are all equivalent (and are generally mixed outcomes). The game has a single defined value: each player can guarantee they get as much utility as that value, and the other player can guarantee that they get no more.

It seems natural to normalise that point to -1 (0 would be equivalent, but -1 feels more appropriate). Given this normalisation for each utility, the two utilities can then be summed and joint maximised in the usual way.
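Here is a minimal sketch of this normalisation, restricted to 2x2 games to keep the zero-sum value computable in closed form. The assumptions are as follows. The utopia value is taken as the best joint pure outcome for the player's own utility, since a copy sharing that utility would help reach it. The anti-agent value comes from the standard 2x2 minimax formula. The final joint maximisation is over pure joint outcomes. The function names and the prisoner's-dilemma-style example payoffs are hypothetical.

```python
from itertools import product

def zero_sum_value(m):
    """Value of a 2x2 zero-sum game for the maximising row player."""
    maximin = max(min(row) for row in m)
    minimax = min(max(m[i][j] for i in range(2)) for j in range(2))
    if maximin == minimax:  # pure-strategy saddle point
        return maximin
    (a, b), (c, d) = m
    # No saddle point: both players mix, and the standard 2x2 formula applies.
    return (a * d - b * c) / (a + d - b - c)

def normaliser(utility, is_row_player):
    """Affine map sending the anti-agent value to -1 and the utopia value to 1.

    Assumes the two values differ (otherwise the map is undefined).
    """
    # Put this player's own strategies on the rows before solving the zero-sum game.
    m = utility if is_row_player else [list(col) for col in zip(*utility)]
    low = zero_sum_value(m)                  # opponent is the 'anti-agent'
    high = max(max(row) for row in utility)  # opponent shares this utility
    return lambda x: -1 + 2 * (x - low) / (high - low)

def mwbs_outcome(u_row, u_col):
    """Joint outcome maximising the sum of the two normalised utilities."""
    f, g = normaliser(u_row, True), normaliser(u_col, False)
    scores = {(i, j): f(u_row[i][j]) + g(u_col[i][j])
              for i, j in product(range(2), repeat=2)}
    return max(scores, key=scores.get), scores

# Example payoffs indexed [row strategy][column strategy]; 0 = Cooperate, 1 = Defect.
u_row = [[3, 0], [5, 1]]
u_col = [[3, 5], [0, 1]]
best, scores = mwbs_outcome(u_row, u_col)
print(best, scores)
```

In this example, mutual cooperation scores 0 after normalisation, each off-diagonal outcome scores -0.5, and mutual defection scores -2. So the summed normalised utilities pick mutual cooperation.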

This bargaining solution has a lot of attractive features - it's symmetric in minimal and maximal utilities, does not require a default point, reflects the relative power, and captures the spread of opponents' utilities that could be encountered without needing to go into game theory. It is vulnerable to (implicit) threats, however! If I can (potentially) cause a lot of damage to you and your cause, then when you normalise your utility, you get penalised because of what your anti-agent could do if they controlled my decision nodes. So just by having the power to do bad stuff to you, I come out better than I would otherwise (and vice-versa, of course).

I feel it's worth exploring further (especially what happens with multiple agents) - but for me, after the baby.

## Why one-box?

7 30 June 2013 02:38AM

I have sympathy with both one-boxers and two-boxers in Newcomb's problem. Contrary to this, however, many people on Less Wrong seem to be staunch and confident one-boxers. So I'm turning to you guys to ask for help figuring out whether I should be a staunch one-boxer too. Below is an imaginary dialogue setting out my understanding of the arguments normally advanced on LW for one-boxing and I was hoping to get help filling in the details and extending this argument so that I (and anyone else who is uncertain about the issue) can develop an understanding of the strongest arguments for one-boxing.

One-boxer: You should one-box because one-boxing wins (that is, a person that one-boxes ends up better off than a person that two-boxes). Not only does it seem clear that rationality should be about winning generally (that a rational agent should not be systematically outperformed by irrational agents) but Newcomb's problem is normally discussed within the context of instrumental rationality, which everyone agrees is about winning.

Me: I get that, and that's one of the main reasons I'm sympathetic to the one-boxing view, but the two-boxer has a response to these concerns. The two-boxer agrees that rationality is about winning and they agree that winning means ending up with the most utility. The two-boxer should also agree that the rational decision theory to follow is one that will one-box on all future Newcomb's problems (those where the prediction has not yet occurred) and can also agree that the best timeless agent type is a one-boxing type. However, the two-boxer also claims that two-boxing is the rational decision.

O: Sure, but why think they're right? After all, two-boxers don't win.

M: Okay, those with a two-boxing agent type don't win but the two-boxer isn't talking about agent types. They're talking about decisions. So they are interested in what aspects of the agent's winning can be attributed to their decision and they say that we can attribute the agent's winning to their decision if this is caused by their decision. This strikes me as quite a reasonable way to apportion the credit for various parts of the winning. (Of course, it could be said that the two-boxer is right but they are playing a pointless game and should instead be interested in winning simpliciter rather than winning decisions. If this is the claim then the argument is dissolved and there is no disagreement. But I take it this is not the claim).

O: But this is a strange convoluted definition of winning. The agent ends up worse off than one-boxing agents so it must be a convoluted definition of winning that says that two-boxing is the winning decision.

M: Hmm, maybe... But I'm worried that relevant distinctions aren't being made here (you've started talking about winning agents rather than winning decisions). The two-boxer relies on the same definition of winning as you and so agrees that the one-boxing agent is the winning agent. They just disagree about how to attribute winning to the agent's decisions (rather than to other features of the agent). And their way of doing this strikes me as quite a natural one. We credit the decision with the winning that it causes. Is this the source of my unwillingness to jump fully on board with your program? Do we simply disagree about the plausibility of this way of attributing winning to decisions?

Meta-comment (a): I don't know what to say here. Is this what's going on? Do people just intuitively feel that this is a crazy way to attribute winning to decisions? If so, can anyone suggest why I should adopt the one-boxer perspective on this?

O: But then the two-boxer has to rely on the claim that Newcomb's problem is "unfair" to explain why the two-boxing agent doesn't win. It seems absurd to say that a scenario like Newcomb's problem is unfair.

M: Well, the two-boxer means something very particular by "unfair". They simply mean that in this case the winning agent doesn't correspond to the winning decision. Further, they can explain why this is the case without saying anything that strikes me as crazy. They simply say that Newcomb's problem is a case where the agent's winnings can't entirely be attributed to the agent's decision (ignoring a constant value). But if something else (the agent's type at time of prediction) also influences the agent's winning in this case, why should it be a surprise that the winning agent and the winning decision come apart? I'm not saying the two-boxer is right here but they don't seem to me to be obviously wrong either...

Meta-comment (b): Interested to know what response should be given here.

O: Okay, let's try something else. The two-boxer focuses only on causal consequences but in doing so they simply ignore all the logical non-causal consequences of their decision algorithm outputting a certain decision. This is an ad hoc, unmotivated restriction.

M: Ad hoc? I'm not sure I see why. Think about the problem with evidential decision theory. The proponent of EDT could say a similar thing (that the proponent of two-boxing ignores all the evidential implications of their decision). The two-boxer will respond that these implications just are not relevant to decision making. When we make decisions we are trying to bring about the best results, not get evidence for these results. Equally, they might say, we are trying to bring about the best results, not derive the best results in our logical calculations. Now I don't know what to make of the point/counter-point here but it doesn't seem to me that the one-boxing view is obviously correct here and I'm worried that we're again going to end up just trading intuitions (and I can see the force of both intuitions here).

Meta-comment: Again, I would love to know whether I've understood this argument and whether something can be said to convince me that the one-boxing view is the clear cut winner here.

End comments: That's my understanding of the primary argument advanced for one-boxing on LW. Are there other core arguments? How can these arguments be improved and extended?
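The distinction in the dialogue between winning agents and winning decisions can be made concrete with the usual Newcomb numbers. This is a minimal sketch under two assumptions. The post itself gives no amounts, so it uses the standard $1,000,000 opaque box and $1,000 transparent box. The predictor accuracy of 0.99 is hypothetical.

```python
BIG, SMALL = 1_000_000, 1_000  # standard Newcomb amounts (assumed, not from the post)

def payoff(action, predicted_one_box):
    """Winnings for an action, given what the predictor already put in the boxes."""
    opaque = BIG if predicted_one_box else 0
    return opaque + (SMALL if action == "two-box" else 0)

def expected_winnings(agent_type, accuracy):
    """Average winnings of an agent of a fixed type facing a predictor of given accuracy."""
    p_predict_one_box = accuracy if agent_type == "one-box" else 1 - accuracy
    return (p_predict_one_box * payoff(agent_type, True)
            + (1 - p_predict_one_box) * payoff(agent_type, False))

# Agent-level comparison: one-boxing agent types end up richer on average.
print(expected_winnings("one-box", 0.99), expected_winnings("two-box", 0.99))

# Decision-level (causal) comparison: with the prediction held fixed,
# two-boxing always adds SMALL to what the agent receives.
for predicted in (True, False):
    print(payoff("two-box", predicted) - payoff("one-box", predicted))
```

The one-boxer points to the first comparison. The two-boxer credits the decision only with the second. Both sets of numbers are correct, which is why the disagreement is about how to attribute winning rather than about arithmetic.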

## Another perspective on resolving the Prisoner's dilemma

11 04 June 2013 04:13PM

Sometimes I see new ideas that, without offering any new information, offer a new perspective on old information, and a new way of thinking about an old problem. So it is with this lecture and the prisoner's dilemma.

Now, I have worked a lot with the prisoner's dilemma, with superrationality, negotiations, fairness, retaliation, Rawlsian veils of ignorance, etc. I've studied the problem, and its possible resolutions, extensively. But the perspective of that lecture was refreshing and new to me:

The prisoner's dilemma is resolved only when the off-diagonal outcomes of the dilemma are known to be impossible.

The "off-diagonal outcomes" are the "(Defect, Cooperate)" and the "(Cooperate, Defect)" squares where one person walks away with all the benefit and the other has none:

| (Baron, Countess) | Cooperate | Defect |
|---|---|---|
| Cooperate | (3,3) | (0,5) |
| Defect | (5,0) | (1,1) |

Facing an identical (or near identical) copy of yourself? Then the off-diagonal outcomes are impossible, because you're going to choose the same thing. Facing Tit-for-tat in an iterated prisoner's dilemma? Well, the off-diagonal squares cannot be reached consistently. Is the other prisoner a Mafia don? Then the off-diagonal outcomes don't exist as written: there's a hidden negative term (you being horribly murdered) that isn't taken into account in that matrix. Various agents with open code are essentially publicly declaring the conditions under which they will not reach for the off-diagonal. The point of many contracts and agreements is to make the off-diagonal outcome impossible or expensive.
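The Tit-for-tat case can be checked with a short simulation using the payoffs in the matrix above. This is a minimal sketch. The strategy functions, the 10-round length, and the helper names are hypothetical. It counts how often an opponent actually lands on an off-diagonal square against Tit-for-tat.

```python
# Payoffs from the matrix: (row player, column player).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    # Cooperate first, then copy the opponent's previous move.
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def always_cooperate(opponent_history):
    return "C"

def play(strategy_a, strategy_b, rounds):
    """Play an iterated game; return the joint outcomes and total scores."""
    hist_a, hist_b, outcomes = [], [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both moves are chosen before either history is updated.
        move_a, move_b = strategy_a(hist_b), strategy_b(hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
        outcomes.append((move_a, move_b))
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
    return outcomes, (score_a, score_b)

def off_diagonal_count(outcomes):
    return sum(1 for a, b in outcomes if a != b)

for rival in (always_cooperate, always_defect, tit_for_tat):
    outcomes, scores = play(rival, tit_for_tat, rounds=10)
    print(rival.__name__, off_diagonal_count(outcomes), scores)
```

A permanent defector reaches the tempting (Defect, Cooperate) square exactly once. After that, Tit-for-tat locks the game onto the diagonal at (1,1), so defection earns 14 points over 10 rounds against 30 for cooperation.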

As I said, nothing fundamentally new, but I find the perspective interesting. To my mind, it suggests that when resolving the prisoner's dilemma with probabilistic outcomes allowed, I should be thinking "blocking off possible outcomes", rather than "reaching agreement".

## Crash problems for total futarchy

6 15 May 2013 10:41AM

Futarchy holds great promise for dealing with all the morass of poor decision making in our governments and corporations. For those who haven't heard of it, the main concept is to use betting markets, where people place bets on the expected outcome of a policy, and the decision-makers choose the policy that the market decrees is most likely to achieve their desired outcomes. Robin Hanson summarises it as "Vote Values, But Bet Beliefs".

The approach, however, could lead to problems in a large financial crisis. When a large financial bubble bursts, many things change: liquidity, risk aversion, volatility, the competence of the average investor. If the betting markets are integrated into the general market (which they would be), then they would be affected in the same way. So at precisely the moment when decision makers need the best results, their main tools would be going haywire.

This would be even worse if they'd been depending on the betting markets for their decisions, operating merely as overseers. At that point, they may have lost the ability to make effective decisions entirely.

Since isolating the betting markets from the swings of the rest of the market is unrealistic/impossible/stupid, we should aim for a mixed governance model - one where betting markets play an integral part, but where the deciders still have experience making their own decisions and overriding the betting markets with some regularity.

## [LINK] Antibiotic seemingly improves decision-making in the presence of attractive women.

5 07 May 2013 02:26PM

This study seems to show that minocycline helps to resist placing too much trust in attractive women.

Recently, minocycline, a tetracycline antibiotic, has been reported to improve symptoms of psychiatric disorders and to facilitate sober decision-making in healthy human subjects. Here we show that minocycline also reduces the risk of the 'honey trap' during an economic exchange. Males tend to cooperate with physically attractive females without careful evaluation of their trustworthiness, resulting in betrayal by the female. In this experiment, healthy male participants made risky choices (whether or not to trust female partners, identified only by photograph, who had decided in advance to exploit the male participants). The results show that trusting behaviour in male participants significantly increased in relation to the perceived attractiveness of the female partner, but that attractiveness did not impact trusting behaviour in the minocycline group. Animal studies have shown that minocycline inhibits microglial activities. Therefore, this minocycline effect may shed new light on the unknown roles microglia play in human mental activities.

## Three more ways identity can be a curse

40 28 April 2013 02:53AM

The Buddhists believe that one of the three keys to attaining true happiness is dissolving the illusion of the self. (The other two are dissolving the illusion of permanence, and ceasing the desire that leads to suffering.) I'm not really sure exactly what it means to say "the self is an illusion", and I'm not exactly sure how that will lead to enlightenment, but I do think one can easily take the first step on this long journey to happiness by beginning to dissolve the sense of one's identity.

Previously, in "Keep Your Identity Small", Paul Graham showed how a strong sense of identity can lead to epistemic irrationality, when someone refuses to accept evidence against x because "someone who believes x" is part of his or her identity. And in Kaj Sotala's "The Curse of Identity", he illustrated a human tendency to reinterpret a goal of "do x" as "give the impression of being someone who does x". These are both fantastic posts, and you should read them if you haven't already.

Here are three more ways in which identity can be a curse.

1. Don't be afraid to change

James March, professor of political science at Stanford University, says that when people make choices, they tend to use one of two basic models of decision making: the consequences model, or the identity model. In the consequences model, we weigh the costs and benefits of our options and make the choice that maximizes our satisfaction. In the identity model, we ask ourselves "What would a person like me do in this situation?"1

The author of the book I read this in didn't seem to take the obvious next step and acknowledge that the consequences model is clearly The Correct Way to Make Decisions and basically by definition, if you're using the identity model and it's giving you a different result than the consequences model would, you're being led astray. A heuristic I like to use is to limit my identity to the "observer" part of my brain, and make my only goal maximizing the amount of happiness and pleasure the observer experiences, and minimizing the amount of misfortune and pain. It sounds obvious when you lay it out in these terms, but let me give an example.

Alice is an incoming freshman in college trying to choose her major. In Hypothetical University, there are only two majors: English, and business. Alice absolutely adores literature, and thinks business is dreadfully boring. Becoming an English major would allow her to have a career working with something she's passionate about, which is worth 2 megautilons to her, but it would also make her poor (0 mu). Becoming a business major would mean working in a field she is not passionate about (0 mu), but it would also make her rich, which is worth 1 megautilon. So English, with 2 mu, wins out over business, with 1 mu.

However, Alice is very bright, and is the type of person who can adapt herself to many situations and learn skills quickly. If Alice were to spend the first six months of college deeply immersing herself in studying business, she would probably start developing a passion for business. If she purposefully exposed herself to certain pro-business memeplexes (e.g. watched a movie glamorizing the life of Wall Street bankers), then she could speed up this process even further. After a few years of taking business classes, she would probably begin to forget what about English literature was so appealing to her, and be extremely grateful that she made the decision she did. Therefore she would gain the same 2 mu from having a job she is passionate about, along with an additional 1 mu from being rich, meaning that the 3 mu choice of business wins out over the 2 mu choice of English.

However, the possibility of self-modifying into someone who finds English literature boring and business interesting is very disturbing to Alice. She sees it as a betrayal of everything that she is, even though she's actually only been interested in English literature for a few years. Perhaps she thinks of choosing business as "selling out" or "giving in". Therefore she decides to major in English, and takes the 2 mu choice instead of the superior 3 mu.

(Obviously this is a hypothetical example/oversimplification and there are a lot of reasons why it might be rational to pursue a career path that doesn't make very much money.)

It seems to me like human beings have a bizarre tendency to want to keep certain attributes and character traits stagnant, even when doing so provides no advantage, or is actively harmful. In a world where business-passionate people systematically do better than English-passionate people, it makes sense to self-modify to become business-passionate. Yet this is often distasteful.

For example, until a few weeks ago when I started solidifying this thinking pattern, I had an extremely adverse reaction to the idea of ceasing to be a hip-hop fan and becoming a fan of more "sophisticated" musical genres like jazz and classical, eventually coming to look down on the music I currently listen to as primitive or silly. This doesn't really make sense - I'm sure if I were to become a jazz and classical fan I would enjoy those genres at least as much as I currently enjoy hip hop. And yet I had a very strong preference to remain the same, even in the trivial realm of music taste.

Probably the most extreme example is the common tendency for depressed people to not actually want to get better, because depression has become such a core part of their identity that the idea of becoming a healthy, happy person is disturbing to them. (I used to struggle with this myself, in fact.) Being depressed is probably the most obviously harmful characteristic that someone can have, and yet many people resist self-modification.

Of course, the obvious objection is there's no way to rationally object to people's preferences - if someone truly prioritizes keeping their identity stagnant over not being depressed then there's no way to tell them they're wrong, just like if someone prioritizes paperclips over happiness there's no way to tell them they're wrong. But if you're like me, and you are interested in being happy, then I recommend looking out for this cognitive bias.

The other objection is that this philosophy leads to extremely unsavory wireheading-esque scenarios if you take it to its logical conclusion. But holding the opposite belief - that it's always more important to keep your characteristics stagnant than to be happy - clearly leads to even more absurd conclusions. So there is probably some point on the spectrum where change is so distasteful that it's not worth a boost in happiness (e.g. a lobotomy or something similar). However, I think that in actual practical pre-Singularity life, most people set this point far, far too low.

2. The hidden meaning of "be yourself"

(This section is entirely my own speculation, so take it as you will.)

"Be yourself" is probably the most widely-repeated piece of social skills advice despite being pretty clearly useless - if it worked then no one would be socially awkward, because everyone has heard this advice.

However, there must be some sort of core grain of truth in this statement, or else it wouldn't be so widely repeated. I think that core grain is basically the point I just made, applied to social interaction. I.e., optimize always for social success and positive relationships (particularly in the moment), and not for signalling a certain identity.

The ostensible purpose of identity/signalling is to appear to be a certain type of person, so that people will like and respect you, which is in turn so that people will want to be around you and be more likely to do stuff for you. However, oftentimes this goes horribly wrong, and people become very devoted to cultivating certain identities that are actively harmful for this purpose, e.g. goth, juggalo, "cool reserved aloof loner", guy that won't shut up about politics, etc. A more subtle example is Fred, who holds the wall and refuses to dance at a nightclub because he is a serious, dignified sort of guy, and doesn't want to look silly. However, the reason why "looking silly" is generally a bad thing is because it makes people lose respect for you, and therefore makes them less likely to associate with you. In the situation Fred is in, holding the wall and looking serious will cause no one to associate with him, but if he dances and mingles with strangers and looks silly, people will be likely to associate with him. So unless he's afraid of looking silly in the eyes of God, this seems to be irrational.

Probably more common is the tendency to take great care to cultivate identities that are neither harmful nor beneficial. E.g. "deep philosophical thinker", "Grateful Dead fan", "tough guy", "nature lover", "rationalist", etc. Boring Bob is a guy who wears a blue polo shirt and khakis every day, works as hard as expected but no harder in his job as an accountant, holds no political views, and when he goes home he relaxes by watching whatever's on TV and reading the paper. Boring Bob would probably improve his chances of social success by cultivating a more interesting identity, perhaps by changing his wardrobe, hobbies, and viewpoints, and then liberally signalling this new identity. However, most of us are not Boring Bob, and a much better social success strategy for most of us is probably to smile more, improve our posture and body language, be more open and accepting of other people, learn how to make better small talk, etc. But most people fail to realize this and instead play elaborate signalling games in order to improve their status, sometimes even at the expense of lots of time and money.

Some ways by which people can fail to "be themselves" in individual social interactions: liberally sprinkle references to certain attributes that they want to emphasize, say nonsensical and surreal things in order to seem quirky, be afraid to give obvious responses to questions in order to seem more interesting, insert forced "cool" actions into their mannerisms, act underwhelmed by what the other person is saying in order to seem jaded and superior, etc. Whereas someone who is "being herself" is more interested in creating rapport with the other person than giving off a certain impression of herself.

Additionally, optimizing for a particular identity might not only be counterproductive - it might actually be a quick way to get people to despise you.

I used to not understand why certain "types" of people, such as "hipsters"2 or Ed Hardy and Affliction-wearing "douchebags" are so universally loathed (especially on the internet). Yes, these people are adopting certain styles in order to be cool and interesting, but isn't everyone doing the same? No one looks through their wardrobe and says "hmm, I'll wear this sweater because it makes me uncool, and it'll make people not like me". Perhaps hipsters and Ed Hardy Guys fail in their mission to be cool, but should we really hate them for this? If being a hipster were cool two years ago, and being someone who wears normal clothes, acts normal, and doesn't do anything "ironically" is cool today, then we're really just hating people for failing to keep up with the trends. And if being a hipster actually is cool, then, well, who can fault them for choosing to be one?

That was my old thought process. Now it is clear to me that what makes hipsters and Ed Hardy Guys hated is that they aren't "being themselves" - they are much more interested in cultivating an identity of interestingness and masculinity, respectively, than connecting with other people. The same thing goes for pretty much every other collectively hated stereotype I can think of3 - people who loudly express political opinions, stoners who won't stop talking about smoking weed, attention seeking teenage girls on Facebook, extremely flamboyantly gay guys, "weeaboos", hippies and new age types, 2005 "emo kids", overly politically correct people, Tumblr SJA weirdos who identify as otherkin and whatnot, overly patriotic "rednecks", the list goes on and on.

This also clears up a confusion that occurred to me when reading How to Win Friends and Influence People. I know people who have a Dale Carnegie mindset of being optimistic and nice to everyone they meet and are adored for it, but I also know people who have the same attitude and yet are considered irritatingly saccharine and would probably do better to "keep it real" a little. So what's the difference? I think the difference is that the former group are genuinely interested in being nice to people and building rapport, while members of the second group have made an error like the one described in Kaj Sotala's post and are merely trying to give off the impression of being a nice and friendly person. The distinction is obviously very subtle, but it's one that humans are apparently very good at perceiving.

I'm not exactly sure what it is that causes humans to have this tendency of hating people who are clearly optimizing for identity - it's not as if they harm anyone. It probably has to do with tribal status. But what is clear is that you should definitely not be one of them.

3. The worst mistake you can possibly make in combating akrasia

The main thesis of PJ Eby's Thinking Things Done is that the primary reason why people are incapable of being productive is that they use negative motivation ("if I don't do x, some negative y will happen") as opposed to positive motivation ("if I do x, some positive y will happen"). He has the following evo-psych explanation for this: in the ancestral environment, personal failure meant that you could possibly be kicked out of your tribe, which would be fatal. A lot of depressed people make statements like "I'm worthless", or "I'm scum" or "No one could ever love me", which seem illogically dramatic and overly black-and-white until you realize that these statements are merely interpretations of a feeling of "I'm about to get kicked out of the tribe, and therefore die." Animals have a freezing response to imminent death, so if you are fearing failure you will go into do-nothing mode and not be able to work at all.4

In Succeed: How We Can Reach Our Goals, PhD psychologist Heidi Halvorson takes a different view and describes positive motivation and negative motivation as having pros and cons. However, she has her own dichotomy of Good Motivation and Bad Motivation: "Be good" goals are performance goals, and are directed at achieving a particular outcome, like getting an A on a test, reaching a sales target, getting your attractive neighbor to go out with you, or getting into law school. They are very often tied closely to a sense of self-worth. "Get better" goals are mastery goals, and people who pick these goals judge themselves instead in terms of the progress they are making, asking questions like "Am I improving? Am I learning? Am I moving forward at a good pace?" Halvorson argues that "get better" goals are almost always drastically better than "be good" goals5. An example quote (from page 60) is:

When my goal is to get an A in a class and prove that I'm smart, and I take the first exam and I don't get an A... well, then I really can't help but think that maybe I'm not so smart, right? Concluding "maybe I'm not smart" has several consequences and none of them are good. First, I'm going to feel terrible - probably anxious and depressed, possibly embarrassed or ashamed. My sense of self-worth and self-esteem are going to suffer. My confidence will be shaken, if not completely shattered. And if I'm not smart enough, there's really no point in continuing to try to do well, so I'll probably just give up and not bother working so hard on the remaining exams.

And finally, in Feeling Good: The New Mood Therapy, David Burns describes a destructive side effect of depression he calls "do-nothingism":

One of the most destructive aspects of depression is the way it paralyzes your willpower. In its mildest form you may simply procrastinate about doing a few odious chores. As your lack of motivation increases, virtually any activity appears so difficult that you become overwhelmed by the urge to do nothing. Because you accomplish very little, you feel worse and worse. Not only do you cut yourself off from your normal sources of stimulation and pleasure, but your lack of productivity aggravates your self-hatred, resulting in further isolation and incapacitation.

Synthesizing these three pieces of information leads me to believe that the worst thing you can possibly do for your akrasia is to tie your success and productivity to your sense of identity/self-worth, especially if you're using negative motivation to do so, and especially if you suffer or have recently suffered from depression or low self-esteem. The thought of having a negative self-image is scary and unpleasant, perhaps for the evo-psych reasons PJ Eby outlines. If you tie your productivity to your fear of a negative self-image, working will become scary and unpleasant as well, and you won't want to do it.

I feel like this might be the single biggest reason why people are akratic. It might be a little premature to say that, and I might be biased by how large a factor this mistake was in my own akrasia. But unfortunately, this trap seems like a very easy one to fall into. If you're someone who is lazy and isn't accomplishing much in life, perhaps depressed, then it makes intuitive sense to motivate yourself by saying "Come on, self! Do you want to be a useless failure in life? No? Well get going then!" But doing so will accomplish the exact opposite and make you feel miserable.

So there you have it. In addition to making you a bad rationalist and causing you to lose sight of your goals, a strong sense of identity will cause you to make poor decisions that lead to unhappiness, be unpopular, and be unsuccessful. I think the Buddhists were onto something with this one, personally, and I try to limit my sense of identity as much as possible. A trick you can use, in addition to the "be the observer" trick I mentioned, is this: whenever you find yourself thinking in identity terms, swap out that identity for the identity of "person who takes over the world by transcending the need for a sense of identity".

This is my first LessWrong discussion post, so constructive criticism is greatly appreciated. Was this informative? Or was what I said obvious, and I'm retreading old ground? Was this well written? Should this have been posted to Main? Should this not have been posted at all? Thank you.

1. Paraphrased from page 153 of Switch: How to Change When Change is Hard

2. Actually, while it works for this example, I think the stereotypical "hipster" is a bizarre caricature that doesn't match anyone who actually exists in real life, and the degree to which people will rabidly espouse hatred for this stereotypical figure (or used to two or three years ago) is one of the most bizarre tendencies people have.

3. Other than groups that arguably hurt people (religious fundamentalists, PUAs), the only exception I can think of is frat boy/jock types. They talk about drinking and partying a lot, sure, but not really any more than people who drink and party a lot would be expected to. Possibilities for their hated status include that they do in fact engage in obnoxious signalling and I'm not aware of it, jealousy, or stigmatization as hazers and date rapists. Also, a lot of people hate stereotypical "ghetto" black people who sag their jeans and notoriously type in a broken, difficult-to-read form of English. This could either be a weak example of the trend (I'm not really sure what it is they would be signalling, maybe dangerousness?), or just a manifestation of racism.

4. I'm not sure if this is valid science that he pulled from some other source, or if he just made this up.

5. The exception is that "be good" goals can lead to a very high level of performance when the task is easy.

## Is protecting yourself from your own biases self-defeating?

0 [deleted] 15 February 2013 02:21PM

I graduated from high school and wish to further my education formally by studying for a bachelor's degree in order to become a medical researcher. I could, for instance, take two different academic paths:

1. Study Medicine at undergraduate level and then do a postdoctoral fellowship.

2. Study Biochemistry at undergraduate level, then study for a PhD at graduate level, and finally do a postdoctoral fellowship.

Since I will do these studies in Europe, they each take approximately the same amount of time, namely 6 to 8 years.

Do I want to treat patients? No, I do not. But I am considering Medicine because it can be a buffer against my own mediocrity: if I turn out to be a below-average scientist, I will be royally screwed. From my personal job-shadowing experience, Medicine, on the other hand, requires only basic intellectual traits, primarily the ability to memorize heaps of information. And those I think I have. To do world-class research, though, I'd have to be an intellectual heavyweight, and of that I'm not so sure.

How do I decide what path to follow?

The reason I'm asking you strangers for advice is that I evidently have biases, such as the pessimism/optimism bias or the Dunning-Kruger effect, that impair my ability to reason clearly; and people who know me personally are likewise prone to make errors in advising me because of biases like, say, the Halo effect. (Come to think of it, thinking that I can't become an above-average scientist is in itself a self-defeating prophecy!)

Do you think that one ought to always seek advice from total strangers in order to be safeguarded from his/her own biases?

PS: I apologize if I should have written this in a specific thread. I'll delete my article if that's necessary.

## Simulating Problems

1 30 January 2013 01:14PM

Apologies for the rather mathematical nature of this post, but it seems to have some implications for topics relevant to LW. Prior to posting I looked for literature on this but was unable to find any; pointers would be appreciated.

In short, my question is: How can we prove that any simulation of a problem really simulates the problem?

I want to demonstrate that this is not as obvious as it may seem by using the example of Newcomb's Problem. The issue here is of course Omega's omniscience. If we construct a simulation with the rules (payoffs) of Newcomb, an Omega that is always right, and an interface for the agent to interact with the simulation, will that be enough?

Let's say we simulate Omega's prediction by a coin toss and repeat the simulation (without payoffs) until the coin toss matches the agent's decision. This seems to adhere to all specifications of Newcomb and is (if the coin toss is hidden) in fact indistinguishable from it from the agent's perspective. However, if the agent knows how the simulation works, a CDT agent will one-box, while it is assumed that the same agent would two-box in 'real' Newcomb. Not telling the agent how the simulation works is never a solution, so this simulation appears to not actually simulate Newcomb.
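A minimal sketch of the coin-toss simulation just described, assuming the conventional Newcomb payoffs of $1,000,000 in the opaque box and $1,000 in the transparent one (the post does not fix amounts; the function names are illustrative). It makes the mechanism concrete: since only the run whose coin matches the agent's choice is kept, the choice effectively selects the accepted "prediction".

```python
import random

# Conventional Newcomb payoffs (an assumption; the post gives no amounts).
BIG = 1_000_000  # opaque box, filled only if one-boxing was "predicted"
SMALL = 1_000    # transparent box, always filled


def payoff(choice: str, prediction: str) -> int:
    """Payoff for the agent's choice given the simulated prediction."""
    opaque = BIG if prediction == "one" else 0
    return opaque if choice == "one" else opaque + SMALL


def simulate_newcomb(choice: str, rng: random.Random) -> tuple[int, int]:
    """Coin-toss 'Omega': rerun (without payoffs) until the coin matches.

    Returns (payoff of the final run, number of runs needed).
    """
    runs = 0
    while True:
        runs += 1
        prediction = rng.choice(["one", "two"])  # the hidden coin toss
        if prediction == choice:
            return payoff(choice, prediction), runs


rng = random.Random(0)
for choice in ("one", "two"):
    value, runs = simulate_newcomb(choice, rng)
    print(choice, value, runs)
```

An agent that knows this mechanism can see that choosing "one" causally guarantees a filled opaque box, which is why even a CDT agent would one-box in this simulation but not, by assumption, in the original problem.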

Pointing out differences is of course far easier than proving that none exist. Suppose there is a problem for which we have no idea which decisions agents would make, and we want to build a real-world simulation to find out exactly that. How can we prove that this simulation really simulates the problem?

(Edit: Apparently it wasn't apparent that this is about problems in terms of game theory and decision theory. Newcomb, Prisoner's Dilemma, Iterated Prisoner's Dilemma, Monty Hall, Sleeping Beauty, Two Envelopes, that sort of stuff. Should be clear now.)

## Why (anthropic) probability isn't enough

19 13 December 2012 04:09PM

A technical report of the Future of Humanity Institute (authored by me), on why anthropic probability isn't enough to reach decisions in anthropic situations. You also have to choose your decision theory, and take into account your altruism towards your copies. And these components can co-vary while leaving your ultimate decision the same - typically, EDT agents using SSA will reach the same decisions as CDT agents using SIA, and altruistic causal agents may decide the same way as selfish evidential agents.

## Anthropics: why probability isn't enough

This paper argues that the current treatment of anthropic and self-locating problems over-emphasises the importance of anthropic probabilities, and ignores other relevant and important factors, such as whether the various copies of the agents in question consider that they are acting in a linked fashion and whether they are mutually altruistic towards each other. These issues, generally irrelevant for non-anthropic problems, come to the forefront in anthropic situations and are at least as important as the anthropic probabilities: indeed they can erase the difference between different theories of anthropic probability, or increase their divergence. These help to reinterpret the decisions, rather than probabilities, as the fundamental objects of interest in anthropic problems.

## What's Wrong with Evidential Decision Theory?

16 23 August 2012 12:09AM

With all the exotic decision theories floating around here, it doesn't seem like anyone has tried to defend boring old evidential decision theory since AlexMennen last year.  So I thought I'd take a crack at it.  I might come off a bit more confident than I am, since I'm defending a minority position (I'll leave it to others to bring up objections).  But right now, I really do think that naive EDT, the simplest decision theory, is also the best decision theory.

Everyone agrees that Smoker's lesion is a bad counterexample to EDT, since it turns out that smoking actually does cause cancer.  But people seem to think that this is just an unfortunate choice of thought experiment, and that the reasoning is sound if we accept its premise.  I'm not so convinced.  I think that this "bad example" provides a pretty big clue as to what's wrong with the objections to EDT.  (After all, does anyone think it would have been irrational to quit smoking, based only on the correlation between smoking and cancer, before randomized controlled trials were conducted?)  I'll explain what I mean with the simplest version of this thought experiment I could come up with.

Suppose that I'm a farmer, hoping it will rain today, to water my crops.  I know that the probability of it having rained today, given that my lawn is wet, is higher than otherwise.  And I know that my lawn will be wet, if I turn my sprinklers on.  Of course, though it waters my lawn, running my sprinklers does nothing for my crops out in the field.  Making the ground wet doesn't cause rain; it's the other way around.  But if I'm an EDT agent, I know nothing of causation, and base my decisions only on conditional probability.  According to the standard criticism of EDT, I stupidly turn my sprinklers on, as if that would make it rain.

Here is where I think the criticism of EDT fails: how do I know, in the first place, that the ground being wet doesn't cause it to rain?  One obvious answer is that I've tried it, and observed that the probability of it raining on a given day, given that I turned my sprinklers on, isn't any higher than the prior probability.  But if I know that, then, as an evidential decision theorist, I have no reason to turn the sprinklers on.  However, if all I know about the world I inhabit is these two facts: (1) the probability of rain is higher, given that the ground is wet, and (2) the probability of the ground being wet is higher, given that I turn the sprinklers on, then turning the sprinklers on really is the rational thing to do, if I want it to rain.

This is more clear written symbolically.  If O is the desired Outcome (rain), E is the Evidence (wet ground), and A is the Action (turning on sprinklers), then we have:

• P(O|E) > P(O), and
• P(E|A) > P(E)

(In this case, A implies E, meaning P(E|A) = 1)

It's still possible that P(O|A) = P(O).  Or even that P(O|A) < P(O).  (For example, the prior probability of rolling a 4 with a fair die is 1/6, whereas the probability of rolling a 4, given that you rolled an even number, is 1/3.  So P(4|even) > P(4).  And you'll definitely roll an even number if you roll a 2, since 2 is even.  So P(even|2) > P(even).  But the probability of rolling a 4, given that you roll a 2, is zero, since 4 isn't 2.  So P(4|2) < P(4) even though P(4|even) > P(4) and P(even|2) > P(even).)  But in this problem, I don't know P(O|A) directly.  The best I can do is guess that, since A implies E, P(O|A) = P(O|E) > P(O).  So I do A, to make O more likely.  But if I happened to know that P(O|A) = P(O), then I'd have no reason to do A.
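The die example can be checked by direct counting. This sketch (the helper names are illustrative) computes each conditional probability exactly over the six faces:

```python
from fractions import Fraction

FACES = range(1, 7)  # a fair six-sided die


def prob(event, given=lambda r: True):
    """P(event | given) for a fair die, computed by counting faces."""
    cond = [r for r in FACES if given(r)]
    return Fraction(sum(1 for r in cond if event(r)), len(cond))


def four(r):
    return r == 4


def even(r):
    return r % 2 == 0


def two(r):
    return r == 2


print(prob(four))        # 1/6
print(prob(four, even))  # 1/3
print(prob(even))        # 1/2
print(prob(even, two))   # 1
print(prob(four, two))   # 0
```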

Of course, "P(O|A) = P(O)" is basically what we mean, when we say that the ground being wet doesn't cause it to rain.  We know that making the ground wet (by means other than rain) doesn't make rain any more likely, either because we've observed this directly, or because we can infer it from our model of the world built up from countless observations.  The reason that EDT seems to give the wrong answer to this problem is because we know extra facts about the world, that we haven't stipulated in the problem.  But EDT gives the correct answer to the problem as stated.  It does the best it can do (the best anyone could do) with limited information.
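The sprinkler case can be written as a tiny joint distribution. This is a sketch under assumed numbers (a 30% prior on rain, and the sprinklers switched on with probability 1/2 independently of the weather); the lawn is wet if it rained or the sprinklers ran. Both stated facts hold, yet P(O|A) = P(O), which is the extra fact the farmer would learn by experience.

```python
from fractions import Fraction
from itertools import product

# Assumed numbers, chosen only for illustration.
P_RAIN = Fraction(3, 10)      # prior on O (rain)
P_SPRINKLER = Fraction(1, 2)  # A (sprinklers on), independent of the weather

# Enumerate the four worlds; the lawn is wet (E) if it rained or the sprinklers ran.
WORLDS = []
for rain, sprinkler in product([True, False], repeat=2):
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    WORLDS.append({"O": rain, "A": sprinkler, "E": rain or sprinkler, "p": p})


def prob(var, given=None):
    """P(var | given), summing probabilities over the enumerated worlds."""
    cond = [w for w in WORLDS if given is None or w[given]]
    total = sum(w["p"] for w in cond)
    return sum(w["p"] for w in cond if w[var]) / total


print("P(O)   =", prob("O"))       # 3/10
print("P(O|E) =", prob("O", "E"))  # 6/13, higher than P(O)
print("P(E)   =", prob("E"))       # 13/20
print("P(E|A) =", prob("E", "A"))  # 1
print("P(O|A) =", prob("O", "A"))  # 3/10, same as P(O)
```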

This is the lesson we should take from Smoker's lesion.  Yes, from the perspective of people 60 years ago, it's possible that smoking doesn't cause cancer, and that instead a third factor predisposes people to both smoking and cancer.  But it's also possible that there's a third factor which does the opposite: making people smoke and protecting them from cancer - but smokers are still more likely to get cancer, because smoking is so bad that it outweighs this protective effect.  In the absence of evidence one way or the other, the prudent choice is to not smoke.

But if we accept the premise of Smoker's lesion: that smokers are more likely to get cancer, only because people genetically predisposed to like smoking are also genetically predisposed to develop cancer - then EDT still gives us the right answer.  Just as with the Sprinkler problem above, we know that P(O|E) > P(O), and P(E|A) > P(E), where O is the desired outcome of avoiding cancer, E is the evidence of not smoking, and A is the action of deciding to not smoke for the purpose of avoiding cancer.  But we also just happen to know, by hypothesis, that P(O|A) = P(O).  Recognizing A and E as distinct is key, because one of the implications of the premise is that people who stop smoking, despite enjoying smoking, fare just as badly as life-long smokers.  So the reason that you choose to not smoke matters.  If you choose to not smoke because you can't stand tobacco, it's good news.  But if you choose to not smoke to avoid cancer, it's neutral news.  The bottom line is that you, as an evidential decision theorist, should not take cancer into account when deciding whether or not to smoke, because the good news that you decided to not smoke would be cancelled out by the fact that you did it to avoid cancer.

If this is starting to sound like the tickle defense, rest assured that there is no way to use this kind of reasoning to justify defecting on the Prisoner's dilemma or two-boxing on Newcomb's problem.  The reason is that, if you're playing against a copy of yourself in Prisoner's dilemma, it doesn't matter why you decide to do what you do.  Because, whatever your reasons are, your duplicate will do the same thing for the same reasons.  Similarly, you only need to know that the predictor is accurate in Newcomb's problem, in order for one-boxing to be good news.  The predictor might have blind spots that you could exploit, in order to get all the money.  But unless you know about those exceptions, your best bet is to one-box.  It's only in special cases that your motivation for making a decision can cancel out the auspiciousness of the decision.

The other objection to EDT is that it's temporally inconsistent.  But I don't see why that can't be handled with precommitments, because EDT isn't irreparably broken like CDT is.  A CDT agent will one-box on Newcomb's problem only if it has a chance to precommit before the predictor makes its prediction (which could be before the agent is even created).  But an EDT agent one-boxes automatically, and pays in Counterfactual Mugging as long as it has a chance to precommit before it finds out whether the coin came up heads.  One of the first things we should expect a self-modifying EDT agent to do is to make a blanket precommitment for all such problems.  That is, it self-modifies in such a way that the modification itself is "good news", regardless of whether the decisions it's precommitting to will be good or bad news when they are carried out.  This self-modification might be equivalent to designing something like an updateless decision theory agent.  The upshot, if you're a self-modifying AI designer, is that your AI can do this by itself, along with its other recursive self-improvements.
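To see why the timing of the precommitment matters in Counterfactual Mugging, here is a sketch with assumed payoffs (the post gives none): on heads, the agent receives $10,000 only if its policy is to pay on tails; on tails, it is asked for $100. Evaluated before the flip, the paying policy is better; evaluated after seeing tails, paying looks like a pure loss.

```python
from fractions import Fraction

# Assumed payoffs for illustration only (the post does not specify any):
REWARD = 10_000  # paid on heads, only to agents whose policy is to pay on tails
COST = 100       # demanded on tails
P_HEADS = Fraction(1, 2)


def ex_ante_value(pays_on_tails: bool) -> Fraction:
    """Expected value of a policy, evaluated before the coin is flipped."""
    heads = REWARD if pays_on_tails else 0
    tails = -COST if pays_on_tails else 0
    return P_HEADS * heads + (1 - P_HEADS) * tails


def after_tails_value(pays: bool) -> int:
    """Value of the act once tails is known: the heads branch no longer counts."""
    return -COST if pays else 0


print(ex_ante_value(True), ex_ante_value(False))          # 4950 0
print(after_tails_value(True), after_tails_value(False))  # -100 0
```

The gap between the two evaluations is the temporal inconsistency; a precommitment made before the flip locks in the policy with the higher ex ante value.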

Ultimately, I think that causation is just a convenient short-hand that we use.  In practice, we infer causal relations by observing conditional probabilities.  Then we use those causal relations to inform our decisions.  It's a great heuristic, but we shouldn't lose sight of what we're actually trying to do, which is to choose the option such that the probability of a good outcome is highest.

## [Link] “Proxy measures, sunk costs, and Chesterton's fence”, or: the sunk cost heuristic

7 08 August 2012 02:39PM

Thought this post might be of interest to LW: Proxy measures, sunk costs, and Chesterton's fence. To summarize: Previous costs are a proxy measure for previous estimates of value, which may have information current estimates of value do not; therefore acting according to the sunk cost fallacy is not necessarily wrong.

This is not an entirely new idea here, but I liked the writeup. Previous discussion: Sunk Costs Fallacy Fallacy; Is Sunk Cost Fallacy a Fallacy?.

Excerpt:

If your evidence may be substantially incomplete you shouldn't just ignore sunk costs — they contain valuable information about decisions you or others made in the past, perhaps after much greater thought or access to evidence than that of which you are currently capable. Even more generally, you should be loss averse — you should tend to prefer avoiding losses over acquiring seemingly equivalent gains, and you should be divestiture averse (i.e. exhibit endowment effects) — you should tend to prefer what you already have to what you might trade it for — in both cases to the extent your ability to measure the value of the two items is incomplete. Since usually in the real world, and to an even greater degree in our ancestors' evolutionary environments, our ability to measure value is and was woefully incomplete, it should come as no surprise that people often value sunk costs, are loss averse, and exhibit endowment effects — and indeed under such circumstances of incomplete value measurement it hardly constitutes "fallacy" or "bias" to do so.

## The Doubling Box

13 06 August 2012 05:50AM

Let's say you have a box that has a token in it that can be redeemed for 1 utilon. Every day, its contents double. There is no limit on how many utilons you can buy with these tokens. You are immortal. It is sealed, and if you open it, it becomes an ordinary box. You get the tokens it has created, but the box does not double its contents anymore. There are no other ways to get utilons.

How long do you wait before opening it? If you never open it, you get nothing (you lose! Good day, sir or madam!) and whenever you take it, taking it one day later would have been twice as good.

I hope this doesn't sound like a reductio ad absurdum against unbounded utility functions or not discounting the future, because if it does you are in danger of amputating the wrong limb to save yourself from paradox-gangrene.

What if, instead of growing exponentially without bound, it approaches the bound of your utility function exponentially? If your utility function is bounded at 10, what if the first day it is 5, the second 7.5, the third 8.75, etc.? Assume all the little details, like remembering about the box, trading in the tokens, etc., are free.
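Reading the example's three values (5, 7.5, 8.75) as 10(1 - 2^-n) on day n, which is an extrapolation from those numbers rather than something the post states, a short sketch shows that every extra day strictly helps while no day reaches the bound:

```python
from fractions import Fraction

BOUND = 10  # the utility bound from the example


def box_value(day: int) -> Fraction:
    """Utility from opening on `day` (1-indexed): 5, 7.5, 8.75, ... (assumed formula)."""
    return BOUND * (1 - Fraction(1, 2**day))


values = [box_value(d) for d in range(1, 6)]
print([float(v) for v in values])  # [5.0, 7.5, 8.75, 9.375, 9.6875]
```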

If you discount the future using any function that doesn't ever hit 0, then the growth rate of the tokens can be chosen to more than make up for your discounting.
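A sketch of that construction, using an assumed discount of 10% per day purely as one example of a function that never hits 0: set the box contents on day t to 2^t / d(t), so the discounted value is exactly 2^t and waiting one more day still doubles it.

```python
from fractions import Fraction


def discount(day: int) -> Fraction:
    """Assumed example discount function: 10% per day, never reaching 0."""
    return Fraction(9, 10) ** day


def tokens(day: int) -> Fraction:
    """Box contents chosen to outpace the discount: 2**day / discount(day)."""
    return Fraction(2) ** day / discount(day)


for day in range(5):
    # The discounted value is exactly 2**day, so each extra day doubles it.
    print(day, float(tokens(day)), discount(day) * tokens(day))
```

Any other strictly positive discount function works the same way, since dividing by d(t) is always defined.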

If it does hit 0 at time T, what if, instead of doubling, the box adds at each growth step however many utilons your discounting would reduce to an adjusted 1 utilon at that moment, while the intervals between growth steps shrink to nothing? You get an adjusted 1 utilon at time T - 1s, another adjusted 1 utilon at T - 0.5s, another at T - 0.25s, etc. Suppose you can think as fast as you want, and open the box at arbitrary speed. Also suppose that whatever solution your present self precommits to will be followed by your future self. (Their decision won't be changed by any change in what times they care about.)

EDIT: People in the comments have suggested using a utility function that is both bounded and discounting. If your utility function isn't so strongly discounting that it drops to 0 right after the present, then you can find some time interval very close to the present over which the discounting is nonzero throughout. And if it's nonzero, you can have a box that disappears at the end of that interval, taking all possible utility with it. Leading up to that moment, the box adds utility at intervals that shrink to nothing, increasing the utility-worth of its tokens so that it compensates for your discounting function exactly enough to asymptotically approach your bound.

Here is my solution. You can't assume that your future self will make the optimal decision, or even a good decision. You have to treat your future self as a physical object that your choices affect, and take the probability distribution of what decisions your future self will make, and how much utility they will net you into account.

Think of yourself as a Turing machine. If you do not halt and open the box, you lose and get nothing. No matter how complicated your brain is, you have a finite number of states. You want to be a busy beaver and take the most possible time to halt, but still halt.

If, at the end, you say to yourself "I just counted to the highest number I could, counting once per day, and then made a small mark on my skin, and repeated, and when my skin was full of marks, that I was constantly refreshing to make sure they didn't go away...

...but I could let it double one more time, for more utility!"

If you return to a state you have already been in, you know you are going to wait forever, lose, and get nothing. So it is in your best interest to open the box.
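The finite-mind argument can be illustrated with a deterministic finite-state process. This is a simplified model, not a busy-beaver construction: a hypothetical "mind" with N distinguishable counting states either reaches a halting state (opens the box) or, the moment it repeats a state, is provably stuck in a cycle, because a deterministic step function from a repeated state retraces the same path forever.

```python
def run_until_halt_or_loop(step, start, halt_states):
    """Run a deterministic finite-state process from `start`.

    Returns ("halt", days_waited) if a halting state is reached, or
    ("loop", days_waited) as soon as a state repeats: since `step` is
    deterministic, a repeated state means the run cycles forever.
    """
    seen = set()
    state, days = start, 0
    while state not in halt_states:
        if state in seen:
            return "loop", days
        seen.add(state)
        state = step(state)
        days += 1
    return "halt", days


N = 1000  # hypothetical number of distinguishable "counting" states

# A counter that opens the box at its largest count waits N - 1 days.
print(run_until_halt_or_loop(lambda s: s + 1, 0, {N - 1}))
# A counter that wraps around ("one more doubling!") never opens the box.
print(run_until_halt_or_loop(lambda s: (s + 1) % N, 0, set()))
```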

So there is not a universal optimal solution to this problem, but there is an optimal solution for a finite mind.

I remember reading a while ago about a paradox where you start with $1, and can trade that for a 50% chance of $2.01, which you can trade for a 25% chance of $4.03, which you can trade for a 12.5% chance of $8.07, etc (can't remember where I read it).
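A sketch of that trade sequence in integer cents. The listed prizes (100, 201, 403, 807 cents) fit prize_next = 2 * prize + 1; continuing that pattern past the four given values is an assumption. Each trade raises the expected value slightly while halving the chance of winning anything.

```python
from fractions import Fraction


def trade_sequence(n_trades: int):
    """Yield (trades, win_probability, prize_in_cents, expected_cents).

    Assumes the pattern 100, 201, 403, 807 cents continues as
    prize_next = 2 * prize + 1 (inferred from the four listed values).
    """
    prize, p = 100, Fraction(1)
    for k in range(n_trades + 1):
        yield k, p, prize, p * prize
        prize, p = 2 * prize + 1, p / 2


for k, p, prize, ev in trade_sequence(4):
    print(k, p, prize, float(ev))
```

As with the box, every further trade looks better in expectation, and an agent who always accepts ends with a win probability that tends to zero.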

This is the same paradox with one of the traps for wannabe Captain Kirks (using dollars instead of utilons) removed and one of the unnecessary variables (uncertainty) cut out.

My solution also works on that. Every trade is analogous to a day waited to open the box.

## Thoughts on a possible solution to Pascal's Mugging

2 01 August 2012 12:32PM

For those who aren't familiar, Pascal's Mugging is a simple thought experiment that seems to demonstrate an intuitive flaw in naive expected utility maximization.  In the classic version, someone walks up to you on the street, and says, 'Hi, I'm an entity outside your current model of the universe with essentially unlimited capabilities.  If you don't give me five dollars, I'm going to use my powers to create 3^^^^3 people, and then torture them to death.'  (For those not familiar with Knuth up-arrow notation, see here).  The idea being that however small your probability is that the person is telling the truth, they can simply state a number that's grossly larger -  and when you shut up and multiply, expected utility calculations say you should give them the five dollars, along with pretty much anything else they ask for.
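For a feel of Knuth's up-arrow notation, here is a small recursive sketch: one arrow is exponentiation, and each additional arrow iterates the operation with one fewer arrow. It is only usable for tiny inputs; 3^^^^3 (four arrows) is far beyond anything this, or any computer, could evaluate.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's a ↑^n b: n = 1 is a ** b; each extra arrow iterates the previous one."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


print(up_arrow(3, 1, 3))  # 3^3 = 27
print(up_arrow(3, 2, 3))  # 3^^3 = 3^(3^3) = 7625597484987
print(up_arrow(2, 3, 3))  # 2^^^3 = 2^^4 = 65536
```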

Intuitively, this is nonsense.  However, an AI under construction doesn't have a piece of code that lights up when exposed to nonsense.  Not unless we program one in.  And formalizing why, exactly, we shouldn't listen to the mugger is not as trivial as it sounds.  The actual underlying problem has to do with how we handle arbitrarily small probabilities.  There are a number of variations you could construct on the original problem that present the same paradoxical results.  There are also a number of simple hacks you could undertake that produce the correct results in this particular case, but these are worrying (not to mention unsatisfying) for a number of reasons.

So, with the background out of the way, let's move on to a potential approach to solving the problem which occurred to me about fifteen minutes ago while I was lying in bed with a bad case of insomnia at about five in the morning.  If it winds up being incoherent, I blame sleep deprivation.  If not, I take full credit.

Let's take a look at a new thought experiment.  Let's say someone comes up to you and tells you that they have magic powers, and will make a magic pony fall out of the sky.  Let's say that, through some bizarrely specific priors, you decide that the probability that they're telling the truth (and, therefore, the probability that a magic pony is about to fall from the sky) is exactly 1/2^100.  That's all well and good.

Now, let's say that later that day, someone comes up to you, and hands you a fair quarter and says that if you flip it one hundred times, the probability that you'll get a straight run of heads is 1/2^100.  You agree with them, chat about math for a bit, and then leave with their quarter.

I propose that the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality. In the first case, you believe, overwhelmingly, that a magic pony will not fall from the sky. You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero. It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero).

In the second case, you also believe, overwhelmingly, that you will not see the event in question (a run of heads).  However, you don't believe that the probability is zero.  You believe it's 1/2^100.  You believe that, through only the lawful operation of the universe that actually exists, you could be surprised, even if it's not likely.  You believe that if you ran the experiment in question enough times, you would probably, eventually, see a run of one hundred heads.  This is not true for the first case.  No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.
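The coin case can be checked numerically. This Python sketch (the helper name `prob_at_least_one` is illustrative) uses the standard formula 1 - (1 - p)^n for the chance of at least one success in n independent attempts. With p = 1/2^100, about 2^100 runs of one hundred flips give better-than-even odds of seeing a run of heads, while a probability that is truly zero, as the pony case is described, stays at zero however many attempts are made.

```python
import math
from fractions import Fraction

P_RUN = Fraction(1, 2**100)  # probability of 100 heads in a row with a fair coin


def prob_at_least_one(p: float, attempts: int) -> float:
    """P(at least one success in `attempts` independent tries) = 1 - (1 - p)**attempts.

    log1p/expm1 keep the result accurate when p is as tiny as 2**-100,
    where computing (1 - p) directly would round to exactly 1.0.
    """
    return -math.expm1(attempts * math.log1p(-p))


expected_attempts = 1 / P_RUN  # mean number of 100-flip runs until the first success
print(expected_attempts == 2**100)              # → True
print(prob_at_least_one(float(P_RUN), 2**100))  # roughly 1 - 1/e
print(prob_at_least_one(0.0, 10**9))            # → 0.0
```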

I would like, at this point, to talk about the notion of metaconfidence. When we talk to the crazy pony man and to the woman with the coin, what we leave with are two identical numerical probabilities. However, those numbers do not represent the sum total of the information at our disposal. In the two cases, we have differing levels of confidence in our levels of confidence. Furthermore, this difference has actual ramifications for what a rational agent should expect to observe. In other words, even from a very conservative perspective, metaconfidence intervals pay rent. By treating the two probabilities as identical, we are needlessly throwing away information. I'm honestly not sure if this topic has been discussed before. I am not up to date on the literature on the subject. If the subject has already been thoroughly discussed, I apologize for the waste of time.

Disclaimer aside, I'd like to propose that we push this a step further, and say that metaconfidence should play a role in how we calculate expected utility.  If we have a very small probability of a large payoff (positive or negative), we should behave differently when metaconfidence is high than when it is low.

From a very superficial analysis, lying in bed, metaconfidence appears to be directional. A low metaconfidence, in the case of the pony claim, should not increase the probability that the probability of a pony dropping out of the sky is HIGHER than our initial estimate. It works the other way as well: if we have a very high degree of confidence in some event (the sun rising tomorrow), and we get some very suspect evidence to the contrary (an ancient civilization predicting the end of the world tonight), and we update our probability downward slightly, our low metaconfidence should not make us believe that the sun is less likely to rise tomorrow than we thought. Low metaconfidence should move our effective probability estimate against the direction of the evidence that we have low confidence in: the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.
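No formula is given for this, so the following Python sketch is only one hypothetical way to capture the directional behavior. It assumes a pre-evidence baseline probability, a naive post-evidence estimate, and a metaconfidence m between 0 and 1, and interpolates between the two in log-odds space. At m = 1 the evidence is taken at face value; at m = 0 it is ignored; in between, the effective estimate always lies on the baseline side of the naive one, that is, against the direction the suspect evidence pushed. The function names and all the numbers are illustrative assumptions.

```python
import math


def logit(p: float) -> float:
    """Log-odds of p, for 0 < p < 1."""
    return math.log(p) - math.log1p(-p)


def sigmoid(x: float) -> float:
    """Inverse of logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def effective_probability(baseline: float, naive: float, metaconfidence: float) -> float:
    """Hypothetical metaconfidence weighting (an assumption, not the post's method).

    Moves from the pre-evidence baseline toward the naive estimate by a
    fraction `metaconfidence` of the distance, measured in log-odds.
    """
    if not 0.0 <= metaconfidence <= 1.0:
        raise ValueError("metaconfidence must be in [0, 1]")
    start = logit(baseline)
    return sigmoid(start + metaconfidence * (logit(naive) - start))


# Pony: an assumed baseline of 2**-200 is raised to 2**-100 by a dubious claim.
pony = effective_probability(2.0**-200, 2.0**-100, metaconfidence=0.1)
# Sunrise: an assumed baseline of 1 - 1e-9 is lowered to 1 - 1e-8 by a dubious prophecy.
sunrise = effective_probability(1 - 1e-9, 1 - 1e-8, metaconfidence=0.1)
print(pony < 2.0**-100)    # → True: the pony is less likely than the naive estimate
print(sunrise > 1 - 1e-8)  # → True: the sunrise is more likely than the naive estimate
```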

So a claim like the pony claim (or Pascal's mugging), for which we have a very low estimated probability and very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a case in which we have a low estimated probability but very high confidence in that probability. See the pony versus the coins. Rationally, there is a limit to how low a probability we can mathematically justify assigning to the crazy pony man's claims. However, in the territory, you can add enough coins that the two probabilities are mathematically equal, and you are still more likely to get a run of heads than you are to have a pony magically drop out of the sky. I am proposing metaconfidence weighting as a way to get around this issue, and allow our map to more accurately reflect the underlying territory. It's not perfect, since metaconfidence is still, ultimately, calculated from our map of the territory, but it seems to me, based on my extremely brief analysis, that it is at least an improvement on the current model.
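The coin half of this comparison is easy to quantify: matching a credence p takes about -log2(p) flips, so 100 flips match the pony man's 1/2^100. The decision half can be sketched by feeding both cases through the same hypothetical log-odds weighting (an illustrative assumption, not a formula from this post): a bet with an identical naive probability and payoff looks worthwhile when metaconfidence is 1, as for the coin, and not when it is low, as for the pony. The baseline, payoff, and cost values are invented for illustration.

```python
import math


def flips_to_match(p: float) -> int:
    """Smallest n such that a run of n heads (probability 2**-n) is at most p."""
    return math.ceil(-math.log2(p))


def weighted_gain(baseline: float, naive: float, metaconfidence: float,
                  payoff: float, cost: float) -> float:
    """Expected gain of a bet under a hypothetical log-odds metaconfidence weighting."""
    def logit(p: float) -> float:
        return math.log(p) - math.log1p(-p)

    x = logit(baseline) + metaconfidence * (logit(naive) - logit(baseline))
    # Numerically stable inverse of logit.
    p_eff = 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))
    return p_eff * payoff - cost


NAIVE = 2.0**-100     # same naive credence for the coin and the pony
BASELINE = 2.0**-200  # assumed pre-claim credence (irrelevant when metaconfidence is 1)
PAYOFF = 2.0**110     # assumed payoff if the event occurs, in utility units
COST = 5.0            # assumed cost of taking the bet

print(flips_to_match(NAIVE))  # → 100
coin = weighted_gain(BASELINE, NAIVE, 1.0, PAYOFF, COST)  # high metaconfidence
pony = weighted_gain(BASELINE, NAIVE, 0.5, PAYOFF, COST)  # low metaconfidence
print(coin > 0, pony < 0)     # → True True
```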

Essentially, this idea is based on the understanding that the numbers that we generate and call probability do not, in fact, correspond to the actual rules of the territory. They are approximations, they are perturbed by observation, and our finite data set limits the resolution of the probability intervals we can draw. This causes systematic distortions at the extreme ends of the probability spectrum, and especially at the small end, where the distortion becomes dramatically large relative to the actual probability. I believe that the apparently absurd behavior demonstrated by an expected-utility agent exposed to Pascal's mugging is a result of these distortions. I am proposing we attempt to compensate by filling in the missing information at the extreme ends of the bell curve with data from our model about our sources of evidence, and about the underlying nature of the territory. In other words, this is simply a way to use our available evidence more efficiently, and I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we currently encounter.
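One standard way to see how a finite data set limits resolution at the small end (a textbook statistical illustration, not something computed in this post): after observing zero occurrences of an event in n independent trials, the exact one-sided 95% upper confidence bound on its probability is 1 - 0.05^(1/n), roughly 3/n (the "rule of three"). The Python sketch below, with illustrative helper names, shows that pinning a probability down to the 1/2^100 scale from frequency data alone would take on the order of 3 × 2^100 trials.

```python
import math


def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on p after 0 events in n trials.

    Solves (1 - p)**n = alpha for p, i.e. p = 1 - alpha**(1/n).
    """
    return -math.expm1(math.log(alpha) / n)


def trials_needed(target: float, alpha: float = 0.05) -> float:
    """Number of event-free trials before the upper bound falls to `target`."""
    return math.log(alpha) / math.log1p(-target)


print(zero_event_upper_bound(1000))         # close to 3 / 1000
print(trials_needed(2.0**-100) / 2.0**100)  # about 3.0 (precisely -ln 0.05)
```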

I apologize for not having worked the math out completely. I would like to reiterate that it is six thirty in the morning, and I've only been thinking about the subject for about a hundred minutes. That said, I'm not likely to get any sleep either way, so I thought I'd jot the idea down and see what you folks thought. Having outside eyes is very helpful when you've just had a Brilliant New Idea.
