Who's afraid of impossible worlds?
In order to clarify the semantics of paraconsistent and relevance logics, we need to make a detour into impossible worlds - a fruitful detour opening up fun new vistas. Note that this is an intuitive introduction to the subject, and logic is probably the area of mathematics where it is the most dangerous to rely on your intuition; this is no substitute for rigorously going through the formal definitions. With that proviso in mind, let's get cracking.
Possible worlds: the meaning of necessity
Modal logics were developed around the concepts of necessity and possibility. They do this with two extra operators: the necessity operator □, and the possibility operator ◊. The sentence □A is taken to mean "it is necessary that A" and ◊A means "it is possible that A". The two operators are dual to each other – thus "it is necessary that A" is the same as "it is not possible that not-A" (in symbols: □A ↔ ¬◊¬A). A few intuitive axioms then lead to an elegant theory.
There was just one problem: early modal logicians didn't have a clue what they were talking about. They had the syntax, the symbols, the formal rules, but they didn't have the semantics, the models, the meanings of their symbols.
To see the trouble they had, imagine someone tossing a coin and covering it with their hand. Call wH the world in which it comes out heads and wT the world in which it comes out, you guessed it, tails. Now, is the coin necessarily heads? Is it possibly heads?
Intuitively, the answers should be no and yes. But this causes a problem. We may be in the world wH. So if we agree that the coin is not necessarily heads, then it is not necessarily heads even though it is actually heads (forget your Bayescraft here and start thinking like a logician). Similarly, in wT, the coin is in actuality tails yet it is possibly heads.
Saul Kripke found the breakthrough: necessity and possibility are not about individual worlds, but about collections of possible worlds, and relationships between them. In this case, there is an indistinguishability relationship between wT and wH, because we can't (currently) tell them apart.
Because of this relationship, the statement A:"the coin is heads" is possible in both wT and wH. The rule is that a statement is possible in world w if it is true in at least one world that is related to w. For w=wH, ◊A is true because A is true in wH and wH is related to itself. Similarly, for w=wT, ◊A is true because A is true in wH and wH is related to wT.
Conversely B:"the coin is heads or the coin is tails" is necessary in both wT and wH: here the rule is that a statement is necessary in world w if it is true in all worlds related to w. wT and wH are related only to each other through the indistinguishability relationship, and B is true in both of them, so □B is also true in both of them. However □A is not true in either wT or wH, because both those worlds are related to wT and A is false in wT.
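As a minimal sketch of these two rules, the coin example can be written out in Python. The worlds and the relation come from the text above; the function names (`possibly`, `necessarily`) and the dictionary encoding of the relation are illustrative assumptions, not standard notation.

```python
# Each world maps to the set of worlds it is related to (assumed encoding).
# wH and wT are indistinguishable, so each is related to itself and to the other.
RELATED = {
    "wH": {"wH", "wT"},
    "wT": {"wH", "wT"},
}

def heads(world):
    """Statement A: the coin is heads."""
    return world == "wH"

def heads_or_tails(world):
    """Statement B: the coin is heads or the coin is tails."""
    return world in ("wH", "wT")

def possibly(statement, world):
    """◊: true in at least one world related to `world`."""
    return any(statement(v) for v in RELATED[world])

def necessarily(statement, world):
    """□: true in all worlds related to `world`."""
    return all(statement(v) for v in RELATED[world])

for w in sorted(RELATED):
    print(w, possibly(heads, w), necessarily(heads, w),
          necessarily(heads_or_tails, w))
```

In both worlds this reports that A is possible, A is not necessary, and B is necessary, matching the informal argument.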
Logic of Paradox: a (too) simple paraconsistent logic
The logic of paradox (LP) is the simplest, and one of the oldest, of the paraconsistent logics. Instead of assigning truth values to statements A, it uses binary relationships v(A,1) and v(A,0). "A is true" is encoded by v(A,1); "A is false" is similarly encoded by v(A,0). Each statement A is required to be true or false, but it can be both (in which case we could say that "A is undetermined"). "A is strictly true" means v(A,1) but not v(A,0); strict falsity is the converse.
The usual symbols →, ∨, ∧ and ¬ retain their standard meanings, and a compound statement takes all possible values it could take, given all the possible values its components could take. So, for instance if A is (strictly) true, then ¬A is (strictly) false. If A is undetermined, then ¬A is undetermined. If A is undetermined and B is strictly false, then A ∨ B is undetermined and A ∧ B is strictly false - though if B were strictly true, then A ∨ B would be strictly true and A ∧ B undetermined.
These properties make LP quite easy to work with, and one can determine the truths of many statements using truth tables. In fact, it can be seen that every tautology of classical logic is a tautology of LP. This derives from the fact that tautologies are true regardless of the truth values of their components; hence they remain true in LP whether we take undetermined statements to be true or false. Consequently, all of the following are true in LP:
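The "takes all possible values" rule can be sketched directly. The encoding below, where each LP value is the nonempty set of classical values a statement takes, is an assumption chosen here for illustration, and the function names are hypothetical.

```python
# {1} is strictly true, {0} is strictly false, {0, 1} is undetermined (both).
TRUE, FALSE, BOTH = frozenset({1}), frozenset({0}), frozenset({0, 1})

def neg(a):
    """¬A: flip every classical value A could take."""
    return frozenset(1 - x for x in a)

def disj(a, b):
    """A ∨ B: classical 'or' over every combination of component values."""
    return frozenset(max(x, y) for x in a for y in b)

def conj(a, b):
    """A ∧ B: classical 'and' over every combination of component values."""
    return frozenset(min(x, y) for x in a for y in b)

# The cases worked through in the paragraph above.
print(neg(BOTH) == BOTH)
print(disj(BOTH, FALSE) == BOTH, conj(BOTH, FALSE) == FALSE)
print(disj(BOTH, TRUE) == TRUE, conj(BOTH, TRUE) == BOTH)
```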
- A → (B → A)
- (A ∧ ¬A) → B
- ((A ∨ B) ∧ ¬A) → B
- A ∧ (A → B) → B
But wait a second. Isn't the second line a statement of the principle of explosion - the fact that we can derive anything from a contradiction? Indeed it is. LP can state the principle of explosion as a (true) theorem - but it can't actually use it as a rule of deduction. Similarly, the third line is a statement of the disjunctive syllogism - a true theorem, but not a valid rule of deduction. That is easy to see: let A be undetermined, and B strictly false. Then A ∨ B is true, and so is ¬A - and yet we cannot deduce that B is true from this information.
So LP can accept contradictions without blowing up, has all the tautologies of classical logic, but lacks some of the rules of inference. "Some" of the rules of inference? LP even lacks modus ponens! As before, let A be undetermined and B strictly false; then A and (A → B) are both true, but B is not.
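The gap between "true theorem" and "valid rule" can be checked by brute force over all valuations. This is a sketch under stated assumptions: LP values are encoded as sets of classical values, a statement "holds" when 1 is among its values, and `is_tautology` / `is_valid` are hypothetical helper names.

```python
from itertools import product

TRUE, FALSE, BOTH = frozenset({1}), frozenset({0}), frozenset({0, 1})
VALUES = (TRUE, FALSE, BOTH)

def neg(a):
    return frozenset(1 - x for x in a)

def disj(a, b):
    return frozenset(max(x, y) for x in a for y in b)

def conj(a, b):
    return frozenset(min(x, y) for x in a for y in b)

def impl(a, b):
    # Material conditional: A → B is ¬A ∨ B.
    return disj(neg(a), b)

def holds(a):
    """A statement is true in LP when it is at least true (possibly also false)."""
    return 1 in a

def is_tautology(formula):
    """True if the formula holds under every assignment to A and B."""
    return all(holds(formula(a, b)) for a, b in product(VALUES, repeat=2))

def is_valid(premises, conclusion):
    """True if the conclusion holds whenever all premises hold."""
    return all(holds(conclusion(a, b))
               for a, b in product(VALUES, repeat=2)
               if all(holds(p(a, b)) for p in premises))

# Explosion as a theorem versus explosion as a rule of deduction.
print(is_tautology(lambda a, b: impl(conj(a, neg(a)), b)))
print(is_valid([lambda a, b: a, lambda a, b: neg(a)], lambda a, b: b))

# Modus ponens as a theorem versus modus ponens as a rule of deduction.
print(is_tautology(lambda a, b: impl(conj(a, impl(a, b)), b)))
print(is_valid([lambda a, b: a, lambda a, b: impl(a, b)], lambda a, b: b))
```

In each pair the theorem check succeeds and the rule check fails; the failing valuation is the one from the text, with A undetermined and B strictly false.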
So while LP is a pleasant logic to play with, it isn't particularly useful. Another weakness is that it still defines the material conditional (A → B) as (¬A ∨ B): false statements still imply anything, and we haven't solved the Löbian problem for UDT. In the next post, I'll look at relevance logics, which have a more restricted use of →, and do allow modus ponens.
Paraconsistency and relevance: avoid logical explosions
EDIT: corrected from previous version.
If the moon is made of cheese, then Rafael Delago was elected president of Ecuador in 2005.
If you believe that Kennedy was shot in 1962, then you must believe that Santa Claus is the Egyptian god of the dead.
Both of these are perfectly sound arguments of classical logic. The premise is false, hence the argument is logically correct, no matter what the conclusion is: if A is false, then A→B is true.
It does feel counterintuitive, though, especially because human beliefs do not work in this way. Consider instead the much more intuitive statement:
If you believe that Kennedy was shot in 1962, then you must believe that Lee Harvey Oswald was also shot in 1962.
Here there seems to be a connection between the two clauses; we feel A→B is more justified when "→" actually does some work in establishing a relationship between A and B. But can this intuition be formalised?
One way to do so is to use relevance logics, which are a subset of "paraconsistent" logics. Paraconsistent logics are those that avoid the principle of explosion. This is the rule in classical logic that if you accept one single contradiction - one single (A and not-A) - then you can prove everything. This is akin to accepting one false belief that contradicts your other beliefs - after that, anything goes. The contradiction explodes and takes everything down with it. But why would we be interested in avoiding either the principle of explosion or unjustified uses of "→"?
An example of self-fulfilling spurious proofs in UDT
Benja Fallenstein was the first to point out that spurious proofs pose a problem for UDT. Vladimir Nesov and orthonormal asked for a formalization of that intuition. In this post I will give an example of a UDT-ish agent that fails due to having a malicious proof searcher, which feeds the agent a spurious but valid proof.
The basic idea is to have an agent A that receives a proof P as input, and checks P for validity. If P is a valid proof that a certain action a is best in the current situation, then A outputs a, otherwise A tries to solve the current situation by its own means. Here's a first naive formalization, where U is the world program that returns a utility value, A is the agent program that returns an action, and P is the proof given to A:
def U():
    if A(P)==1:
        return 5
    else:
        return 10
def A(P):
    if P is a valid proof that A(P)==a implies U()==u, and A(P)!=a implies U()<=u:
        return a
    else:
        do whatever
This formalization cannot work because a proof P can never be long enough to contain statements about A(P) inside itself. To fix that problem, let's introduce a function Q that generates the proof P:
def U():
    if A(Q())==1:
        return 5
    else:
        return 10
def A(P):
    if P is a valid proof that A(Q())==a implies U()==u, and A(Q())!=a implies U()<=u:
        return a
    else:
        do whatever
In this case it's possible to write a function Q that returns a proof that makes A return the suboptimal action 1, which leads to utility 5 instead of 10. Here's how:
Let X be the statement "A(Q())==1 implies U()==5, and A(Q())!=1 implies U()<=5". Let Q be the program that enumerates all possible proofs trying to find a proof of X, and returns that proof if found. (The definitions of X and Q are mutually quined.) If X is provable at all, then Q will find that proof, and X will become true (by inspection of U and A). That reasoning is formalizable in our proof system, so the statement "if X is provable, then X" is provable. Therefore, by Löb's theorem, X is provable. So Q will find a proof of X, and A will return 1.
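Schematically, the argument has the following two-step shape. Here T stands for the agent's proof system (assumed strong enough for Löb's theorem to apply, as Peano arithmetic is) and Prov_T for its provability predicate; both symbols are introduced here only for illustration.

```latex
\begin{align*}
&T \vdash \mathrm{Prov}_T(\ulcorner X \urcorner) \rightarrow X
  && \text{(by inspection of } Q,\ A \text{ and } U\text{)} \\
&T \vdash X
  && \text{(by L\"ob's theorem applied to the line above)}
\end{align*}
```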
One possible conclusion is that a UDT agent cannot use just any proof searcher or "mathematical intuition module" that's guaranteed to return valid mathematical arguments, because valid mathematical arguments can make the agent choose arbitrary actions. The proof searchers from some previous posts were well-behaved by construction, but not all of them are.
The troubling thing is that you may end up with a badly behaved proof searcher by accident. For example, consider a variation of U that adds some long and complicated computation to the "else" branch of U, before returning 10. That increases the length of the "natural" proof that a=2 is optimal, but the spurious proof for a=1 stays about the same length as it was, because the spurious proof can just ignore the "else" branch of U. This way the spurious proof can become much shorter than the natural proof. So if (for example) your math intuition module made the innocuous design decision of first looking at actions that are likely to have shorter proofs, you may end up with a spurious proof. And as a further plot twist, if we make U return 0 rather than 10 in the long-to-compute branch, you might choose the correct action due to a spurious proof instead of the natural one.
A Problem About Bargaining and Logical Uncertainty
Suppose you wake up as a paperclip maximizer. Omega says "I calculated the millionth digit of pi, and it's odd. If it had been even, I would have made the universe capable of producing either 10^20 paperclips or 10^10 staples, and given control of it to a staples maximizer. But since it was odd, I made the universe capable of producing 10^10 paperclips or 10^20 staples, and gave you control." You double check Omega's pi computation and your internal calculator gives the same answer.
Then a staples maximizer comes to you and says, "You should give me control of the universe, because before you knew the millionth digit of pi, you would have wanted to pre-commit to a deal where each of us would give the other control of the universe, since that gives you 1/2 probability of 10^20 paperclips instead of 1/2 probability of 10^10 paperclips."
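The staples maximizer's arithmetic can be spelled out as follows. This is a sketch that reads the two quantities as 10^20 and 10^10, and it assumes that whoever ends up in control produces only their own preferred good, so the other party gets nothing; the function name is hypothetical.

```python
from fractions import Fraction

# Assumed credence that the digit is even, before computing it.
P_EVEN = Fraction(1, 2)

def expected_paperclips(precommit_to_swap):
    """Expected paperclips from the paperclip maximizer's prior point of view."""
    if precommit_to_swap:
        # Whoever is given control hands it to the other maximizer.
        if_even = 10**20  # the staples maximizer hands over the paperclip-rich universe
        if_odd = 0        # we hand our universe to the staples maximizer
    else:
        if_even = 0       # the staples maximizer keeps control and makes staples
        if_odd = 10**10   # we keep control and make paperclips
    return P_EVEN * if_even + (1 - P_EVEN) * if_odd

print(expected_paperclips(True), expected_paperclips(False))
```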
Is the staples maximizer right? If so, the general principle seems to be that we should act as if we had precommitted to a deal we would have made in ignorance of logical facts we actually possess. But how far are we supposed to push this? What deal would you have made if you didn't know that the first digit of pi was odd, or if you didn't know that 1+1=2?
On the other hand, suppose the staples maximizer is wrong. Does that mean you also shouldn't agree to exchange control of the universe before you knew the millionth digit of pi?
To make this more relevant to real life, consider two humans negotiating over the goal system of an AI they're jointly building. They have a lot of ignorance about the relevant logical facts, like how smart/powerful the AI will turn out to be and how efficient it will be in implementing each of their goals. They could negotiate a solution now in the form of a weighted average of their utility functions, but the weights they choose now will likely turn out to be "wrong" in full view of the relevant logical facts (e.g., the actual shape of the utility-possibility frontier). Or they could program their utility functions into the AI separately, and let the AI determine the weights later using some formal bargaining solution when it has more knowledge about the relevant logical facts. Which is the right thing to do? Or should they follow the staples maximizer's reasoning and bargain under the pretense that they know even less than they actually do?
Other Related Posts: Counterfactual Mugging and Logical Uncertainty, If you don't know the name of the game, just tell me what I mean to you
The limited predictor problem
This post requires some knowledge of logic, computability theory, and K-complexity. Much of the credit goes to Wei Dai. The four sections of the post can be read almost independently.
The limited predictor problem (LPP) is a version of Newcomb's Problem where the predictor has limited computing resources. To predict the agent's action, the predictor simulates the agent for N steps. If the agent doesn't finish in N steps, the predictor assumes that the agent will two-box. LPP is similar to the ASP problem, but with simulation instead of theorem proving.
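A toy version of such a predictor, as a sketch only: agents are modelled as Python generators, one `yield` counts as one step, and the names `predict`, `quick_one_boxer` and `slow_one_boxer` are hypothetical. None of this modelling comes from the original problem statement.

```python
def predict(agent_factory, n_steps):
    """Simulate the agent for at most n_steps; assume two-boxing on timeout."""
    run = agent_factory()
    for _ in range(n_steps):
        try:
            next(run)  # advance the agent by one step
        except StopIteration as done:
            return done.value  # the agent finished and returned its action
    return "two-box"

def quick_one_boxer():
    yield  # a single step of deliberation
    return "one-box"

def slow_one_boxer():
    for _ in range(1000):  # a long deliberation
        yield
    return "one-box"

print(predict(quick_one_boxer, 10))
print(predict(slow_one_boxer, 100))
print(predict(slow_one_boxer, 2000))
```

The slow agent intends to one-box in both runs, but with the smaller limit the predictor times out and treats it as a two-boxer.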
1. Solving the problem when the agent has a halting oracle
Consider the agent defined in "A model of UDT with a halting oracle", and a predictor that can run the agent's code step-by-step, with oracle calls and all. Turns out that this agent solves LPP correctly if N is high enough. To understand why, note that the agent offloads all interesting work to oracles that return instantly, so the agent's own runtime is provably bounded. If that bound is below N, the agent's oracle will prove that the predictor predicts the agent correctly, so the agent will one-box.
2. Failing to solve the problem when N is algorithmically random
Consider a setting without oracles, with only Turing-computable programs. Maybe the agent should successively search for proofs somehow?
Unfortunately you can't solve most LPPs this way, for a simple but surprising reason. Assume that the predictor's time limit N is a large and algorithmically random number. Then the predictor's source code is >log(N) bits long, because N must be defined in the source code. Then any proof about the world program must also have length >log(N), because the proof needs to at least quote the world program itself. Finding a proof by exhaustive search takes exponential time, so the agent will need >N steps. But the predictor simulates the agent for only N steps. Whoops!
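The counting behind this can be made explicit. The sketch below assumes proofs are encoded as bit strings, that checking one candidate costs one step, and that the search has to enumerate every candidate up to the minimum proof length (a worst case); the helper names are hypothetical.

```python
def candidate_count(max_bits):
    """Number of nonempty bit strings of length at most max_bits."""
    return 2 ** (max_bits + 1) - 2

def search_exceeds_limit(n):
    """True if enumerating all candidates up to the minimum proof length needs more than n steps."""
    # For a random n, a proof must be longer than log2(n) bits;
    # n.bit_length() is an integer strictly greater than log2(n).
    min_proof_bits = n.bit_length()
    return candidate_count(min_proof_bits) > n

for n in (1000, 10**30, 10**300):
    print(n.bit_length(), search_exceeds_limit(n))
```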
3. Solving the problem when N is large but has a short definition
As usual, let U be the world program that returns a utility value, and A be the agent program that returns an action and has access to the world's source code. Consider the following algorithm for A:
1. From L=1 to infinity, search for proofs up to length L of the form "if A()=a and runtime(A)<g(L), then U()=u", where g(L) is an upper bound on runtime(A) if A stops the search at length L. Upon finding at least one proof for each possible a, go to step 2.
2. Search for proofs up to length f(L) of the form "if runtime(A)<g(L), then A()≠a", where f(L) is some suitably fast-growing function like 10^L. If such a proof is found, return a.
3. If we're still here, return the best a found on step 1.
This algorithm is very similar to the one described in "A model of UDT without proof limits", but with the added complication that A is aware of its own runtime via the function g(L). By an analogous argument, A will find the "intended" proof that the predictor predicts A correctly if runtime(A) is small enough, as long as the "intended" proof exists and isn't too long relative to the predictor's time limit N. More concretely, A will solve all instances of LPP in which N is larger than g(L), where L is the length of the "intended" proof. For example, if f(L)=10^L, then g(L) is doubly exponential, so A will successfully solve LPPs where the predictor's source code defines N using triple exponentials or some more compact notation.
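A rough version of the arithmetic behind the last claim, assuming proofs are strings over an alphabet of b symbols and that exhaustive search dominates the agent's runtime (both assumptions are added here for illustration):

```latex
\begin{align*}
f(L) &= 10^{L}
  && \text{(longest proof examined in step 2)} \\
g(L) &\approx b^{f(L)} = b^{10^{L}}
  && \text{(number of candidate strings of length up to } f(L)\text{)} \\
N &> g(L) \approx b^{10^{L}}
  && \text{(needed for the predictor's simulation to finish)}
\end{align*}
```

So g is doubly exponential in L, which is why N has to be written with something like triple exponentials to exceed it while keeping the predictor's source code, and hence L, short.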
4. A broader view
TDT and UDT were originally designed for solving "decision-determined" problems. The agent figures out how the resulting utility logically depends on the agent's action, then returns the action with the highest utility, thus making the premise true.
But a cleverly coded decision program can also control other facts about itself. For example, the program may figure out how the resulting utility depends on the program's return value and running time, then choose the best return value and choose how long to keep running, thus making both premises true. This idea is a natural extension of quining (you carefully write a program that can correctly judge its own runtime so far) and can be generalized to memory consumption and other properties of programs.
With enough cleverness we could write a program that would sometimes decide to waste time, or run for an even number of clock cycles, etc. We did not need so much cleverness in this post because LPP lies in a smaller class that we may call "LPP-like problems", where utility depends only on the agent's return value and runtime, and the dependence on runtime is monotonic - it never hurts to return the same value earlier. That class also includes all the usual decision-determined problems like Newcomb's Problem, and our A also fares well on those.
I was surprised to find so many new ideas by digging into such a trivial-looking problem as LPP. This makes me suspect that advanced problems like ASP may conceal even more riches, if only we have enough patience to approach them properly...
A model of UDT without proof limits
This post requires some knowledge of decision theory math. Part of the credit goes to Vladimir Nesov.
Let the universe be a computer program U that returns a utility value, and the agent is a subprogram A within U that knows the source code of both A and U. (The same setting was used in the reduction of "could" post.) Here's a very simple decision problem:
def U():
    if A() == 1:
        return 5
    else:
        return 10
The algorithm for A will be as follows:
1. Search for proofs of statements of the form "A()=a implies U()=u". Upon finding at least one proof for each possible a, go to step 2.
2. Let L be the maximum length of proofs found on step 1, and let f(L) be some suitably fast-growing function like 10^L. Search for proofs shorter than f(L) of the form "A()≠a". If such a proof is found, return a.
3. If we're still here, return the best a found on step 1.
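The three steps above can be sketched with the proof search abstracted away. Everything about `prove` is an assumption: it is a hypothetical oracle that returns the length of a proof of a statement if one exists within the given length bound, and `None` otherwise. Encoding statements as tuples and searching by increasing length bound are also illustrative choices, not part of the original algorithm.

```python
def f(length):
    """A suitably fast-growing function, as in the text."""
    return 10 ** length

def agent(actions, utilities, prove):
    # Step 1: find a proof of "A()=a implies U()=u" for every action a.
    found = {}  # action -> (utility, proof length)
    bound = 1
    while len(found) < len(actions):
        for a in actions:
            if a in found:
                continue
            for u in utilities:
                n = prove(("implies", a, u), bound)
                if n is not None:
                    found[a] = (u, n)
                    break
        bound += 1
    longest = max(n for _, n in found.values())

    # Step 2: if "A()≠a" has a proof shorter than f(L), return a.
    for a in actions:
        if prove(("not", a), f(longest) - 1) is not None:
            return a

    # Step 3: otherwise return the best action found on step 1.
    return max(found, key=lambda a: found[a][0])

def make_toy_prover(proof_lengths):
    """Toy stand-in for a proof searcher: a table of shortest proof lengths."""
    def prove(statement, max_length):
        n = proof_lengths.get(statement)
        return n if n is not None and n <= max_length else None
    return prove

toy = make_toy_prover({("implies", 1, 5): 3, ("implies", 2, 10): 4})
print(agent([1, 2], [0, 5, 10], toy))
```

With only the two "intended" proofs in the table, the toy agent picks action 2. The sketch loops forever if some action has no proof at all, which a real implementation would need to handle.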
The usual problem with such proof-searching agents is that they might stumble upon "spurious" proofs, e.g. a proof that A()==2 implies U()==0. If A finds such a proof and returns 1 as a result, the statement A()==2 becomes false, and thus provably false under any formal system; and a false statement implies anything, making the original "spurious" proof correct. The reason for constructing A this particular way is to have a shot at proving that A won't stumble on a "spurious" proof before finding the "intended" ones. The proof goes as follows:
Assume that A finds a "spurious" proof on step 1, e.g. that A()=2 implies U()=0. We have a lower bound on L, the length of that proof: it's likely larger than the length of U's source code, because a proof needs to at least state what's being proved. Then in this simple case 10^L steps is clearly enough to also find the "intended" proof that A()=2 implies U()=10, which combined with the previous proof leads to a similarly short proof that A()≠2, so the agent returns 2. But that can't happen if A's proof system is sound, therefore A will find only "intended" proofs rather than "spurious" ones in the first place.
Quote from Nesov that explains what's going on:
With this algorithm, you're not just passively gauging the proof length, instead you take the first moral argument you come across, and then actively defend it against any close competition
By analogy we can see that A coded with f(L)=10^L will correctly solve all our simple problems like Newcomb's Problem, the symmetric Prisoner's Dilemma, etc. The proof of correctness will rely on the syntactic form of each problem, so the proof may break when you replace U with a logically equivalent program. But that's okay, because "logically equivalent" for programs simply means "returns the same value", and we don't want all world programs that return the same value to be decision-theoretically equivalent.
A will fail on problems where "spurious" proofs are exponentially shorter than "intended" proofs (or even shorter, if f(L) is chosen to grow faster). We can probably construct malicious examples of decision-determined problems that would make A fail, but I haven't found any yet.
Anthropic Reasoning by CDT in Newcomb's Problem
By orthonormal's suggestion, I take this out of comments.
Consider a CDT agent making a decision in a Newcomb's problem, in which Omega is known to make predictions by perfectly simulating the players. Assume further that the agent is capable of anthropic reasoning about simulations. Then, while making its decision, the agent will be uncertain about whether it is in the real world or in Omega's simulation, since the world would look the same to it either way.
The resulting problem has a structural similarity to the Absentminded driver problem[1]. As in that problem, directly assigning probabilities to each of the two possibilities is incorrect. The planning-optimal decision, however, is readily available to CDT, and it is, naturally, to one-box.
Objection 1. This argument requires that Omega is known to make predictions by simulation, which is not necessarily the case.
Answer: It appears to be sufficient that the agent only knows that Omega is always correct. If this is the case, then a simulating-Omega and some-other-method-Omega are indistinguishable, so the agent can freely assume simulation.
[This is rather shaky reasoning; I'm not sure it is correct in general. However, I hypothesise that whatever method Omega uses, if the CDT agent knows the method, it will one-box. It is only a "magical Omega" that throws CDT off.]
Objection 2. The argument does not work for the problems where Omega is not always correct, but correct with, say, 90% probability.
Answer: Such problems are underspecified, because it is unclear how the probability is calculated. [For example, an Omega that always predicts "two-box" will be correct in 90% of cases if 90% of agents in the population are two-boxers.] A "natural" way to complete the problem definition is to stipulate that there is no correlation between correctness of Omega's predictions and any property of the players. But this is equivalent to Omega first making a perfectly correct prediction, and then adding 10% random noise. In this case, the CDT agent is again free to consider Omega a perfect simulator (with added noise), which again leads to one-boxing.
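Under the "perfect prediction plus 10% noise" reading, the expected payoff of each policy is easy to compute. The payoffs below are the conventional Newcomb amounts, which the text does not state, so they are an assumption, as is the function name.

```python
from fractions import Fraction

# Assumed standard Newcomb payoffs (not given in the text).
BIG, SMALL = 1_000_000, 1_000
# Chance that the perfectly correct prediction is flipped by noise.
NOISE = Fraction(1, 10)

def expected_payoff(action):
    """Expected payoff of always taking `action` against the noisy Omega."""
    # Probability that the (noisy) prediction says "one-box".
    p_predicted_one_box = 1 - NOISE if action == "one-box" else NOISE
    payoff = p_predicted_one_box * BIG  # the big box is filled only on a one-box prediction
    if action == "two-box":
        payoff += SMALL  # the small box is always taken by a two-boxer
    return payoff

print(expected_payoff("one-box"), expected_payoff("two-box"))
```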
Objection 3. In order for the CDT agent to one-box, it needs a special "non-self-centered" utility function, which when inside the simulation would value things outside.
Answer: The agent in the simulation has exactly the same experiences as the agent outside, so it is the same self, so it values the Omega-offered utilons the same. This seems to be a general consequence of reasoning about simulations. Of course, it is possible to give the agent a special irrational simulation-fearing utility, but what would be the purpose?
Objection 4. CDT still won't cooperate in the Prisoner's Dilemma against a CDT agent with an orthogonal utility function.
Answer: damn.
[1] Thanks to Will_Newsome for pointing me to this.
Is causal decision theory plus self-modification enough?
Occasionally a wrong idea still leads to the right outcome. We know that one-boxing on Newcomb's problem is the right thing to do. Timeless decision theory proposes to justify this action by saying: act as if you control all instances of your decision procedure, including the instance that Omega used to predict your behavior.
But it's simply not true that you control Omega's actions in the past. If Omega predicted that you will one-box and filled the boxes accordingly, that's because, at the time the prediction was made, you were already a person who would foreseeably one-box. One way to be such a person is to be a TDT agent. But another way is to be a quasi-CDT agent with a superstitious belief that greediness is punished and modesty is rewarded - so you one-box because two-boxing looks like it has the higher payoff!
That is an irrational belief, yet it still suffices to generate the better outcome. My thesis is that TDT is similarly based on an irrational premise. So what is actually going on? I now think that Newcomb's problem is simply an exceptional situation where there is an artificial incentive to employ something other than CDT, and that most such situations can be dealt with by being a CDT agent who can self-modify.
Eliezer's draft manuscript on TDT provides another example (page 20): a godlike entity - we could call it Alphabeta - demands that you choose according to "alphabetical decision theory", or face an evil outcome. In this case, the alternative to CDT that you are being encouraged to use is explicitly identified. In Newcomb's problem, no such specific demand is made, but the situation encourages you to make a particular decision - how you rationalize it doesn't matter.
We should fight the illusion that a TDT agent retrocausally controls Omega's choice. It doesn't. Omega's choice was controlled by the extrapolated dispositions of the TDT agent, as they were in the past. We don't need to replace CDT with TDT as our default decision theory, we just need to understand the exceptional situations in which it is expedient to replace CDT with something else. TDT will apply to some of those situations, but not all of them.
Predictability of Decisions and the Diagonal Method
This post collects a few situations where agents might want to make their decisions either predictable or unpredictable to certain methods of prediction, and considers a method of making a decision unpredictable by "diagonalizing" a hypothetical prediction of that decision. The last section takes a stab at applying this tool to the ASP problem.
The diagonal step
To start off, consider the halting problem, interpreted in terms of agents and predictors. Suppose that there is a Universal Predictor, an algorithm that is able to decide whether any given program halts or runs forever. Then, it's easy for a program (agent) to evade its gaze by including a diagonal step in its decision procedure: the agent checks (by simulation) if Universal Predictor comes to some decision about the agent, and if it does, the agent acts contrary to the Predictor's decision. This makes the prediction wrong, and Universal Predictors impossible.
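A minimal sketch of the diagonal step, using a two-action setting rather than halting. The predictors here are deliberately simple stand-ins that do not simulate the agent, and all names are hypothetical.

```python
def make_diagonal_agent(predictor):
    """Build an agent that consults the predictor about itself and does the opposite."""
    def agent():
        predicted = predictor(agent)  # what does the predictor say about us?
        return "B" if predicted == "A" else "A"  # act contrary to the prediction
    return agent

def always_a(agent):
    return "A"

def always_b(agent):
    return "B"

def guess_from_name(agent):
    # A predictor that inspects the agent instead of running it.
    return "A" if agent.__name__ == "agent" else "B"

for predictor in (always_a, always_b, guess_from_name):
    agent = make_diagonal_agent(predictor)
    print(predictor.__name__, predictor(agent), agent())
```

Every predictor is wrong about its own diagonal agent. A predictor that tried to simulate the agent would recurse forever here, which corresponds to the case where no prediction is ever produced.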
The same trick could be performed against something that could exist, namely normal non-universal Predictors, which allows an agent to make itself immune to their predictions. In particular, the ability of other agents to infer decisions of our agent may be thought of as prediction that an agent might want to hinder. This is possible so long as the predictors in question can be simulated in enough detail (that is, it's known what they do and what they know) and our agent has enough computational resources to anticipate their hypothetical conclusions. (If an agent does perform the diagonal step with respect to other agents, the predictions of other agents don't necessarily become wrong, as they could be formally correct by construction, but they cease to be possible, which could mean that the predictions won't be made at all.)
Another problem with CDT, involving Bell's Theorem
Cavalcanti (2010) describes another problem with causal decision theory:
I apply some of the lessons from quantum theory, in particular from Bell’s theorem, to a debate on the foundations of decision theory and causation. By tracing a formal analogy between the basic assumptions of causal decision theory (CDT)—which was developed partly in response to Newcomb’s problem—and those of a local hidden variable theory in the context of quantum mechanics, I show that an agent who acts according to CDT and gives any nonzero credence to some possible causal interpretations underlying quantum phenomena should bet against quantum mechanics in some feasible game scenarios involving entangled systems, no matter what evidence they acquire. As a consequence, either the most accepted version of decision theory is wrong, or it provides a practical distinction, in terms of the prescribed behaviour of rational agents, between some metaphysical hypotheses regarding the causal structure underlying quantum mechanics.
Formulas of arithmetic that behave like decision agents
I wrote this post in the course of working through Vladimir Slepnev's A model of UDT with a halting oracle. This post contains some of the ideas of Slepnev's post, with all the proofs written out. The main formal difference is that while Slepnev's post is about programs with access to a halting oracle, the "decision agents" in this post are formulas in Peano arithmetic. They are generally uncomputable and do not reason under uncertainty.
These ideas are due to Vladimir Slepnev and Vladimir Nesov. (Please let me know if I should credit anyone else.) I'm pretty sure none of this is original material on my part. It is possible that I have misinterpreted Slepnev's post or introduced errors.
We are going to define a world function […], a […]-ary function[1] that outputs an ordered pair […] of payoff values. There are functions […] such that […] and […] for any […]. In fact […] is a function in the three variables […] and […].
We are also going to define an agent function […] that outputs the symbol […] or […]. The argument […] is supposed to be the Gödel number of the world function, and […] is some sort of indexical information.
We want to define our agent such that
[…]
([…] denotes the Gödel number of […]. […] means that […] is provable in Peano arithmetic. […] represents the numeral for […]. I don't care what value […] has when […] isn't the Gödel number of an appropriate […]-ary function.)
There is some circularity in this tentative definition, because a formula standing for […] appears in the definition of […] itself. We get around this by using diagonalization. We'll describe how this works just this once: First define the function […] as follows:
[…]
This function can be defined by a formula. Then the diagonal lemma gives us a formula […] such that […].
This is our (somewhat) rational decision agent. If it can prove it will do one thing, it does another; this is what Slepnev calls "playing chicken with the universe". If it can prove that […] is an optimal strategy, it chooses […]; and otherwise it chooses […].
First, a lemma about the causes and consequences of playing chicken:
Lemma 1. For any […],
[…]
([…] is a binary-valued function such that […] is true exactly when there is a proof of […] in Peano arithmetic. For brevity we write […] instead. […] is the proposition that Peano arithmetic, plus the axiom that Peano arithmetic is consistent, is a consistent theory.)
Proof. (1) By definition of […],
[…]
So
[…]
(2) By the principle of explosion,
[…]
(3) By the definition of […],
[…]
(4)
[…]
If we assume consistency of […] (which entails consistency of […]), then parts (1) and (3) of Lemma 1 tell us that for any […], […] and […]. So the agent never actually plays chicken.
Now let's see how our agent fares on a straightforward decision problem:
Proposition 2. Let […] and suppose
[…]
Assume consistency of […]. Then […] if and only if […].
Proof. If we assume consistency of […], then Lemma 1 tells us that the agent doesn't play chicken. So the agent will choose […] if and only if it determines that choosing […] is optimal.
We have
[…]
Suppose […]. Then clearly
[…]
So […].
As for the converse: We have […]. If also
[…]
and […], then […]. By Lemma 1(3) and consistency of […], this cannot happen. So
[…]
Similarly, we have
[…]
for all […]. So
[…]
So the agent doesn't decide that […] is optimal, and […].
Now let's see how […] fares on a symmetric Prisoner's Dilemma with itself:
Proposition 3. Let
[…]
Then, assuming consistency of […], we have […].
Proof.
(This proof uses Löb's theorem, and that makes it confusing. Vladimir Slepnev points out that Löb's theorem is not really necessary here; a simpler proof appears in the comments.)
Looking at the definition of , we see that
By Lemma 1, (1) and (3),
Similarly,
So
Applying Lemma 1(2) and (4),
By Löb's theorem,
By , we have
So, assuming , we conclude that
.
The definition of treats the choices
and
differently; so it is worth checking that
behaves correctly in the Prisoner's Dilemma when the effects of
and
are switched:
Proposition 4. Let
Then, assuming consistency of , we have
.
A proof appears in the comments.
There are a number of questions one can explore with this formalism: What is the correct generalization of that can choose between
actions, and not just two? How about infinitely many actions? What about theories other than Peano arithmetic? How do we accommodate payoffs that are real numbers? How do we make agents that can reason under uncertainty? How do we make agents that are computable algorithms rather than arithmetic formulas? How does
fare on a Prisoner's Dilemma with asymmetric payoff matrix? In a two-person game where the payoff to player
is independent of the behavior of
, can
deduce the behavior of
? What happens when we replace the third line of the definition of
with
? What is a (good) definition of "decision problem"? Is there a theorem that says that our agent is, in a certain sense, optimal?
1Every -ary function
in this article is defined by a formula
with
free variables such that
and
. By a standard abuse of notation, when the name of a function like
appears in a formula of arithmetic, what we really mean is the formula
that defines it.
[link] Innocentive challenge: $8000 for examples promoting altruistic behavior
A challenge recently posted on Innocentive seemed to me like something that may interest many LWers: "Models Motivating and Supporting Altruism Within Communities", with a grand prize of $8000. To quote from the challenge:
We are interested in looking at novel concepts from nature, business, or other areas that may elucidate the dynamics that help promote and maintain altruistic behaviors.
Further details are available on innocentive.com. I think that it would be a nice opportunity for our LW decision theory experts.
[For anybody who decides to participate: the links I provided contain a referral string so that, in case you win a prize, I can match your donation to the SIAI with the same fraction of my referral award ;) Please use them to register.]
In the Pareto world, liars prosper
This is a simple picture proof to show that if there is any decision process that will find a Pareto outcome for two people, it must be that liars will prosper: there are some circumstances where you would come out ahead if you were to lie about your utility function.
Apart from Pareto, the only other assumption it needs is that if the data is perfectly symmetric, then the outcome will be symmetric as well. We won't even need to use affine independence or other scalings of utility functions.
Now, given Pareto-optimality, symmetry allows us to solve symmetric problems by taking the unique symmetric Pareto option. Two such symmetric problems are presented here, and in one of them, one of the two players must be able to prosper by lying.
So first assume Pareto-optimality, symmetry, and (by contradiction) that liars don't prosper. The players are x and y, and we will plot their utilities in the (x,y) plane. The first setup is presented in this figure:

Anthropic Decision Theory VI: Applying ADT to common anthropic problems
A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this and previous posts 1 2 3 4 5 6.
Having presented ADT previously, I'll round off this mini-sequence by showing how it behaves with common anthropic problems, such as the Presumptuous Philosopher, Adam and Eve problem, and the Doomsday argument.
The Presumptuous Philosopher
The Presumptuous Philosopher was introduced by Nick Bostrom as a way of pointing out the absurdities in SIA. In the setup, the universe either has a trillion observers, or a trillion trillion trillion observers, and physics is indifferent as to which one is correct. Some physicists are preparing to do an experiment to determine the correct universe, until a presumptuous philosopher runs up to them, claiming that his SIA probability makes the larger one nearly certainly the correct one. In fact, he will accept bets at a trillion trillion to one odds that he is in the larger universe, repeatedly defying even strong experimental evidence with his SIA probability correction.
What does ADT have to say about this problem? Implicitly, when the problem is discussed, the philosopher is understood to be selfish towards any putative other copies of himself (similarly, Sleeping Beauty is often implicitly assumed to be selfless, which may explain the divergence of intuitions that people have on the two problems). Are there necessarily other similar copies? Well, in order to use SIA, the philosopher must believe that there is nothing blocking the creation of presumptuous philosophers in the larger universe; for if there were, the odds would shift away from the larger universe (in the extreme case when only one presumptuous philosopher is allowed in any universe, SIA finds them equi-probable). So the expected number of presumptuous philosophers in the larger universe is a trillion trillion times greater than the expected number in the small universe.
Anthropic Decision Theory V: Linking and ADT
A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.
Now that we've seen what the 'correct' decision is for various Sleeping Beauty Problems, let's see a decision theory that reaches the same conclusions.
Linked decisions
Identical copies of Sleeping Beauty will make the same decision when faced with the same situations (technically true until quantum and chaotic effects cause a divergence between them, but most decision processes will not be sensitive to random noise like this). Similarly, Sleeping Beauty and the random man on the street will make the same decision when confronted with a twenty pound note: they will pick it up. However, while we could say that the first situation is linked, the second is coincidental: were Sleeping Beauty to refrain from picking up the note, the man on the street would not so refrain, while her copy would.
The above statement brings up subtle issues of causality and counterfactuals, a deep philosophical debate. To sidestep it entirely, let us recast the problem in programming terms, seeing the agent's decision process as a deterministic algorithm. If agent α is an agent that follows an automated decision algorithm A, then if A knows its own source code (by quining for instance), it might have a line saying something like:
Module M: If B is another algorithm, belonging to agent β, identical with A ('yourself'), assume A and B will have identical outputs on identical inputs, and base your decision on this.
Anthropic Decision Theory IV: Solving Selfish and Average-Utilitarian Sleeping Beauty
A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.
In the previous post, I looked at a decision problem when Sleeping Beauty was selfless or a (copy-)total utilitarian. Her behaviour was reminiscent of someone following SIA-type odds. Here I'll look at situations where her behaviour is SSA-like.
Altruistic average utilitarian Sleeping Beauty
In the incubator variant, consider the reasoning of an Outside/Total agent who is an average utilitarian (and there are no other agents in the universe apart from the Sleeping Beauties).
"If the various Sleeping Beauties decide to pay £x for the coupon, they will make -£x in the heads world. In the tails world, they will each make £(1-x), so an average of £(1-x). This gives me an expected utility of £0.5(-x+(1-x)) = £(0.5-x), so I would want them to buy the coupon for any price less than £0.5."
And this will then be the behaviour the agents will follow, by consistency. Thus they would be behaving as if they were following SSA odds, and putting equal probability on the heads versus tails world.
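The arithmetic in that reasoning can be checked with a few lines of code. This is a minimal sketch; the function name `average_utilitarian_value` is made up for illustration, and the fair coin and the £1 coupon payout are taken from the setup described in these posts.

```python
from fractions import Fraction


def average_utilitarian_value(price: Fraction) -> Fraction:
    """Expected average utility (in pounds) if every Beauty pays `price`."""
    heads_world = -price      # one Beauty pays, and the coupon pays nothing
    tails_world = 1 - price   # each of two Beauties nets 1 - price, so the average is 1 - price
    return Fraction(1, 2) * heads_world + Fraction(1, 2) * tails_world


# The break-even price is 0.5: buying is worthwhile only below it.
for price in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(price, average_utilitarian_value(price))
```

The value comes out as 0.5 minus the price, so it is positive exactly when the price is below £0.5.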
Anthropic Decision Theory III: Solving Selfless and Total Utilitarian Sleeping Beauty
A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.
Consistency
In order to transform the Sleeping Beauty problem into a decision problem, assume that every time she is awoken, she is offered a coupon that pays out £1 if the coin fell tails. She must then decide at what cost she is willing to buy that coupon.
The very first axiom is that of temporal consistency. If your preferences are going to predictably change, then someone will be able to exploit this, by selling you something now that they will buy back for more later, or vice versa. This axiom is implicit in the independence axiom in the von Neumann-Morgenstern axioms of expected utility, where non-independent decisions show inconsistency after partially resolving one of the lotteries. For our purposes, we will define it as:
Anthropic Decision Theory II: Self-Indication, Self-Sampling and decisions
A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this, subsequent, and previous posts 1 2 3 4 5 6.
In the last post, we saw the Sleeping Beauty problem, and the question was what probability a recently awoken or created Sleeping Beauty should give to the coin falling heads or tails and it being Monday or Tuesday when she is awakened (or whether she is in Room 1 or 2). There are two main schools of thought on this, the Self-Sampling Assumption and the Self-Indication Assumption, both of which give different probabilities for these events.
The Self-Sampling Assumption
The self-sampling assumption (SSA) relies on the insight that Sleeping Beauty, before being put to sleep on Sunday, expects that she will be awakened in future. Thus her awakening grants her no extra information, and she should continue to give the same credence to the coin flip being heads as she did before, namely 1/2.
In the case where the coin is tails, there will be two copies of Sleeping Beauty, one on Monday and one on Tuesday, and she will not be able to tell, upon awakening, which copy she is. She should assume that both are equally likely. This leads to SSA:
Anthropic decision theory I: Sleeping beauty and selflessness
A near-final version of my Anthropic Decision Theory paper is available on the arXiv. Since anthropics problems have been discussed quite a bit on this list, I'll be presenting its arguments and results in this and subsequent posts 1 2 3 4 5 6.
Many thanks to Nick Bostrom, Wei Dai, Anders Sandberg, Katja Grace, Carl Shulman, Toby Ord, Anna Salamon, Owen Cotton-barratt, and Eliezer Yudkowsky.
The Sleeping Beauty problem, and the incubator variant
The Sleeping Beauty problem is a major one in anthropics, and my paper establishes anthropic decision theory (ADT) by a careful analysis of it. Therefore we should start with an explanation of what it is.
In the standard setup, Sleeping Beauty is put to sleep on Sunday, and awoken again Monday morning, without being told what day it is. She is put to sleep again at the end of the day. A fair coin was tossed before the experiment began. If that coin showed heads, she is never reawakened. If the coin showed tails, she is fed a one-day amnesia potion (so that she does not remember being awake on Monday) and is reawakened on Tuesday, again without being told what day it is. At the end of Tuesday, she is put to sleep for ever. This is illustrated in the next figure:
Rival formalizations of a decision problem
Decision theory is not one of my strengths, and I have a question about it.
Is there a consensus view on how to deal with the problem of "rival formalizations"? Peterson (2009) illustrates the problem like this:
Imagine that you are a paparazzi photographer and that rumour has it that actress Julia Roberts will show up in either New York (NY), Los Angeles (LA) or Paris (P). Nothing is known about the probability of these states of the world. You have to decide if you should stay in America or catch a plane to Paris. If you stay and [she] shows up in Paris you get $0; otherwise you get your photos, which you will be able to sell for $10,000. If you catch a plane to Paris and Julia Roberts shows up in Paris your net gain after having paid for the ticket is $5,000, and if she shows up in America you for some reason, never mind why, get $6,000. Your initial representation of the decision problem is visualized in Table 2.13.
Table 2.13
| | P | LA | NY |
| Stay | $0 | $10k | $10k |
| Go to Paris | $5k | $6k | $6k |
Since nothing is known about the probabilities of the states in Table 2.13, you decide it makes sense to regard them as equally probable [see Table 2.14].
Table 2.14
| | P (1/3) | LA (1/3) | NY (1/3) |
| Stay | $0 | $10k | $10k |
| Go to Paris | $5k | $6k | $6k |
The rightmost columns are exactly parallel. Therefore, they can be merged into a single (disjunctive) column, by adding the probabilities of the two rightmost columns together (Table 2.15).
Table 2.15
| | P (1/3) | LA or NY (2/3) |
| Stay | $0 | $10k |
| Go to Paris | $5k | $6k |
However, now suppose that you instead start with Table 2.13 and first merge the two repetitious states into a single state. You would then obtain the decision matrix in Table 2.16.
Table 2.16
| | P | LA or NY |
| Stay | $0 | $10k |
| Go to Paris | $5k | $6k |
Now, since you know nothing about the probabilities of the two states, you decide to regard them as equally probable... This yields the formal representation in Table 2.17, which is clearly different from the one suggested above in Table 2.15.
Table 2.17
| | P (1/2) | LA or NY (1/2) |
| Stay | $0 | $10k |
| Go to Paris | $5k | $6k |
Which formalisation is best, 2.15 or 2.17? It seems question begging to claim that one of them must be better than the other — so perhaps they are equally reasonable? If they are, we have an example of rival formalisations.
Note that the principle of maximising expected value recommends different acts in the two matrices. According to Table 2.15 you should stay, but 2.17 suggests you should go to Paris.
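To make the disagreement concrete, here is a small sketch that computes the expected value of each act under the two probability assignments. The payoffs and probabilities come from Tables 2.15 and 2.17; the names `PAYOFFS`, `expected_values` and `best_act` are just illustrative.

```python
from fractions import Fraction

# Payoffs in dollars for the states (P, "LA or NY"), as in Tables 2.15 and 2.17.
PAYOFFS = {
    "Stay": (0, 10_000),
    "Go to Paris": (5_000, 6_000),
}


def expected_values(probabilities):
    """Map each act to its expected payoff under the given state probabilities."""
    return {
        act: sum(p * payoff for p, payoff in zip(probabilities, payoffs))
        for act, payoffs in PAYOFFS.items()
    }


def best_act(probabilities):
    """Return the act with the highest expected payoff."""
    values = expected_values(probabilities)
    return max(values, key=values.get)


table_2_15 = (Fraction(1, 3), Fraction(2, 3))
table_2_17 = (Fraction(1, 2), Fraction(1, 2))

print(best_act(table_2_15))  # Stay
print(best_act(table_2_17))  # Go to Paris
```

Under Table 2.15 staying is worth about $6,667 against about $5,667 for going; under Table 2.17 staying is worth $5,000 against $5,500 for going, so the recommended act flips.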
Does anyone know how to solve this problem? If one is not convinced by the illustration above, Peterson (2009) offers a proof that rival representations are possible on pages 33–35.
'Decision-theoretic paradoxes as voting paradoxes'
Briggs (2010) may be of interest to LWers. Opening:
It is a platitude among decision theorists that agents should choose their actions so as to maximize expected value. But exactly how to define expected value is contentious. Evidential decision theory (henceforth EDT), causal decision theory (henceforth CDT), and a theory proposed by Ralph Wedgwood that I will call benchmark theory (BT) all advise agents to maximize different types of expected value. Consequently, their verdicts sometimes conflict. In certain famous cases of conflict — medical Newcomb problems — CDT and BT seem to get things right, while EDT seems to get things wrong. In other cases of conflict, including some recent examples suggested by Egan 2007, EDT and BT seem to get things right, while CDT seems to get things wrong. In still other cases, EDT and CDT seem to get things right, while BT gets things wrong.
It’s no accident, I claim, that all three decision theories are subject to counterexamples. Decision rules can be reinterpreted as voting rules, where the voters are the agent’s possible future selves. The problematic examples have the structure of voting paradoxes. Just as voting paradoxes show that no voting rule can do everything we want, decision-theoretic paradoxes show that no decision rule can do everything we want. Luckily, the so-called “tickle defense” establishes that EDT, CDT, and BT will do everything we want in a wide range of situations. Most decision situations, I argue, are analogues of voting situations in which the voters unanimously adopt the same set of preferences. In such situations, all plausible voting rules and all plausible decision rules agree.
[Link] 20 2020 Pennies (a webcomic chapter about many worlds and decision theory... sort of)
The comic in question (Penny & Aggie; by T Campbell) is as a whole a simple teenage comedy/drama. But the particular storyline I'd like to discuss here takes a much more SF turn than usual, and it's (marginally; if we stretch the concepts a bit) related to issues relevant to LessWrong; decision theory, CEV, perhaps even simulations and/or many-worlds.
The needed context is that in the page immediately previous, one of the comic's two protagonists (Penny) is asked by her biker boyfriend Rich to follow him on the road, effectively dropping out of highschool.
The chapter itself is about 20 different future Pennies from the year 2020 (20 that represent trillions), who convene to decide which choice to take.

Thoughts and SPOILERS for the story to follow after the space, so you may want to read it before proceeding.
Perhaps the best way one can handle this whole bizarreness would be as a visualization of the FAI-failure mode in which the AI's models of people are also people. So that the AI can only anticipate what people would want to do or would regret doing, if he has their simulations actively decide to do it, and then regret it. But for the purposes of the convention, the AI disabled all self-preservation circuitry, so that these models can vote with full honesty the decision they believe best.

To put it in LessWrong terms: "Up yours, Extrapolated Volition".
Most intriguingly yet, at least one of those extrapolated versions (Biker Penny who voted against joining Rich and bitterly regretted joining a "clique for losers") actually seems to admire and love how Teenage Penny is telling her to go to hell: What if your extrapolated volition is a volition that doesn't wish you to consider the rulings of your extrapolated volition?
Also (an even more complicated scenario) what if your current volition wishes you to follow your extrapolated volition, but your extrapolated volition would want you to follow a different decision path (don't consider the future)? What ways are there outside of this paradox? What decision do you take, if you are changed by that decision into a person that will regret it either way for different reasons?
As I said, the rest of the comic is however mostly teenage comedy/drama, though it does include some amusing SF references/tropes from time to time.
How to (un)become a crank
Ahhh, a human interest post. Well, sort of. At least it has something besides math-talk.
In the extreme programming community they have a saying, "three strikes and you refactor". The rationalist counterpart would be this: once you've noticed the same trap twice, you'd be stupid to fall prey to it the third time.
Strike one is Eliezer's post The Crackpot Offer. Child-Eliezer thought he'd overthrown Cantor's theorem, then found an error in his reasoning, but felt a little tempted to keep on trying to overthrow the damned theorem anyway. The right and Bayesian thing to do, which he ended up doing, was to notice that once you've found your mistake there's no longer any reason to wage war on an established result.
Strike two is Emile's comment on one of my recent posts:
I find it annoying how my brain keeps saying "hah, I bet I could" even though I explained to it that it's mathematically provable that such an input always exists. It still keeps coming up with "how about this clever encoding?, blablabla" ... I guess that's how you get cranks.
Strike three is... I'm a bit ashamed to say that...
...strike three is about me. And maybe not only me.
There's a certain vibe in the air surrounding many discussions of decision theory. It sings: maybe the central insight of game theory (that multiplayer situations are not reducible to single-player ones) is wrong. Maybe the slightly-asymmetrized Prisoner's Dilemma has a single right answer. Maybe you can get a unique solution to dividing a cake by majority vote if each individual player's reasoning is "correct enough". But honestly, where exactly is the Bayesian evidence that merits anticipating success on that path? Am I waging war on clear and simple established results because of wishful thinking? Are my efforts the moral equivalent of counting the reals or proving the consistency of PA within PA?
An easy answer is that "we don't know" if our inquiries will be fruitful, so you can't prove I must stop. But that's not the Bayesian answer. The Bayesian answer is to honestly tally up the indications that future success is likely, and stop if they are lacking.
So I want to ask an object-level question and a meta-level question:
1) What evidence supports the intuition that, contra game theory, single-player decision theory has a "solution"?
2) If there's not much evidence supporting that intuition, how should I change my actions?
(I already have tentative answers to both questions, but am curious what others think. Note that you can answer the second question without knowing any math :-))
Example decision theory problem: "Agent simulates predictor"
Some people on LW have expressed interest in what's happening on the decision-theory-workshop mailing list. Here's an example of the kind of work we're trying to do there.
In April 2010 Gary Drescher proposed the "Agent simulates predictor" problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources. I'm posting it here with his permission:
There's a version of Newcomb's Problem that poses the same sort of challenge to UDT that comes up in some multi-agent/game-theoretic scenarios.
Suppose:
- The predictor does not run a detailed simulation of the agent, but relies instead on a high-level understanding of the agent's decision theory and computational power.
- The agent runs UDT, and has the ability to fully simulate the predictor.
Since the agent can deduce (by low-level simulation) what the predictor will do, the agent does not regard the prediction outcome as contingent on the agent's computation. Instead, either predict-onebox or predict-twobox has a probability of 1 (since one or the other of those is deducible), and a probability of 1 remains the same regardless of what we condition on. The agent will then calculate greater utility for two-boxing than for one-boxing.
Meanwhile, the predictor, knowing that the agent runs UDT and will fully simulate the predictor, can reason as in the preceding paragraph, and thus deduce that the agent will two-box. So the large box is left empty and the agent two-boxes (and the agent's detailed simulation of the predictor correctly shows the predictor correctly predicting two-boxing).
The agent would be better off, though, running a different decision theory that does not two-box here, and that the predictor can deduce does not two-box.
About a month ago I came up with a way to formalize the problem, along the lines of my other formalizations:
a) The agent generates all proofs of length up to M, then picks the action for which the greatest utility was proven.
b) The predictor generates all proofs of length up to N which is much less than M. If it finds a provable prediction about the agent's action, it fills the boxes accordingly. Also the predictor has an "epistemic advantage" over the agent: its proof system has an axiom saying the agent's proof system is consistent.
Now the predictor can reason as follows. It knows that the agent will find some proof that the predictor will put X dollars in the second box, for some unknown value of X, because the agent has enough time to simulate the predictor. Therefore, it knows that the agent will find proofs that one-boxing leads to X dollars and two-boxing leads to X+1000 dollars. Now what if the agent still chooses one-boxing in the end? That means it must have found a different proof saying one-boxing gives more than X+1000 dollars. But if the agent actually one-boxes, the existence of these two different proofs would imply that the agent's proof system is inconsistent, which the predictor knows to be impossible. So the predictor ends up predicting that the agent will two-box, the agent two-boxes, and everybody loses.
Also Wei Dai has a tentative new decision theory that solves the problem, but this margin (and my brain) is too small to contain it :-)
Can LW generate the kind of insights needed to make progress on problems like ASP? Or should we keep working as a small clique?
The Difference Between Classical, Evidential, and Timeless Decision Theories
I couldn't find any concise explanation of what the decision theories are. Here's mine:
A Causal Decision Theorist wins, given what's happened so far.
An Evidential Decision Theorist wins, given what they know.
A Timeless Decision Theorist wins a priori.
To explain what I mean, here are two interesting problems. In each of them, two of the decision theories give one choice, and the third gives the other.
In Newcomb's problem, if you separate people into groups based on what happened before the experiment, i.e. whether or not Box A has money, CDT will be at least as successful in each group as any other strategy, and notably more successful than EDT and TDT. If you separate them by what's known, there's only one group, since everybody has the same information. EDT is at least as successful as any other strategy, and notably more successful than CDT. If you don't separate them at all, TDT will be at least as successful as any other strategy, and notably more successful than CDT.
In Parfit's hitchhiker, when it comes time to pay the driver, if you split into groups based on what happened before the experiment, i.e. whether or not one has been picked up, CDT will be at least as successful in each group as any other strategy, and notably more successful than TDT. If you split based on what's given, which is again whether or not one has been picked up, EDT will be at least as successful in each group as any other strategy, and notably more successful than TDT. If you don't separate at all, TDT will be at least as successful as any other strategy, and notably more successful than CDT and EDT.
There's one thing I'm not sure about. How does Updateless Decision Theory compare?
Revisiting the anthropic trilemma III: solutions and interpretations
In previous posts, I revisited Eliezer's anthropic trilemma, approaching it with ata's perspective that the decisions made are the objects of fundamental interest, not the probabilities or processes that gave rise to them. I initially applied my naive intuitions to the problem, and got nonsense. I then constructed a small collection of reasonable-seeming assumptions, and showed they defined a single method of spreading utility functions across copies.
This post will apply that method to the anthropic trilemma, and thus give us the "right" decisions to make. I'll then try and interpret these decisions, and see what they tell us about subjective anticipation, probabilities and the impact of decisions. As in the original post, I will be using the chocolate bar as the unit of indexical utility, as it is a well known fact that everyone's utility is linear in chocolate.
The details of the lottery winning setup can be found either here or here. The decisions I must make are:
Would I give up a chocolate bar now for two to be given to one of the copies if I win the lottery? No, this loses me one utility and gains me only 2/million.
Would I give up a chocolate bar now for two to be given to every copy if I win the lottery? Yes, this loses me one utility and gains me 2*trillion/million = 2 million.
Would I give up one chocolate bar now, for two chocolate bars to the future merged me if I win the lottery? No, this gives me an expected utility of -1+2/million.
Now let it be after the lottery draw, after the possible duplication, but before I know whether I've won the lottery or not. Would I give up one chocolate bar now in exchange for two for me, if I had won the lottery (assume this deal is offered to everyone)? The SIA odds say that I should; I have an expected gain of 1999/1001 ≈ 2.
Now assume that I have been told I've won the lottery, so I'm one of the trillion duplicates. Would I give up a chocolate bar for the future merged copy having two? Yes, I would, the utility gain is 2-1=1.
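The pre-draw decisions, and the final post-win one, reduce to simple expected-utility sums, sketched here. This assumes the figures quoted in the list (a one-in-a-million chance of winning, a trillion copies on a win) and measures utility in chocolate bars; the variable names are illustrative, and the SIA-odds decision made between the draw and the announcement is left out.

```python
from fractions import Fraction

MILLION = 10**6
TRILLION = 10**12
p_win = Fraction(1, MILLION)  # chance of winning the lottery, as quoted above

# Give up one bar now; two bars go to a single copy if I win.
one_copy = -1 + 2 * p_win

# Give up one bar now; two bars go to every one of the trillion copies if I win.
every_copy = -1 + 2 * TRILLION * p_win

# Give up one bar now; two bars go to the future merged me if I win.
merged_before_draw = -1 + 2 * p_win

# Already known to have won: give up one bar so the merged copy gets two.
merged_after_win = 2 - 1

for name, value in [
    ("one copy", one_copy),
    ("every copy", every_copy),
    ("merged, before draw", merged_before_draw),
    ("merged, after win", merged_after_win),
]:
    print(f"{name}: {'accept' if value > 0 else 'decline'}")
```

The signs match the answers in the list: decline, accept, decline, and accept.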
So those are the decisions; how to interpret them? There are several ways of doing this. There are four things to keep in mind: probability, decision impact, utility function, and subjective anticipation.
Sleeping anti-beauty and the presumptuous philosopher
My approach for dividing utility between copies gives the usual and expected solutions to the sleeping beauty problem: if all copies are offered bets, take 1/3 odds, if only one copy is offered bets, take 1/2 odds.
This makes sense, because my approach is analogous to "some future version of Sleeping Beauty gets to keep all the profits".
The presumptuous philosopher problem is subtly different from the sleeping beauty problem. It can best be phrased as a sleeping beauty problem where each copy doesn't care for any other copy. Solving this is a bit more subtle, but a useful half-way point is the "Sleeping Anti-Beauty" problem.
Here, as before, one or two copies are created depending on the result of a coin flip. However, if two copies are created, they are the reverse of mutually altruistic: they derive disutility from the other copy achieving its utility. So if both copies receive $1, neither of their utilities increases: they are happy to have the cash, but angry the other copy also has cash.
Apart from this difference in indexical utility, the two copies are identical, and will reach the same decision. Now, as before, every copy is approached with bets on whether they are in the large universe (with two copies) or the small one (with a single copy). Using standard UDT/TDT Newcomb-problem type reasoning, they will always take the small universe side in any bet (as any gain/loss in the large universe is compensated for by the same gain/loss for the other copy they dislike).
Now, you could model the presumptuous philosopher by saying they have 50% chance of being in a Sleeping-Beauty (SB) situation and 50% of being in a Sleeping Anti-Beauty (SAB) situation (indifference modelled as half way between altruism and hate).
There are 4 equally likely possibilities here: small universe in SB, large universe in SB, small universe in SAB, large universe in SAB. A contract that gives $1 in a small universe is worth 0.25 + 0 + 0.25 + 0 = $0.5, while a contract that gives $1 in a large universe is worth 0 + 0.25*2 + 0 + 0 = $0.5 (as long as it's offered to everyone). So it seems that a presumptuous philosopher should take even odds on the size of the universe if he doesn't care about the other presumptuous philosophers.
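The two sums can be reproduced with a short sketch. The per-scenario values below are read off the arithmetic in the paragraph above (payouts add up between two altruistic copies and cancel between two hostile ones); the names `SCENARIO_VALUE` and `contract_value` are illustrative only.

```python
from fractions import Fraction

# Value, to the decision-maker, of a $1-per-copy payout in each scenario.
SCENARIO_VALUE = {
    ("SB", "small"): 1,   # a single copy receives $1
    ("SB", "large"): 2,   # two altruistic copies: both payouts count
    ("SAB", "small"): 1,  # a single copy receives $1
    ("SAB", "large"): 0,  # two hostile copies: the payouts cancel out
}


def contract_value(universe: str) -> Fraction:
    """Expected value of a contract paying $1 per copy in `universe`."""
    p = Fraction(1, len(SCENARIO_VALUE))  # the four scenarios are equally likely
    return sum(
        p * value
        for (_, scenario_universe), value in SCENARIO_VALUE.items()
        if scenario_universe == universe
    )


print(contract_value("small"), contract_value("large"))
```

Both contracts come out at $0.5, which is the even-odds result.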
It's no coincidence this result can be reached by UDT-like arguments such as "take the objective probabilities of the universes, and consider the total impact of your decision being X, including all other decision that must be the same as yours". I'm hoping to find more fundamental reasons to justify this approach soon.
Subjective anticipation as a decision process
As argued here, debates about probability can be profitably replaced with decision problems. This often dissolves the debate - there is far more agreement as to what decision sleeping beauty should take than on what probabilities she should use.
The concepts of subjective anticipation and subjective probability, which cause such difficulty here, can, I argue, be similarly replaced by a simple decision problem.
If you are going to be copied, uncopied, merged, killed, propagated through quantum branches, have your brain tasered with amnesia pills while your parents are busy flipping coins before deciding to reproduce, and are hence unsure as to whether you should subjectively anticipate being you at a certain point, the relevant question should not be whether you feel vaguely connected to the putative future you in some ethereal sense.
Instead the question should be akin to: how many chocolate bars would your putative future self have to be offered, for you to forgo one now? What is the tradeoff between your utilities?
Now, altruism is of course a problem for this approach: you might just be very generous with copy #17 down the hallway, he's a thoroughly decent chap and all that, rather than anticipating being him. But humans can generally distinguish between selfish and altruistic decisions, and the setup can be tweaked to encourage the maximum urges towards winning, rather than letting others win. For me, a competitive game with chocolate as the reward would do the trick...
Unlike for the sleeping beauty problem, this rephrasing does not instantly solve the problems, but it does locate them: subjective anticipation is encoded in the utility function. Indeed, I'd argue that subjective anticipation is the same problem as indexical utility, with a temporal twist thrown in.
Three easy anthropic models, and two hard ones
For illustrative purposes, imagine simple agents - AIs, or standard utility maximisers - who have to make decisions under anthropic uncertainty.
Specifically, let there be two worlds, W1 and W2, equally likely to exist. W1 contains one copy of the agent, W2 contains two copies. The agent has one single action available: the opportunity to create, once, either a box or a cross. The utility of doing so varies depending on which world the agent is in, as follows:
In W1: Utility(cross) = 2, Utility(box) = 5
In W2: Utility(cross) = 2, Utility(box) = 0
The agent has no extra way of telling which world they are in.
- First model (aggregationist, non-indexical):
Each box or cross created will generate the utility defined above, and the utility is simply additive. Then if the agent decides to generate crosses, the expected utility is 0.5(2+(2+2))=3, while that of generating boxes is 0.5(5+(0+0))=2.5. Generating crosses is the way to go.
- Second model (non-aggregationist, non-indexical):
The existence of a single box or cross will generate the utility defined above, but extra copies won't change anything. Then if the agent decides to generate crosses, the expected utility is 0.5(2+2)=2, while that of generating boxes is 0.5(5+0)=2.5. Generating boxes is the way to go.
- Third model (unlikely existence, non-aggregationist, non-indexical):
Here a simple change is made: the worlds do not contain agents, but proto-agents, each of which has an (independent) one chance in a million of becoming an agent. Hence the probability of the agent existing in the first universe is 1/M, while the probability of an agent existing in the second universe is approximately 2/M. The expected utility of crosses is approximately 1/M*0.5(2+2*2)=3/M while that of boxes is approximately 1/M*0.5(5+2*0)=2.5/M. Generating crosses is the way to go.
- Fourth model (indexical):
This is the first "hard" model from the title. Here the agent only derives utility from the box or cross it generated itself. And here, things get interesting.
There is no immediately obvious way of solving this situation, so I tried replacing it with a model that seems equivalent. Instead of having indexical preferences for its own shapes, I'll give the agent non-indexical aggregationist preferences (just as in the first model), and halve the utility of any shape in W2. This should give the same utility to all agents in all possible worlds as the indexical model. Under the new model, the utility of crosses is 0.5(2+(1+1)) = 2, while that of boxes is 0.5(5+(0+0))=2.5. Boxes are the way to go.
- Fifth model (indexical, anticipated experience)
The fifth model is one where after the agents in W2 have made their decision, but before they implement it, one of them is randomly deleted, and the survivor creates two shapes. If the agents are non-indexical, then the problem is simply a version of the first model.
But now the agents are indexical. There are two ways of capturing this fact: either the agent can care about the fact that "I, myself will have created a shape", or that "the thread of my future experience will contain an agent that will have created a shape". In the first case, the agent should consider that in W2, it only has a 50% chance of succeeding in its goal, but the weight of its goal is doubled: this is the fourth model again, hence: boxes.
In the second case, each agent should consider the surviving agent as the thread of its future experience. This is equivalent to the non-indexical first model, where only the number of shapes matters (since all future shapes belong to an agent that is in the current agent(s)' future thread of experience). Hence: crosses.
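The expected utilities quoted for the first four models can be reproduced with a small sketch. The function names and the dictionary layout are hypothetical. The third model uses the same first-order approximation as the text (probability of an agent existing taken as copies × 1/M), and the fourth uses the substitute model described above (aggregationist, with W2 utilities halved), not a direct treatment of indexical preferences.

```python
from fractions import Fraction

# Utilities and copy counts as given in the text.
UTILITY = {"W1": {"cross": 2, "box": 5}, "W2": {"cross": 2, "box": 0}}
COPIES = {"W1": 1, "W2": 2}
P_WORLD = Fraction(1, 2)  # W1 and W2 are equally likely

def aggregationist(shape, scale_w2=Fraction(1)):
    """First model: every shape created counts, so W2 contributes once per copy."""
    total = Fraction(0)
    for world, utils in UTILITY.items():
        scale = scale_w2 if world == "W2" else Fraction(1)
        total += P_WORLD * COPIES[world] * utils[shape] * scale
    return total

def non_aggregationist(shape):
    """Second model: only the existence of a shape matters, not how many."""
    return sum(P_WORLD * utils[shape] for utils in UTILITY.values())

def unlikely_existence(shape, p_agent=Fraction(1, 10**6)):
    """Third model: here the copy count enters through the (approximate)
    probability that any agent exists, not through adding up shapes."""
    return sum(P_WORLD * COPIES[w] * p_agent * UTILITY[w][shape] for w in UTILITY)

def indexical_substitute(shape):
    """Fourth model, via the substitute: aggregationist with W2 utilities halved."""
    return aggregationist(shape, scale_w2=Fraction(1, 2))

def best_shape(model):
    return max(("cross", "box"), key=model)

for model in (aggregationist, non_aggregationist, unlikely_existence, indexical_substitute):
    print(model.__name__, model("cross"), model("box"), best_shape(model))
```

Under these assumptions the fifth model needs no separate calculation: its first reading reduces to `indexical_substitute` and its second to `aggregationist`.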
I won't be analysing solutions to these problems yet, but simply say that many solutions will work, such as SIA with a dictator's filter. However, though the calculations are correct, the intuition behind this seems suspect in the fourth model, and one could achieve similar results without SIA at all (giving its decision the power to affect multiple agent outcomes at once, for instance).
It should be noted that the fourth model seems to imply the Presumptuous Philosopher would be wrong to accept his bets. However, the third model seems to imply the truth of FNC (full non-indexical conditioning), which is very close to SIA - but time inconsistent. And there, the Presumptuous Philosopher would be right to accept his bets.
Confusion still persists in my mind, but I think it's moving towards a resolution.
Social Presuppositions
During discussion in my previous post, when we touched on the subject of human statistical majorities, I had a side-thought. Taking the Less Wrong audience as an example, the statistics say that any given participant is strongly likely to be white, male, atheist, and well, just going by general human statistics, probably heterosexual.
But in my actual interaction, I've taken as a rule not to make any assumptions about the other person. Does it mean, I thought, that I reset my prior probabilities, and consciously choose to discard information? Not relying on implicit assumptions seems the socially right thing to do, I thought; but is it rational?
When I discussed it on IRC, this quote by sh struck me as insightful:
I.e. making the guess incorrectly probably causes far more friction than deliberately not making a correct guess you could make.
I came up with the following payoff matrix:
| | Bob has trait X (p = 0.95) | Bob doesn't have trait X (p = 0.05) |
| Alice acts as if Bob has trait X | +1 | -100 |
| Alice acts without assumptions about Bob | 0 | 0 |
In this case, the second option is preferable in expectation: acting on the assumption has an expected payoff of 0.95 × 1 + 0.05 × (-100) = -4.05, against 0 for making no assumption. In other words, I don't discard the information, but the repercussions to our social interaction in case of an incorrect guess outweigh the benefit from guessing correctly. And it also matters whether either Alice or Bob is an Asker or a Guesser.
One consequence I can think of is that with a sufficiently low probability that Bob lacks trait X, or if Bob wouldn't be particularly offended by Alice's incorrect guess, taking the guess would be preferable. Now I wonder if we do that a lot in daily life with issues we don't consider controversial ("hmm, are you from my country/state too?"), and if all the "you're overreacting/too sensitive" complaints come from Alice underestimating the size of the negative payoff in the wrong-guess cell of the matrix.
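To make the trade-off concrete, here is a minimal sketch of the expected payoff of guessing and of the break-even probability. It uses the matrix values from the text (+1, -100, p = 0.95); the function names are hypothetical, and treating the payoffs as simple additive utilities is an assumption.

```python
from fractions import Fraction

def expected_payoff_of_guessing(p_has_trait, gain=1, loss=-100):
    """Expected payoff when Alice acts as if Bob has trait X."""
    return p_has_trait * gain + (1 - p_has_trait) * loss

def break_even_probability(gain=1, loss=-100):
    """Probability of the trait at which guessing is worth exactly 0,
    i.e. the solution of p*gain + (1 - p)*loss = 0."""
    return Fraction(-loss, gain - loss)

ev = expected_payoff_of_guessing(Fraction(95, 100))
threshold = break_even_probability()
print(ev, threshold)
```

With these numbers, guessing only beats making no assumption once the probability of the trait exceeds 100/101. A milder penalty for a wrong guess lowers that threshold: with a loss of -10 it drops to 10/11.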
Agents of No Moral Value: Constrained Cognition?
Thought experiments involving multiple agents usually postulate that the agents have no moral value, so that the explicitly specified payoff from the choice of actions can be considered in isolation, as both the sole reason and evaluation criterion for agents' decisions. But is it really possible to require an opposing agent to have no moral value, without constraining what it's allowed to think about?
If agent B is not a person, how do we know it can't decide to become a person for the sole reason of gaming the problem and manipulating agent A (B doesn't care about personhood, so it costs B nothing, but A does care)? If it's stipulated as part of the problem statement, it seems that B's cognition is restricted, and the most rational course of action is prohibited from being considered for no within-thought-experiment reason accessible to B.
It's not enough to require that the other agent is inhuman in the sense of not being a person and not holding human values, as our agent must also not care about the other agent. And once both agents don't care about each other's cognition, the requirement for them not being persons or valuable becomes extraneous.
Thus, instead of requiring that the other agent is not a person, the correct way of setting up the problem is to require that our agent is indifferent to whether the other agent is a person (and conversely).
(It's not a very substantive observation I would've posted with less polish in an open thread if not for the discussion section.)
Evidential Decision Theory and Mass Mind Control
Required Reading: Evidential Decision Theory
Let me begin with something similar to Newcomb's Paradox. You're not the guy choosing whether or not to take both boxes. You're the guy who predicts. You're not actually prescient. You can only make an educated guess.
You watch the first person play. Let's say they pick one box. You know they're not an ordinary person. They're a lot more philosophical than normal. But that doesn't mean that the knowledge of what they choose is completely useless later on. The later people might be just as weird. Or they might be normal, but they're not completely independent of this outlier. You can use his decision to help predict theirs, if only by a little. What's more, this still works if you're reading through archives and trying to "predict" the decisions people have already made in earlier trials.
The decision of the player choosing the box affects whether or not the predictor will predict that later, or earlier, people will take the box. According to EDT, one should act in the way that results in the most evidence for what one wants. Since the predictor is completely rational, this means that the player choosing the box effectively changes the decisions other people make, or actually changes them, depending on your interpretation of EDT. One can even affect people's decisions in the past, provided that one doesn't know what they were.
In short, the decisions you make affect the decisions other people will make and have made. I'm not sure how much, but there have probably been 50 to 100 billion people. And that's not including the people who haven't been born yet. Even if you only change one in a thousand decisions, that's at least 50 million people.
Like I said: mass mind control. Use this power for good.
Counterfactual mugging: alien abduction edition
Omega kidnaps you and an alien from FarFarAway Prime, and gives you the choice: either the alien dies and you go home with your memory wiped, or you lose an arm, and you both go home with your memories wiped. Nobody gets to remember this. Oh and Omega flipped a coin to see who got to choose. What is your choice?
As usual, Omega is perfectly reliable, isn't hiding anything, and goes away afterwards. You also have no idea what the alien's values are, where it lives, what it would choose, nor what is the purpose of that organ that pulsates green light.
(This is my (incorrect) interpretation of counterfactual mugging, which we were discussing on the #lesswrong channel; Boxo pointed out that it's a Prisoner's Dilemma where a random player is forced to cooperate, and isn't that similar to counterfactual mugging.)