Counterfactual Mugging Alternative
Edit as of June 13th, 2016: I no longer believe this to be easier to understand than traditional CM, but stand by the rest of it. Minor aesthetic edits made.
First post on the LW discussion board. Not sure if something like this has already been written, need your feedback to let me know if I’m doing something wrong or breaking useful conventions.
An alternative to the counterfactual mugging, since people often need it explained a few times before they understand it. I think this one will be faster for most people to comprehend because it arose organically: it does not seem specifically contrived to create a dilemma between decision theories.
Pretend you live in a world where time travel exists and Time can create realities with acausal loops, with ordinary linear chronology, or with another structure, so long as there is no paradox -- only self-consistent timelines can be generated.
In your timeline, there are prophets. A prophet (known to you to be honest and truly prophetic) tells you that you will commit an act which seems horrendously imprudent or problematic. It is an act whose effect will be on the scale of losing $10,000; an act you never would have taken ordinarily. But fight the prophecy all you want: it is self-fulfilling, and you definitely live in a timeline where the act gets committed. However, if it weren't for the prophecy being immutably correct, you could have spent $100 and, even having heard the prophecy (even having believed it would be immutable), the probability of you taking that action would be reduced by, say, 50%. So fighting the prophecy by spending $100 would mean that there were 50% fewer self-consistent (possible) worlds where you lost the $10,000, because it's just much less likely for you to end up taking that action if you fight it rather than succumbing to it.
You may feel that there would be no reason to spend $100 averting a decision that you know you're going to make, and see no reason to care about counterfactual worlds where you don't lose the $10,000. But the fact of the matter is that if you could have precommitted to fight the choice, you would have: in the worlds where that prophecy could have been presented to you, you'd be decreasing the average disutility by ($10,000 × 0.5) - $100 = $4,900. Not following a precommitment that you would have made to prevent the exact situation you're now in, merely because you wouldn't have followed the precommitment, seems an obvious failure mode; UDT successfully does the calculation shown above and tells you to fight the prophecy. The simple fact that should tell causal decision theorists that converting to UDT is the causally optimal decision is that updateless decision theorists actually do better on average than CDT proponents.
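The arithmetic above can be checked with a short script. This is a toy model; the 50% reduction and the dollar amounts are the ones assumed in the text:

```python
# Toy expected-value comparison for the prophecy problem.
# Assumed from the text: the prophesied act costs $10,000, fighting costs
# $100, and fighting halves the probability of the act across the
# self-consistent worlds in which the prophecy could have been made.
LOSS = 10_000
FIGHT_COST = 100
P_REDUCTION = 0.5

def expected_loss(fight: bool) -> float:
    """Average disutility over the worlds where the prophecy could be made."""
    if fight:
        return FIGHT_COST + (1 - P_REDUCTION) * LOSS
    return LOSS

savings = expected_loss(fight=False) - expected_loss(fight=True)
print(savings)  # 4900.0
```

Precommitting to fight saves $4,900 on average, matching the figure in the text.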
(You may assume also that your timeline is the only timeline that exists, so as not to further complicate the problem by your degree of empathy with your selves from other existing timelines.)
[LINK] Vladimir Slepnev talks about logical counterfactuals
Vladimir Slepnev (aka cousin_it) gives a popular introduction to logical counterfactuals and modal updateless decision theory at the Tel Aviv LessWrong meetup.
[https://www.youtube.com/watch?v=Ad30JlVh4dM&feature=youtu.be]
Identity and quining in UDT
Outline: I describe a flaw in UDT that has to do with the way the agent defines itself (locates itself in the universe). This flaw manifests as failure to solve a certain class of decision problems. I suggest several related decision theories that solve the problem, some of which avoid quining and are thus suitable for agents that cannot access their own source code.
EDIT: The decision problem I call here the "anti-Newcomb problem" already appeared here. Some previous solution proposals are here. A different but related problem appeared here.
Updateless decision theory, the way it is usually defined, postulates that the agent has to use quining in order to formalize its identity, i.e. determine which portions of the universe are considered to be affected by its decisions. This leaves open the question of which decision theory should be used by agents that don't have access to their own source code (as humans intuitively appear not to). I am pretty sure this question has already been posed somewhere on LessWrong but I can't find the reference: help? It also turns out that there is a class of decision problems for which this formalization of identity fails to produce the winning answer.
When one is programming an AI, it doesn't seem optimal for the AI to locate itself in the universe based solely on its own source code. After all, you build the AI, you know where it is (e.g. running inside a robot), why should you allow the AI to consider itself to be something else, just because this something else happens to have the same source code (more realistically, happens to have a source code correlated in the sense of logical uncertainty)?
Consider the following decision problem which I call the "UDT anti-Newcomb problem". Omega is putting money into boxes by the usual algorithm, with one exception: it isn't simulating the player at all. Instead, it simulates what a UDT agent would do in the player's place. Thus, a UDT agent would consider the problem to be identical to the usual Newcomb problem and one-box, receiving $1,000,000. On the other hand, a CDT agent (say) would two-box and receive $1,001,000 (!) Moreover, this problem reveals that UDT is not reflectively consistent: a UDT agent facing this problem would choose to self-modify given the choice. This is not an argument in favor of CDT, but it is a sign something is wrong with UDT the way it's usually done.
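The payoff structure can be tabulated explicitly. A minimal sketch, using the standard Newcomb dollar amounts from the text:

```python
# Payoffs in the UDT anti-Newcomb problem. Omega fills the opaque box
# based on a simulation of a *UDT* agent, regardless of who actually plays.
BIG, SMALL = 1_000_000, 1_000

def payoff(actual_choice: str, simulated_choice: str) -> int:
    """Money received: the opaque box is filled iff the simulated agent one-boxes."""
    opaque = BIG if simulated_choice == "one-box" else 0
    return opaque + (SMALL if actual_choice == "two-box" else 0)

# UDT treats the problem as ordinary Newcomb and one-boxes:
print(payoff("one-box", simulated_choice="one-box"))   # 1000000
# A CDT agent two-boxes against the same (UDT) simulation:
print(payoff("two-box", simulated_choice="one-box"))   # 1001000
```

The same function also covers the XDT variant discussed below: if the simulated decision theory two-boxes, the opaque box is empty and two-boxing nets only $1,000.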
The essence of the problem is that a UDT agent is using too little information to define its identity: its source code. Instead, it should use information about its origin. Indeed, if the origin is an AI programmer or a version of the agent before the latest self-modification, it appears rational for the precursor agent to code the origin into the successor agent. In fact, if we consider the anti-Newcomb problem with Omega's simulation using the correct decision theory XDT (whatever it is), we expect an XDT agent to two-box and leave with $1000. This might seem surprising, but consider the problem from the precursor's point of view. The precursor knows Omega is filling the boxes based on XDT, whatever the decision theory of the successor is going to be. If the precursor knows XDT two-boxes, there is no reason to construct a successor that one-boxes. So constructing an XDT successor might be perfectly rational! Moreover, a UDT agent playing the XDT anti-Newcomb problem will also two-box (correctly).
To formalize the idea, consider a program P() called the precursor which outputs a new program A called the successor. In addition, we have a program U() called the universe which outputs a number u called utility.
Usual UDT suggests the following algorithm:
(1) A(x) := f*(x), where f* = argmax over f: X -> Y of E[u | For all x: A(x) = f(x)]
Here, X is the input space, Y is the output space and the expectation value is over logical uncertainty. A appears inside its own definition via quining.
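For a finite toy problem, the maximization in (1) can be written down directly by enumerating all input-output mappings. This sketch sidesteps both quining and logical uncertainty by assuming the universe is simply a known function of the policy; all names here are illustrative:

```python
from itertools import product

X = [0, 1]          # input space
Y = ["a", "b"]      # output space

def universe(policy: dict) -> float:
    """Toy stand-in for the expected utility of the agent implementing
    this policy: rewards mapping 0 -> 'a' and 1 -> 'b'."""
    return (policy[0] == "a") + 2 * (policy[1] == "b")

# Enumerate all |Y|^|X| mappings f: X -> Y and pick the best one.
policies = [dict(zip(X, outs)) for outs in product(Y, repeat=len(X))]
best = max(policies, key=universe)
print(best)  # {0: 'a', 1: 'b'}
```

The |Y|^|X| blowup visible in the enumeration is exactly the lookup-table size problem discussed below.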
The simplest way to tweak equation (1) in order to take the precursor into account is
(2) A(x) := f*(x), where f* = argmax over f: X -> Y of E[u | For all x: P()(x) = f(x)]
This seems nice since quining is avoided altogether. However, this is unsatisfactory. Consider the anti-Newcomb problem with Omega's simulation involving equation (2). Suppose the successor uses equation (2) as well. On the surface, if Omega's simulation doesn't involve P¹, the agent will two-box and get $1000 as it should. However, the computing power allocated for evaluating the logical expectation value in (2) might be sufficient to suspect P's output might be an agent reasoning based on (2). This creates a logical correlation between the successor's choice and the result of Omega's simulation. For certain choices of parameters, this logical correlation leads to one-boxing.
The simplest way to solve the problem is letting the successor imagine that P produces a lookup table. Consider the following equation:
(3) A(x) := f*(x), where f* = argmax over f: X -> Y of E[u | P() = T_f]
Here, T_f is a program which computes f using a lookup table: all of the values are hardcoded.
For large input spaces, lookup tables are of astronomical size and either maximizing over them or imagining them to run on the agent's hardware doesn't make sense. This is a problem with the original equation (1) as well. One way out is replacing the arbitrary functions f with programs computing such functions. Thus, (3) is replaced by
(4) A(x) := π*(x), where π* = argmax over programs π of E[u | P() = π]
where π is understood to range over programs receiving input in X and producing output in Y. However, (4) looks like it can go into an infinite loop, since what if the optimal π is described by equation (4) itself? To avoid this, we can introduce an explicit time limit t on the computation. The successor will then spend some portion t1 of t performing the following maximization:
(4') π* = argmax over programs π of E[u | P() = W_t1(π)]
Here, W_t1(π) is a program that does nothing for time t1 and runs π for the remaining time t - t1. Thus, the successor invests t1 time in maximization and t - t1 in evaluating the resulting policy π* on the input it received.
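The time-budget split in (4') can be sketched as follows. This is illustrative only: the "maximization" is a placeholder, and wall-clock fractions stand in for the time limits t1 and t - t1:

```python
import time

def run_successor(x, budget: float, frac: float = 0.5):
    """Spend frac*budget searching for a policy, the rest executing it on x."""
    deadline = time.monotonic() + frac * budget
    best_policy = lambda inp: inp          # default policy
    while time.monotonic() < deadline:
        # placeholder for the argmax over candidate programs pi
        best_policy = lambda inp: inp * 2
    # the remaining (1 - frac) * budget is available to evaluate the policy
    return best_policy(x)

print(run_successor(21, budget=0.01))  # 42
```

The point of the wrapper W_t1 is exactly this two-phase structure: the input is ignored during the search phase and only consumed during the evaluation phase.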
In practical terms, (4') seems inefficient since it completely ignores the actual input for a period of the computation. This problem exists in original UDT as well. A naive way to avoid it is giving up on optimizing the entire input-output mapping and focusing on the input which was actually received. This allows the following non-quining decision theory:
(5) A(x) := y*, where y* = argmax over y in Y of E[u | P() ∈ Π_{x,y}]
Here, Π_{x,y} is the set of programs which begin with a conditional statement that produces output y and terminates execution if the received input was x. Of course, ignoring counterfactual inputs means failing a large class of decision problems. A possible win-win solution is reintroducing quining²:
(6) A(x) := y*, where y* = argmax over y in Y of E[u | P() = C_{x,y}(A)]
Here, C_{x,y} is an operator which appends a conditional as above to the beginning of a program. Superficially, we still only consider a single input-output pair. However, instances of the successor receiving different inputs now take each other into account (as existing in "counterfactual" universes). It is often claimed that the use of logical uncertainty in UDT allows for agents in different universes to reach a Pareto optimal outcome using acausal trade. If this is the case, then agents which have the same utility function should cooperate acausally with ease. Of course, this argument should also make the use of full input-output mappings redundant in usual UDT.
In case the precursor is an actual AI programmer (rather than another AI), it is unrealistic for her to code a formal model of herself into the AI. In a followup post, I'm planning to explain how to do without it (namely, how to define a generic precursor using a combination of Solomonoff induction and a formal specification of the AI's hardware).
¹ If Omega's simulation involves P, this becomes the usual Newcomb problem and one-boxing is the correct strategy.
² Sorry, agents which can't access their own source code. You will have to make do with one of (3), (4') or (5).
Blackmail, continued: communal blackmail, uncoordinated responses
The heuristic that one should always resist blackmail seems a good one (no matter how tricky blackmail is to define). And one should be public about this, too; then, one is very unlikely to be blackmailed. Even if one speaks like an emperor.
But there's a subtlety: what if the blackmail is being used against a whole group, not just against one person? The US justice system is often seen to function like this: prosecutors pile on ridiculous numbers of charges, threatening uncounted millennia in jail, in order to get the accused to settle for a lesser charge and avoid the expense of a trial.
But for this to work, they need to occasionally find someone who rejects the offer, put them on trial, and slap them with a ridiculous sentence. Therefore by standing up to them (or proclaiming in advance that you will reject such offers), you are not actually making yourself immune to their threats. You're setting yourself up to be the sacrificial victim who gets made an example of.
Of course, if everyone were a UDT agent, the correct decision would be for everyone to reject the threat. That would ensure that the threats are never made in the first place. But - and apologies if this shocks you - not everyone in the world is a perfect UDT agent. So the threats will get made, and those resisting them will get slammed to the maximum.
Of course, if everyone could read everyone's mind and was perfectly rational, then they would realise that making examples of UDT agents wouldn't affect the behaviour of non-UDT agents. In that case, UDT agents should resist the threats, and the perfectly rational prosecutor wouldn't bother threatening UDT agents. However - and sorry to shock your views of reality three times in one post - not everyone is perfectly rational. And not everyone can read everyone's minds.
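The communal dynamic can be made concrete with a toy model. All numbers here are illustrative assumptions: the prosecutor threatens everyone, settlers pay out, resisters impose trial costs, and threatening stays profitable as long as resisters are rare:

```python
# Toy model of communal blackmail. Non-UDT agents settle; resisters go
# to trial. Threatening pays for the prosecutor iff settlement gains
# outweigh trial costs, so a few resisters do not stop the threats.
SETTLE_GAIN = 1.0     # prosecutor's gain per settlement (assumed)
TRIAL_COST = 5.0      # prosecutor's cost per trial (assumed)

def prosecutor_threatens(frac_resisters: float) -> bool:
    settle = (1 - frac_resisters) * SETTLE_GAIN
    trials = frac_resisters * TRIAL_COST
    return settle > trials

print(prosecutor_threatens(0.05))  # True: threats remain profitable
print(prosecutor_threatens(1.0))   # False: everyone resists, threats stop
```

This is the point of the section: with a mixed population, the threat is made anyway, and the resisters absorb the cost.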
So even a perfect UDT agent must, it seems, sometimes succumb to blackmail.
Parametric polymorphism in updateless intelligence metrics
Followup to: Agents with Cartesian childhood and Physicalist adulthood
In previous posts I have defined a formalism for quantifying the general intelligence of an abstract agent (program). This formalism relies on counting proofs in a given formal system F (as in regular UDT), which makes it susceptible to the Loebian obstacle. That is, if we imagine the agent itself making decisions by looking for proofs in the same formal system F, then it would be impossible to present a general proof of its trustworthiness, since no formal system can assert its own soundness. Thus the agent might fail to qualify for a high intelligence ranking according to the formalism. We can assume the agent uses a weaker formal system whose soundness is provable in F, but then we still run into difficulties if we want the agent to be self-modifying (as we expect it to be). Such an agent would have to trust its descendants, which means that subsequent agents use weaker and weaker formal systems until self-modification becomes impossible.
One known solution to this is Benja's parametric polymorphism. In this post I adapt parametric polymorphism to the updateless intelligence metric framework. The formal form of this union looks harmonious but it raises questions which I currently don't fully understand.
"Ineffable mystery" using oracles instead of timeouts
In the original parametric polymorphism, a constant κ is introduced (informally known as "the number of ineffable mystery") s.t. the agent has to prove its actions are "safe" for time period κ (i.e. have no "bad" consequences during this period). Since it's impossible for the agent to prove any upper bound on κ, effectively its actions have to be safe indefinitely. I found that for our purposes it's better to cast the formalism otherwise. Instead of κ playing the role of a "timeout", the agent is provided with an oracle that answers questions of the form "κ > n?" for any n. Contrary to what the agent thinks, the oracle provides an affirmative answer for all n.
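The κ-oracle can be mocked in a few lines. A sketch: from the agent's perspective the answer is informative (it cannot prove any upper bound on κ), while the actual implementation answers affirmatively for every n:

```python
def kappa_oracle(n: int) -> bool:
    """Answers the question 'kappa > n?'. Contrary to what the agent can
    prove, the answer is affirmative for every n."""
    return True

# The agent can never refute 'kappa > n' for any concrete n:
print(all(kappa_oracle(n) for n in range(10**6)))  # True
```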
Let's do some math. The formalism I introduce at this point is very close to Yudkowsky and Herreshoff, with some adjustments.
Consider an infinite chain of agents A0, A1, A2... where A{i+1} is created by Ai. Formally, each Ai is a program for a universal oracle machine which computes Ai() = (a_i, A{i+1}). Here a_i is a "motor action". I'm ignoring sensory input for the sake of simplicity (of course sensory input will reappear when we return to intelligence metrics).
Denote T_κ the usual theory of parametric polymorphism. That is T_κ results from starting with a theory T (e.g. Peano Arithmetic), extending the language by the symbol κ (which behaves like a natural number) and adding the (self-referential) axiom schema
(1) For all formulae φ: κ > 0 -> For all n: (@_{T_κ} "φ(n)" -> φ(n)[κ\κ-1])
Here '@_S' is a formula with the semantics of "provable in theory S" and φ(n)[κ\κ-1] is φ(n) with all occurrences of κ replaced by κ-1. Thus (1) is a self-referential soundness schema saved from paradoxicality by the "counter" κ. See Yudkowsky and Herreshoff for the proof that T_κ is sound (it has 1 model for each possible value of κ).
Assume the licensing condition for A0 to perform action b_0 is
(2) Exists n: T_κ | κ > n -> (A0() = b_0 -> G)
Here S | φ means that φ is provable in S, G is a formula denoting the "goal" of the agent (as in Yudkowsky and Herreshoff, the agent is for now assumed to be a satisficer), A0() is a formula meaning "the result of the computation A0", where the oracle is interpreted as answering the question "κ > x?" (in particular the formula A0() involves κ). Such a licensing condition can be implemented in practice by making A0 a proof searcher which is allowed to use the oracle to introduce assumptions of the form κ > n. In particular, our agent employs ambient control.
We then have the following proposition, showing that A0 is able to tile indefinitely (create an infinite chain of meaningful descendants). The proposition is formulated for A1 but the rest of the chain follows by induction since the form of the licensing condition is preserved (up to a certain twist) by the tiling.
Proposition
Assume b_0 = (b_0, B1) and B1 is s.t.
(3) T_κ | For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]")))
Here 0 is some default "harmless" action, performed if no appropriate theorem is proved.
Further assume that if B1 is created and does 0, the goal is achieved (in this framework "achieving the goal" is to be thought of as "not doing anything unsafe").
(4) T_κ | (A0() = b_0 and B1() = 0) -> G
Then A0 is allowed to create B1 i.e. licensing condition (2) holds.
B1's licensing condition contains a "twist", namely it believes that when the oracle receives input n it answers the question "κ+1 > n?" rather than "κ > n?", and its goal is shifted in the same fashion. Note that the goal typically depends on κ since the agent itself, and thus its κ-oracle, is part of the universe in which this goal is to be attained. For example, the goal can be a function of the numbers (a0, a1, a2...) defined as the output of corresponding computations with κ-oracles. Thus the shift in G can also be interpreted as a shifted interpretation of the oracle by B1.
Proof
(3) trivially implies that
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]"))))
Using the premise κ > 0, we can apply the soundness schema (1). We obtain:
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and ((κ > n and B1() = b_1) -> G)))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (B1() = b_1 -> G))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or (B1() = b_1 -> G))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or G)))
This trivially implies that
T_κ | κ > 0 -> (For all b_1: ((A0() = b_0 and B1() = b_1) -> (b_1 = 0 or G)))
Applying (4) we finally get (2) for n=0.
Back to intelligence metrics
To apply parametric polymorphism to the updateless intelligence formalism, we need to do the following:
- Assume the machine M on which the agent is running is an oracle machine.
- Assume the Solomonoff measure of the ideal (Cartesian) universe X is defined using a universal oracle machine. The oracle in M has to correspond to the oracle in the hypothesis T describing X: this correspondence becomes part of the rules N.
- Assume the universal program u defining the Solomonoff measure for the physical universe is a universal oracle program, i.e. the hypotheses D describing the physical universe are also allowed to invoke the oracle.
- Assume the logical expectation value EL is computed using T_κ extended by N applied to the given T (this is provable in T_κ anyway but we want the proof to be short) and the axiom schema {κ > n} for every natural number n. The latter extension is consistent since adding any finite number of such axioms admits models. The proofs counted in EL interpret the oracle as answering the question "κ > n?". That is, they are proofs of theorems of the form "if this oracle-program T computes q when the oracle is taken to be κ > n, then the k-th digit of the expected utility is 0/1, where the expected utility is defined by a Solomonoff sum over oracle programs with the oracle again taken to be κ > n".
Discussion
- Such an agent, when considering hypotheses consistent with given observations, will always face a large number of different compatible hypotheses of similar complexity. These hypotheses result from arbitrary insertions of the oracle (which increase complexity, of course, but not drastically). It is not entirely clear to me what such an epistemology will look like.
- The formalism admits naturalistic trust to the extent the agent believes that the other agent's oracle is "genuine" and carries a sufficient "twist". This will often be ambiguous so trust will probably be limited to some finite probability. If the other agent is equivalent to the given one on the level of physical implementation then the trust probability is likely to be high.
- The agent is able to quickly confirm κ > n for any n small enough to fit into memory. For the sake of efficiency we might want to enhance this ability by allowing the agent to confirm that (Exist n: φ(n)) -> Exist n: (φ(n) and κ > n) for any given formula φ.
- For the sake of simplicity I neglected multi-phase AI development, but the corresponding construction seems to be straightforward.
- Overall I retain the feeling that a good theory of logical uncertainty should allow the agent to assign a high probability to the soundness of its own reasoning system (a la Christiano et al). Whether this will make parametric polymorphism redundant remains to be seen.
Agents with Cartesian childhood and Physicalist adulthood
Followup to: Updateless intelligence metrics in the multiverse
In the previous post I explained how to define a quantity that I called "the intelligence metric" which allows comparing intelligence of programs written for a given hardware. It is a development of the ideas by Legg and Hutter which accounts for the "physicality" of the agent i.e. that the agent should be aware it is part of the physical universe it is trying to model (this desideratum is known as naturalized induction). My construction of the intelligence metric exploits ideas from UDT, translating them from the realm of decision algorithms to the realm of programs which run on an actual piece of hardware with input and output channels, with all the ensuing limitations (in particular computing resource limitations).
In this post I present a variant of the formalism which overcomes a certain problem implicit in the construction. This problem has to do with overly strong sensitivity to the choice of a universal computing model used in constructing Solomonoff measure. The solution sheds some interesting light on how the development of the seed AI should occur.
Structure of this post:
- A 1-paragraph recap of how the updateless intelligence formalism works. The reader interested in technical details is referred to the previous post.
- Explanation of the deficiencies in the formalism I set out to overcome.
- Explanation of the solution.
- Concluding remarks concerning AI safety and future development.
TLDR of the previous formalism
The metric is a utility expectation value over a Solomonoff measure in the space of hypotheses describing a "Platonic ideal" version of the target hardware. In other words it is an expectation value over all universes containing this hardware in which the hardware cannot "break" i.e. violate the hardware's intrinsic rules. For example, if the hardware in question is a Turing machine, the rules are the time evolution rules of the Turing machine, if the hardware in question is a cellular automaton, the rules are the rules of the cellular automaton. This is consistent with the agent being Physicalist since the utility function is evaluated on a different universe (also distributed according to a Solomonoff measure) which isn't constrained to contain the hardware or follow its rules. The coupling between these two different universes is achieved via the usual mechanism of interaction between the decision algorithm and the universe in UDT i.e. by evaluating expectation values conditioned on logical counterfactuals.
Problem
The Solomonoff measure depends on choosing a universal computing model (e.g. a universal Turing machine). Solomonoff induction only depends on this choice weakly in the sense that any Solomonoff predictor converges to the right hypothesis given enough time. This has to do with the fact that Kolmogorov complexity only depends on the choice of universal computing model through an O(1) additive correction. It is thus a natural desideratum for the intelligence metric to depend on the universal computing model weakly in some sense. Intuitively, the agent in question should always converge to the right model of the universe it inhabits regardless of the Solomonoff prior with which it started.
The problem with realizing this expectation has to do with exploration-exploitation tradeoffs. Namely, if the prior strongly expects a given universe, the agent would be optimized for maximal utility generation (exploitation) in this universe. This optimization can be so strong that the agent would lack the faculty to model the universe in any other way. This is markedly different from what happens with AIXI, since our agent has limited computing resources to spare and is physicalist; therefore its source code might have side effects important to utility generation that have nothing to do with the computation implemented by the source code. For example, imagine that our Solomonoff prior assigns very high probability to a universe inhabited by Snarks. Snarks have the property that once they see a robot programmed with the machine code "000000..." they immediately produce a huge pile of utilons. On the other hand, when they see a robot programmed with any other code they immediately eat it and produce a huge pile of negative utilons. Such a prior would result in the code "000000..." being assigned the maximal intelligence value even though it is anything but intelligent. Observe that there is nothing preventing us from producing a Solomonoff prior with such bias, since it is possible to set the probabilities of any finite collection of computable universes to any non-zero values with sum < 1.
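The bias problem can be made concrete with a two-hypothesis toy metric. All numbers, and the "Snark universe" utility itself, are illustrative assumptions:

```python
# Toy intelligence metric: prior-weighted utility over two universes.
# In the "Snark" universe, the code "000000..." gets a huge utility
# purely as a side effect of its source, not its behaviour.
def snark_utility(code: str) -> float:
    return 1e9 if set(code) == {"0"} else -1e9

def normal_utility(code: str) -> float:
    return 1.0 if code == "clever-agent" else 0.0

def metric(code: str, p_snark: float) -> float:
    return p_snark * snark_utility(code) + (1 - p_snark) * normal_utility(code)

# With a heavily biased prior, the trivial code wins the metric:
print(metric("000000", 0.99) > metric("clever-agent", 0.99))  # True
```

A metric that depended on the universal computing model only "weakly" would rule out this kind of prior-driven verdict in the limit.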
More precisely, the intelligence metric involves two Solomonoff measures: the measure of the "Platonic" universe and the measure of the physical universe. The latter is not really a problem since it can be regarded as part of the utility function. The utility-agnostic version of the formalism assumes a program for computing the utility function is read by the agent from a special storage. There is nothing to stop us from postulating that the agent reads another program from that storage which is the universal computer used for defining the Solomonoff measure over the physical universe. However, this doesn't solve our problem since even if the physical universe is distributed with a "reasonable" Solomonoff measure (assuming there is such a thing), the Platonic measure determines in which portions of the physical universe (more precisely multiverse) our agent manifests.
There is another way to think about this problem. If the seed AI knows nothing about the universe except the working of its own hardware and software, the Solomonoff prior might be insufficient "information" to prevent it from making irreversible mistakes early on. What we would like to do is to endow it from the first moment with the sum of our own knowledge, but this might prove to be very difficult.
Solution
Imagine the hardware architecture of our AI to be composed of two machines. One I call the "child machine", the other the "adult machine". The child machine receives data from the same input channels (and "utility storage") as the adult machine and is able to read the internal state of the adult machine itself or at least the content of its output channels. However, the child machine has no output channels of its own. The child machine has special memory called "template memory" into which it has unlimited write access. There is a single moment in time ("end of childhood"), determined by factors external to both machines (i.e. the human operator), in which the content of the template memory is copied into the instruction space of the adult machine. Thus, the child machine's entire role is making observations and using them to prepare a program for the adult machine which will be eventually loaded into the latter.
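The two-machine architecture can be sketched schematically. Class and method names here are invented for illustration only:

```python
class AdultMachine:
    def __init__(self, demo_program):
        self.program = demo_program      # "demo" program priming the adult

class ChildMachine:
    """Observes inputs and the adult machine; writes only to template memory."""
    def __init__(self):
        self.template = None             # unlimited write access

    def observe_and_learn(self, inputs, adult: AdultMachine):
        # placeholder: prepare the mature program from observations
        self.template = ("mature-program", tuple(inputs), adult.program)

def end_of_childhood(child: ChildMachine, adult: AdultMachine):
    """Triggered externally (e.g. by the human operator): copy template
    memory into the adult machine's instruction space."""
    adult.program = child.template

adult = AdultMachine("demo")
child = ChildMachine()
child.observe_and_learn([1, 2, 3], adult)
end_of_childhood(child, adult)
print(adult.program[0])  # mature-program
```

Note that the child has no method that writes to any output channel; its only effect on the world is via the template copied at the single end-of-childhood event.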
The new intelligence metric assigns intelligence values to programs for the child machine. For each hypothesis describing the Platonic universe (which now contains both machines, the end of childhood time value and the entire ruleset of the system) we compute the utility expectation value under the following logical counterfactual condition: "The program loaded into template memory at the end of childhood is the same as would result from the given program for the child machine if this program for the child machine would be run with the inputs actually produced by the given hypothesis regarding the Platonic universe". The intelligence value is then the expectation value of that quantity with respect to a Solomonoff measure over hypotheses describing the Platonic universe.
The important property of the logical counterfactual is that it doesn't state the given program is actually loaded into the child machine. It only says the resulting content of the template memory is the same as what would be obtained from the given program assuming all the laws of the Platonic universe hold. This formulation prevents exploitation of side effects of the child source code since the condition doesn't fix the source code, only its output. Effectively, the child agent considers itself to be Cartesian, i.e. can consider neither the side effects of its computations nor the possibility the physical universe will violate the laws of its machinery. On the other hand the child's output (the mature program) is a physicalist agent since it affects the physical universe by manifesting in it.
If such an AI is implemented in practice, it makes sense to prime the adult machine with a "demo" program which will utilize the output channels in various ways and do some "exploring" using its input channels. This would serve to provide the child with as much information as possible.
To sum up, the new expression for the intelligence metric is:
I(q) = E_HX[ E_HY(Ec(X))[ EL[ U(Y, Eu(X)) | Q(X, t(X)) = Q*(X; q) ] ] | N ]
Here:
- q is the program priming the child machine
- HX is the hypothesis producing the Platonic universe X (a sequence of bits encoding the state of the hardware as a function of time and the end-of-childhood time t(X)). It is a program for a fixed universal computing model C.
- HY is the hypothesis producing the Physical universe (an abstract sequence of bits). It is a program for the universal computer program ("virtual machine") Ec(X) written into storage E in X.
- EL is logical expectation value defined e.g. using evidence logic.
- Eu(X) is a program for computing the utility function which is written into storage E in X.
- U is the utility function which consists of applying Eu(X) to Y.
- Q(X, t(X)) is the content of template memory at time t(X).
- Q*(X; q) is the content that would be in the template memory if it was generated by program q receiving the inputs going into the child machine under hypothesis HX.
- N is the full ruleset of the hardware including the reprogramming of the adult machine that occurs at t(X).
Concluding Remarks
- It would be very valuable to formulate and prove a mathematical theorem which expresses the sense in which the new formalism depends on the choice of universal computing model weakly (in particular it would validate the notion).
- This formalism might have an interesting implication on AI safety. Since the child agent is Cartesian and has no output channels (it cannot create output channels because it is Cartesian), it doesn't present as much risk as an adult AI. Imagine template memory is write-only (which is not a problem for the formalism) and is implemented by a channel that doesn't store the result anywhere (in particular the mature program is never run). There can still be risk due to side effects of the mature program that manifest through presence of its partial or full versions in (non-template) memory of the child machine. For example, imagine the mature program is s.t. any person who reads it experiences compulsion to run it. This risk can be mitigated by allowing both machines to interact only with a virtual world which receives no inputs from the external reality. Of course the AI might still be able to deduce external reality. However, this can be prevented by exploiting prior bias: we can equip the AI with a Solomonoff prior that favors the virtual world to such an extent that it would have no reason to deduce the real world. This way the AI is safe unless it invents a "generic" box-escaping protocol which would work in a huge variety of different universes that might contain the virtual world.
- If we factor finite logical uncertainty into the evaluation of the logical expectation value EL, the plot thickens. Namely, a new problem arises related to bias in the "logic prior". To solve this new problem we need to introduce yet another stage into AI development, which might be dubbed "fetus". The fetus has no access to external inputs and is responsible for building a sufficient understanding of mathematics, in the same sense that the child is responsible for building a sufficient understanding of physics. Details will follow in subsequent posts, so stay tuned!
Updateless Intelligence Metrics in the Multiverse
Followup to: Intelligence Metrics with Naturalized Induction using UDT
In the previous post I have defined an intelligence metric solving the duality (aka naturalized induction) and ontology problems in AIXI. This model used a formalization of UDT using Benja's model of logical uncertainty. In the current post I am going to:
- Explain some problems with my previous model (that section can be skipped if you don't care about the previous model and only want to understand the new one).
- Formulate a new model solving these problems. Incidentally, the new model is much closer to the usual way UDT is represented. It is also based on a different model of logical uncertainty.
- Show how to define intelligence without specifying the utility function a priori.
- Outline a method for constructing the utility functions the new model requires: functions formulated with abstract ontology, i.e. well-defined on the entire Tegmark level IV multiverse. These are generally difficult to construct (the ontology problem resurfaces in a different form).
Problems with UIM 1.0
The previous model postulated that naturalized induction uses a version of Solomonoff induction updated in the direction of an innate model N with a temporal confidence parameter t. This entails several problems:
- The dependence on the parameter t whose relevant value is not easy to determine.
- Conceptual divergence from the UDT philosophy that we should not update at all.
- Difficulties with counterfactual mugging and acausal trade scenarios in which G doesn't exist in the "other universe".
- Once G discovers even a small violation of N at a very early time, it loses all ground for trusting its own mind. Effectively, G would find itself in the position of a Boltzmann brain. This is especially dangerous when N over-specifies the hardware running G's mind. For example assume N specifies G to be a human brain modeled on the level of quantum field theory (particle physics). If G discovers that in truth it is a computer simulation on the merely molecular level, it loses its epistemic footing completely.
UIM 2.0
I now propose the following intelligence metric (the formula goes first and then I explain the notation):
IU(q) := ET[ED[EL[U(Y(D)) | Q(X(T)) = q]] | N]
- N is the "ideal" model of the mind of the agent G. For example, it can be a universal Turing machine M with special "sensory" registers e whose values can change arbitrarily after each step of M. N is specified as a system of constraints on an infinite sequence of natural numbers X, which should be thought of as the "Platonic ideal" realization of G, i.e. an imagery realization which cannot be tempered with by external forces such as anvils. As we shall see, this "ideal" serves as a template for "physical" realizations of G which are prone to violations of N.
- Q is a function that decodes G's code from X e.g. the program loaded in M at time 0. q is a particular value of this code whose (utility specific) intelligence IU(q) we are evaluating.
- T is a random (as in random variable) computable hypothesis about the "physics" of X, i.e. a program computing X implemented on some fixed universal computing model (e.g. universal Turing machine) C. T is distributed according to the Solomonoff measure; however, the expectation value in the definition of IU(q) is conditional on N, i.e. we restrict to programs which are compatible with N. From the UDT standpoint, T is the decision algorithm itself and the uncertainty in T is "introspective" uncertainty, i.e. the uncertainty of the putative precursor agent PG (the agent creating G, e.g. an AI programmer) regarding her own decision algorithm. Note that we don't actually need to postulate a PG which is "agenty" (i.e. use for N a model of AI hardware together with a model of the AI programmer programming this hardware); we can be content to remain in a more abstract framework.
- D is a random computable hypothesis about the physics of Y, where Y is an infinite sequence of natural numbers representing the physical (as opposed to "ideal") universe. D is distributed according to the Solomonoff measure and the respective expectation value is unconditional (i.e. we use the raw Solomonoff prior for Y which makes the model truly updateless). In UDT terms, D is indexical uncertainty.
- U is a computable function from infinite sequences of natural numbers to [0, 1] representing G's utility function.
- L represents logical uncertainty. It can be defined by the model explained by cousin_it here, together with my previous construction for computing logical expectation values of random variables in [0, 1]. That is, we define EL(dk) to be the probability that a random string of bits p encodes a proof of the sentence "Q(X(T)) = q implies that the k-th digit of U(Y(D)) is 1" in some prefix-free encoding of proofs conditional on p encoding the proof of either that sentence or the sentence "Q(X(T)) = q implies that the k-th digit of U(Y(D)) is 0". We then define
EL[U(Y(D)) | Q(X(T)) = q] := Σk 2^-k EL(dk). Here, the sentences and the proofs belong to some fixed formal logic F, e.g. Peano arithmetic or ZFC.
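As a toy illustration of how the per-digit probabilities combine into the expectation value, here is a short sketch (my own code, not from the post; `digit_probs` stands in for the values EL(dk), however they are estimated):

```python
# Sketch: combine per-digit "logical probabilities" into a logical
# expectation value for a utility U in [0, 1]. digit_probs[k-1] plays
# the role of E_L(d_k): the probability that the k-th binary digit of
# U is 1, obtained from whatever proof-sampling scheme one uses.

def logical_expectation(digit_probs):
    """E_L[U] = sum_k 2^-k * E_L(d_k), truncated to the digits given."""
    return sum(p / 2 ** k for k, p in enumerate(digit_probs, start=1))

# Every digit certainly 1 means U = 0.111... in binary, i.e. close to 1.
print(logical_expectation([1.0] * 30))   # ~1.0
# Each digit 1 with probability 1/2 gives E_L[U] close to 1/2.
print(logical_expectation([0.5] * 30))   # ~0.5
```

Truncating the sum at 30 digits loses at most 2^-30 of the value, which is why the printed results are only approximately 1 and 1/2.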
Discussion
- G's mental architecture N is defined in the "ideal" universe X where it is inviolable. However, G's utility function U inhabits the physical universe Y. This means that a highly intelligent q is designed so that imperfect realizations of G inside Y generate as many utilons as possible. A typical T is a low Kolmogorov complexity universe which contains a perfect realization of G. Q(X(T)) is L-correlated to the programming of imperfect realizations of G inside Y because T serves as an effective (approximate) model of the formation of these realizations. For abstract N, this means q is highly intelligent when a Solomonoff-random "M-programming process" producing q entails a high expected value of U.
- Solving the Loebian obstacle requires a more sophisticated model of logical uncertainty. I think I can formulate such a model. I will explain it in another post after more contemplation.
- It is desirable that the encoding of proofs p satisfies a universality property so that the length of the encoding can only change by an additive constant, analogously to the weak dependence of Kolmogorov complexity on C. It is in fact not difficult to formulate this property and show the existence of appropriate encodings. I will discuss this point in more detail in another post.
Generic Intelligence
It seems conceptually desirable to have a notion of intelligence independent of the specifics of the utility function. Such an intelligence metric can be constructed in a way analogous to what I've done in UIM 1.0; however, it is no longer a special case of the utility-specific metric.
Assume N to consist of a machine M connected to a special storage device E. Assume further that at X-time 0, E contains a valid C-program u realizing a utility function U, but that this is the only constraint on the initial content of E imposed by N. Define
I(q) := ET[ED[EL[u(Y(D); X(T)) | Q(X(T)) = q]] | N]
Here, u(Y(D); X(T)) means that we decode u from X(T) and evaluate it on Y(D). Thus utility depends both on the physical universe Y and on the ideal universe X. This means G is not precisely a UDT agent but rather a "proto-agent": only when a realization of G reads u from E does it know which other realizations of G in the multiverse (the Solomonoff ensemble from which Y is selected) should be considered the "same" agent UDT-wise.
Incidentally, this can be used as a formalism for reasoning about agents that don't know their utility functions. I believe this has important applications in metaethics, which I will discuss in another post.
Utility Functions in the Multiverse
UIM 2.0 is a formalism that solves the diseases of UIM 1.0, at the price of losing N as the ontology for utility functions. We need the utility function to be defined on the entire multiverse, i.e. on any sequence of natural numbers. I will outline a way to extend "ontology-specific" utility functions to the multiverse through a simple example.
Suppose G is an agent that cares about universes realizing the Game of Life, its utility function U corresponding to e.g. some sort of glider maximization with exponential temporal discount. Fix a specific way DC to decode any Y into a history of a 2D cellular automaton with two cell states ("dead" and "alive"). Our multiversal utility function U* assigns Ys for which DC(Y) is a legal Game of Life the value U(DC(Y)). All other Ys are treated by dividing the cells into cells O obeying the rules of Life and cells V violating the rules of Life. We can then evaluate U on O only (assuming it has some sort of locality) and assign V utility by some other rule, e.g.:
- zero utility
- constant utility per V cell with temporal discount
- constant utility per unit of surface area of the boundary between O and V with temporal discount
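To make the O/V split concrete, here is a minimal sketch (my own code and names, with a finite grid and dead-boundary convention as simplifying assumptions): it recomputes each cell of the current state from the previous state under the Life rules, and classifies cells as obeying (O) or violating (V).

```python
# Partition the cells of a decoded automaton history step into
# O (obeying the Game of Life rules) and V (violating them).

def life_step_cell(grid, x, y):
    """The state the Life rules dictate for cell (x, y) at the next step."""
    n = sum(grid[i][j]
            for i in range(max(0, x - 1), min(len(grid), x + 2))
            for j in range(max(0, y - 1), min(len(grid[0]), y + 2))
            if (i, j) != (x, y))
    return 1 if n == 3 or (grid[x][y] == 1 and n == 2) else 0

def partition_cells(prev, cur):
    """Return (O, V): cells of `cur` that obey / violate the Life rules."""
    O, V = set(), set()
    for x in range(len(cur)):
        for y in range(len(cur[0])):
            (O if cur[x][y] == life_step_cell(prev, x, y) else V).add((x, y))
    return O, V

# A blinker evolves lawfully, so every cell lands in O and V is empty.
prev = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
cur  = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
O, V = partition_cells(prev, cur)
print(len(O), len(V))  # 9 0
```

A history where some cell flips against the rules would put that cell into V, and the variant utility rules above would then assign it zero utility, per-cell utility, or boundary-area utility.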
Discussion
- The construction of U* depends on the choice of DC. However, U* only depends on DC weakly since given a hypothesis D which produces a Game of Life wrt some other low complexity encoding, there is a corresponding hypothesis D' producing a Game of Life wrt DC. D' is obtained from D by appending a corresponding "transcoder" and thus it is only less Solomonoff-likely than D by an O(1) factor.
- Since the contributions of O and V are combined additively rather than e.g. multiplicatively, a U*-agent doesn't behave as if it a priori expects the universe to follow the rules of Life, but it may have strong preferences about whether the universe actually does so.
- This construction is reminiscent of Egan's dust theory in the sense that all possible encodings contribute. However, here they are weighted by the Solomonoff measure.
TLDR
The intelligence of a physicalist agent is defined to be the UDT-value of the "decision" to create the agent by the process creating the agent. The process is selected randomly from a Solomonoff measure conditional on obeying the laws of the hardware on which the agent is implemented. The "decision" is made in an "ideal" universe in which the agent is Cartesian, but the utility function is evaluated on the real universe (raw Solomonoff measure). The interaction between the two "universes" is purely via logical conditional probabilities (acausal).
If we want to discuss intelligence without specifying a utility function up front, we allow the "ideal" agent to read a program describing the utility function from a special storage immediately after "booting up".
Utility functions in the Tegmark level IV multiverse are defined by specifying a "reference universe", specifying an encoding of the reference universe and extending a utility function defined on the reference universe to encodings which violate the reference laws by summing the utility of the portion of the universe which obeys the reference laws with some function of the space-time shape of the violation.
SUDT: A toy decision theory for updateless anthropics
The best approach I know for thinking about anthropic problems is Wei Dai's Updateless Decision Theory (UDT). We aren't yet able to solve all problems that we'd like to—for example, when it comes to game theory, the only games we have any idea how to solve are very symmetric ones—but for many anthropic problems, UDT gives the obviously correct solution. However, UDT is somewhat underspecified, and cousin_it's concrete models of UDT based on formal logic are rather heavyweight if all you want is to figure out the solution to a simple anthropic problem.
In this post, I introduce a toy decision theory, Simple Updateless Decision Theory or SUDT, which is most definitely not a replacement for UDT but makes it easy to formally model and solve the kind of anthropic problems that we usually apply UDT to. (And, of course, it gives the same solutions as UDT.) I'll illustrate this with a few examples.
This post is a bit boring, because all it does is to take a bit of math that we already implicitly use all the time when we apply updateless reasoning to anthropic problems, and spells it out in excruciating detail. If you're already well-versed in that sort of thing, you're not going to learn much from this post. The reason I'm posting it anyway is that there are things I want to say about updateless anthropics, with a bit of simple math here and there, and while the math may be intuitive, the best thing I can point to in terms of details are the posts on UDT, which contain lots of irrelevant complications. So the main purpose of this post is to save people from having to reverse-engineer the simple math of SUDT from the more complex / less well-specified math of UDT.
(I'll also argue that Psy-Kosh's non-anthropic problem is a type of counterfactual mugging, I'll use the concept of l-zombies to explain why UDT's response to this problem is correct, and I'll explain why this argument still works if there aren't any l-zombies.)
*
I'll introduce SUDT by way of a first example: the counterfactual mugging. In my preferred version, Omega appears to you and tells you that it has thrown a very biased coin, which had only a 1/1000 chance of landing heads; however, in this case, the coin has in fact fallen heads, which is why Omega is talking to you. It asks you to choose between two options, (H) and (T). If you choose (H), Omega will create a Friendly AI; if you choose (T), it will destroy the world. However, there is a catch: Before throwing the coin, Omega made a prediction about which of these options you would choose if the coin came up heads (and it was able to make a highly confident prediction). If the coin had come up tails, Omega would have destroyed the world if it predicted that you'd choose (H), and it would have created a Friendly AI if it predicted (T). (Incidentally, if it hadn't been able to make a confident prediction, it would just have destroyed the world outright.)
| | Coin falls heads (chance = 1/1000) | Coin falls tails (chance = 999/1000) |
|---|---|---|
| You choose (H) if coin falls heads | Positive intelligence explosion | Humanity wiped out |
| You choose (T) if coin falls heads | Humanity wiped out | Positive intelligence explosion |
In this example, we are considering two possible worlds: w_heads and w_tails. We write Ω (no pun intended) for the set of all possible worlds; thus, in this case, Ω = {w_heads, w_tails}. We also have a probability distribution over Ω, which we call P. In our example, P(w_heads) = 1/1000 and P(w_tails) = 999/1000.
In the counterfactual mugging, there is only one situation you might find yourself in in which you need to make a decision, namely when Omega tells you that the coin has fallen heads. In general, we write I for the set of all possible situations in which you might need to make a decision; the I stands for the information available to you, including both sensory input and your memories. In our case, we'll write I = {i_heads}, where i_heads is the single situation where you need to make a decision.

For every i ∈ I, we write A(i) for the set of possible actions you can take if you find yourself in situation i. In our case, A(i_heads) = {(H), (T)}. A policy (or "plan") is a function π that associates to every situation i ∈ I an action π(i) ∈ A(i) to take in this situation. We write Π for the set of all policies. In our case, Π = {π_H, π_T}, where π_H(i_heads) = (H) and π_T(i_heads) = (T).
Next, there is a set of outcomes O, which specifies all the features of what happens in the world that make a difference to our final goals, and the outcome function o, which for every possible world w ∈ Ω and every policy π ∈ Π specifies the outcome o(w, π) ∈ O that results from executing π in the world w. In our case, O = {FAI, DOOM}, and o(w_heads, π_H) = FAI, o(w_tails, π_H) = DOOM and o(w_heads, π_T) = DOOM, o(w_tails, π_T) = FAI.
Finally, we have a utility function U : O → ℝ. In our case, U(FAI) = 1 and U(DOOM) = 0. (The exact numbers don't really matter, as long as U(FAI) > U(DOOM), because utility functions don't change their meaning under affine transformations, i.e. when you add a constant to all utilities or multiply all utilities by a positive number.)
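This invariance is easy to check; here is a quick sketch (my own code, with hypothetical names) showing that a positive affine rescaling of the utilities leaves the optimal policy unchanged:

```python
# Each policy induces a distribution over outcomes; expected utility
# is the probability-weighted sum of outcome utilities.
def eu(utility, outcome_probs):
    return sum(p * utility[o] for o, p in outcome_probs.items())

u = {"FAI": 1.0, "DOOM": 0.0}
u_affine = {o: 7 * v + 3 for o, v in u.items()}  # positive scale + shift

# Outcome distributions induced by the two policies in the mugging.
policy_outcome_probs = {
    "pi_H": {"FAI": 1 / 1000, "DOOM": 999 / 1000},
    "pi_T": {"FAI": 999 / 1000, "DOOM": 1 / 1000},
}

def rank(util):
    """Policy with maximal expected utility under `util`."""
    return max(policy_outcome_probs,
               key=lambda pi: eu(util, policy_outcome_probs[pi]))

print(rank(u), rank(u_affine))  # pi_T pi_T
```

The argmax is the same under both utility functions, since expected utility is linear in the utilities.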
Thus, an SUDT decision problem consists of the following ingredients: the sets Ω, I and O of possible worlds, situations you need to make a decision in, and outcomes; for every i ∈ I, the set A(i) of possible actions in that situation; the probability distribution P; and the outcome and utility functions o and U. SUDT then says that you should choose a policy π ∈ Π that maximizes the expected utility EU(π) := E[U(o(w, π))], where E is the expectation with respect to P, and w is the true world.
In our case, EU(π) is just the probability of the good outcome FAI, according to the (prior) distribution P. For π_H, that probability is 1/1000; for π_T, it is 999/1000. Thus, SUDT (like UDT) recommends choosing (T).
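The whole calculation can be spelled out mechanically. Here is a small Python sketch (my own rendering of the SUDT ingredients, not code from any UDT implementation) that enumerates the policies and picks the one maximizing expected utility:

```python
from itertools import product

# Possible worlds and their prior probabilities.
worlds = {"heads": 1 / 1000, "tails": 999 / 1000}
# The single decision situation and its available actions.
situations = ["omega_says_heads"]
actions = {"omega_says_heads": ["H", "T"]}

def outcome(world, policy):
    """Outcome function o(w, pi) for the counterfactual mugging."""
    choice = policy["omega_says_heads"]
    if world == "heads":
        return "FAI" if choice == "H" else "DOOM"
    return "FAI" if choice == "T" else "DOOM"

utility = {"FAI": 1, "DOOM": 0}

def expected_utility(policy):
    return sum(p * utility[outcome(w, policy)] for w, p in worlds.items())

# Enumerate all policies (one action per situation) and take the argmax.
policies = [dict(zip(situations, acts))
            for acts in product(*(actions[s] for s in situations))]
best = max(policies, key=expected_utility)
print(best, expected_utility(best))  # {'omega_says_heads': 'T'} 0.999
```

Since U(FAI) = 1 and U(DOOM) = 0, the expected utility of a policy is exactly the probability of FAI under the prior, so the argmax reproduces the recommendation to choose (T).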
If you set up the problem in SUDT like that, it's kind of hidden why you could possibly think that's not the right thing to do, since we aren't distinguishing situations that are "actually experienced" in a particular possible world; there's nothing in the formalism that reflects the fact that Omega never asks us for our choice if the coin comes up tails. In my post on l-zombies, I've argued that this makes sense because even if there's no version of you that actually consciously experiences being in the heads world, this version still exists as a Turing machine and the choices that it makes influence what happens in the real world. If all mathematically possible experiences exist, so that there aren't any l-zombies, but some experiences are "experienced more" (have more "magical reality fluid") than others, the argument is even clearer—even if there's some anthropic sense in which, upon being told that the coin fell heads, you can conclude that you should assign a high probability of being in the heads world, the same version of you still exists in the tails world, and its choices influence what happens there. And if everything is experienced to the same degree (no magical reality fluid), the argument is clearer still.
*
From Vladimir Nesov's counterfactual mugging, let's move on to what I'd like to call Psy-Kosh's probably counterfactual mugging, better known as Psy-Kosh's non-anthropic problem. This time, you're not alone: Omega gathers you together with 999,999 other advanced rationalists, all well-versed in anthropic reasoning and SUDT. It places each of you in a separate room. Then, as before, it throws a very biased coin, which has only a 1/1000 chance of landing heads. If the coin does land heads, then Omega asks all of you to choose between two options, (H) and (T). If the coin falls tails, on the other hand, Omega chooses one of you at random and asks that person to choose between (H) and (T). If the coin lands heads and you all choose (H), Omega will create a Friendly AI; same if the coin lands tails, and the person who's asked chooses (T); else, Omega will destroy the world.
| | Coin falls heads (chance = 1/1000) | Coin falls tails (chance = 999/1000) |
|---|---|---|
| Everyone chooses (H) if asked | Positive intelligence explosion | Humanity wiped out |
| Everyone chooses (T) if asked | Humanity wiped out | Positive intelligence explosion |
| Different people choose differently | Humanity wiped out | (Depends on who is asked) |
We'll assume that all of you prefer a positive FOOM over a gloomy DOOM, which means that all of you have the same values as far as the outcomes of this little dilemma are concerned: O = {FAI, DOOM}, as before, and all of you have the same utility function, given by U(FAI) = 1 and U(DOOM) = 0. As long as that's the case, we can apply SUDT to find a sensible policy for everybody to follow (though when there is more than one optimal policy, and the different people involved can't talk to each other, it may not be clear how one of the policies should be chosen).
This time, we have a million different people, who can in principle each make an independent decision about what to answer if Omega asks them the question. Thus, we have I = {i_1, ..., i_1000000}. Each of these people can choose between (H) and (T), so A(i_k) = {(H), (T)} for every person k, and a policy π ∈ Π is a function that returns either (H) or (T) for every i_k. Obviously, we're particularly interested in the policies π_H and π_T satisfying π_H(i_k) = (H) and π_T(i_k) = (T) for all k.
The possible worlds are Ω = {w_heads, w_tails,1, ..., w_tails,1000000}, where w_tails,k is the world in which the coin falls tails and person k is the one who gets asked; their probabilities are P(w_heads) = 1/1000 and P(w_tails,k) = (999/1000) · (1/1000000). The outcome function is as follows: o(w_heads, π_H) = FAI; o(w_heads, π) = DOOM for π ≠ π_H; o(w_tails,k, π) = FAI if π(i_k) = (T), and o(w_tails,k, π) = DOOM otherwise.
What does SUDT recommend? As in the counterfactual mugging, EU(π) is the probability of the good outcome FAI, under policy π. For π = π_H, the good outcome can only happen if the coin falls heads: in other words, with probability 1/1000. If π ≠ π_H, then the good outcome can not happen if the coin falls heads, because in that case everybody gets asked, and at least one person chooses (T). Thus, in this case, the good outcome will happen only if the coin comes up tails and the randomly chosen person answers (T); this probability is (999/1000) · (n/1000000), where n is the number of people answering (T). Clearly, this is maximized for n = 1000000, i.e. for π = π_T; moreover, in this case we get the probability 999/1000, which is better than for π_H, so SUDT recommends the plan π_T.
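The argmax over policies reduces to an argmax over n, the number of people whose policy answers (T). A small sketch of that computation (my own notation, not the post's):

```python
N = 10**6          # number of participants
p_heads = 1 / 1000  # the biased coin

def p_good(n):
    """Probability of the good outcome when n people would answer (T)."""
    if n == 0:
        # Everyone answers (H): good outcome only if the coin lands heads.
        return p_heads
    # Otherwise at least one person answers (T), so heads means DOOM;
    # good only on tails, when the randomly chosen person answers (T).
    return (1 - p_heads) * n / N

# p_good is increasing in n for n >= 1, and p_good(N) > p_good(0),
# so the best policy is the one where everyone answers (T).
print(p_good(0), p_good(1), p_good(N))
```

The jump from n = 0 to n = 1 makes the heads world a guaranteed loss, which is why intermediate values of n are dominated by n = N.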
Again, when you set up the problem in SUDT, it's not even obvious why anyone might think this wasn't the correct answer. The reason is that if Omega asks you, and you update on the fact that you've been asked, then after updating, you are quite certain that the coin has landed heads: yes, your prior probability was only 1/1000, but if the coin has landed tails, the chance that you would be asked was only one in a million, so the posterior odds are about 1000:1 in favor of heads. So, you might reason, it would be best if everybody chose (H); and moreover, all the people in the other rooms will reason the same way as you, so if you choose (H), they will as well, and this maximizes the probability that humanity survives. This relies on the fact that the others will choose the same way as you, but since you're all good rationalists using the same decision theory, that's going to be the case.
But in the worlds where the coin comes up tails, and Omega chooses someone else than you, the version of you that gets asked for its decision still "exists"... as an l-zombie. You might think that what this version of you does or doesn't do doesn't influence what happens in the real world; but if we accept the argument from the previous paragraph that your decisions are "linked" to those of the other people in the experiment, then they're still linked if the version of you making the decision is an l-zombie: If we see you as a Turing machine making a decision, that Turing machine should reason, "If the coin came up tails and someone else was chosen, then I'm an l-zombie, but the person who is actually chosen will reason exactly the same way I'm doing now, and will come to the same decision; hence, my decision influences what happens in the real world even in this case, and I can't do an update and just ignore those possible worlds."
I call this the "probably counterfactual mugging" because in the counterfactual mugging, you are making your choice because of its benefits in a possible world that is ruled out by your observations, while in the probably counterfactual mugging, you're making it because of its benefits in a set of possible worlds that is made very improbable by your observations (because most of the worlds in this set are ruled out). As with the counterfactual mugging, this argument is just all the stronger if there are no l-zombies because all mathematically possible experiences are in fact experienced.
*
As a final example, let's look at what I'd like to call Eliezer's anthropic mugging: the anthropic problem that inspired Psy-Kosh's non-anthropic one. This time, you're alone again, except that there's many of you: Omega is creating a million copies of you. It flips its usual very biased coin, and if that coin falls heads, it places all of you in exactly identical green rooms. If the coin falls tails, it places one of you in a green room, and all the others in red rooms. It then asks all copies in green rooms to choose between (H) and (T); if your choice agrees with the coin, FOOM, else DOOM.
| | Coin falls heads (chance = 1/1000) | Coin falls tails (chance = 999/1000) |
|---|---|---|
| Green roomers choose (H) | Positive intelligence explosion | Humanity wiped out |
| Green roomers choose (T) | Humanity wiped out | Positive intelligence explosion |
Our possible worlds are back to being Ω = {w_heads, w_tails}, with probabilities P(w_heads) = 1/1000 and P(w_tails) = 999/1000. We are also back to being able to make a choice in only one particular situation, namely when you're a copy in a green room: I = {i_green}. Actions are A(i_green) = {(H), (T)}, outcomes O = {FAI, DOOM}, utilities U(FAI) = 1 and U(DOOM) = 0, and the outcome function is given by o(w_heads, π_H) = FAI, o(w_tails, π_H) = DOOM and o(w_heads, π_T) = DOOM, o(w_tails, π_T) = FAI. In other words, from SUDT's perspective, this is exactly identical to the situation with the counterfactual mugging, and thus the solution is the same: Once more, SUDT recommends choosing (T).
On the other hand, the reason why someone might think that (H) could be the right answer is closer to that for Psy-Kosh's probably counterfactual mugging: After waking up in a green room, what should be your posterior probability that the coin has fallen heads? Updateful anthropic reasoning says that you should be quite sure that it has fallen heads. If you plug those probabilities into an expected utility calculation, it comes out as in Psy-Kosh's case, heavily favoring (H).
But even if these are good probabilities to assign epistemically (to satisfy your curiosity about what the world probably looks like), in light of the arguments from the counterfactual and the probably counterfactual muggings (where updating definitely is the right thing to do epistemically, but plugging these probabilities into the expected utility calculation gives the wrong result), it doesn't seem strange to me to come to the conclusion that choosing (T) is correct in Eliezer's anthropic mugging as well.
A model of UDT with a concrete prior over logical statements
I've been having difficulties with constructing a toy scenario for AI self-modification more interesting than Quirrell's game, because you really want to do expected utility maximization of some sort, but currently our best-specified decision theories search through the theorems of one particular proof system and "break down and cry" if they can't find one that tells them what their utility will be if they choose a particular option. This is fine if the problems are simple enough that we always find the theorems we need, but the AI rewrite problem is precisely about skirting that edge. It seems natural to want to choose some probability distribution over the possibilities that you can't rule out, and then do expected utility maximization (because if you don't maximize EU over some prior, it seems likely that someone could Dutch-book you); indeed, Wei Dai's original UDT has a "mathematical intuition module" black box which this would be an implementation of. But how do you assign probabilities to logical statements? What consistency conditions do you ask for? What are the "impossible possible worlds" that make up your probability space?
Recently, Wei Dai suggested that logical uncertainty might help avoid the Löbian problems with AI self-modification, and although I'm sceptical about this idea, the discussion pushed me into trying to confront the logical uncertainty problem head-on; then, reading Haim Gaifman's paper "Reasoning with limited resources and assigning probabilities to logical statements" (which Luke linked from So you want to save the world) made something click. I want to present a simple suggestion for a concrete definition of "impossible possible world", for a prior over them, and for a UDT algorithm based on that. I'm not sure whether the concrete prior is useful—the main point in giving it is to have a concrete example we can try to prove things about—but the definition of logical possible worlds looks like a promising theoretical tool to me.
Thoughts on a possible solution to Pascal's Mugging
For those who aren't familiar, Pascal's Mugging is a simple thought experiment that seems to demonstrate an intuitive flaw in naive expected utility maximization. In the classic version, someone walks up to you on the street, and says, 'Hi, I'm an entity outside your current model of the universe with essentially unlimited capabilities. If you don't give me five dollars, I'm going to use my powers to create 3^^^^3 people, and then torture them to death.' (For those not familiar with Knuth up-arrow notation, see here). The idea being that however small your probability is that the person is telling the truth, they can simply state a number that's grossly larger - and when you shut up and multiply, expected utility calculations say you should give them the five dollars, along with pretty much anything else they ask for.
Intuitively, this is nonsense. However, an AI under construction doesn't have a piece of code that lights up when exposed to nonsense. Not unless we program one in. And formalizing why, exactly, we shouldn't listen to the mugger is not as trivial as it sounds. The actual underlying problem has to do with how we handle arbitrarily small probabilities. There are a number of variations you could construct on the original problem that present the same paradoxical results. There are also a number of simple hacks you could undertake that produce the correct results in this particular case, but these are worrying (not to mention unsatisfying) for a number of reasons.
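A tiny sketch of the underlying arithmetic (the numbers here are stand-ins of my own choosing; 3^^^^3 itself is far too large to represent): for any positive credence p in the mugger's claim, a sufficiently large threatened disutility makes the naive expected value of paying positive.

```python
# Naive expected-value comparison: paying costs `cost` for sure;
# refusing risks the threatened disutility with probability p.
def naive_ev_of_paying(p_mugger_honest, cost, threatened_disutility):
    return p_mugger_honest * threatened_disutility - cost

p = 1e-50                    # absurdly tiny credence in the claim
stand_in_threat = 10.0**60   # a pale shadow of 3^^^^3
print(naive_ev_of_paying(p, 5, stand_in_threat) > 0)  # True: pay up
```

However small p is, the mugger only needs to name a threat larger than cost/p, and since the stated number costs the mugger nothing, naive expected utility maximization always loses this race.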
So, with the background out of the way, let's move on to a potential approach to solving the problem which occurred to me about fifteen minutes ago while I was lying in bed with a bad case of insomnia at about five in the morning. If it winds up being incoherent, I blame sleep deprivation. If not, I take full credit.
Let's take a look at a new thought experiment. Let's say someone comes up to you and tells you that they have magic powers, and will make a magic pony fall out of the sky. Let's say that, through some bizarrely specific priors, you decide that the probability that they're telling the truth (and, therefore, the probability that a magic pony is about to fall from the sky) is exactly 1/2^100. That's all well and good.
Now, let's say that later that day, someone comes up to you, and hands you a fair quarter and says that if you flip it one hundred times, the probability that you'll get a straight run of heads is 1/2^100. You agree with them, chat about math for a bit, and then leave with their quarter.
I propose that the probability value in the second case, while superficially identical to the probability value in the first case, represents a fundamentally different kind of claim about reality than the first case. In the first case, you believe, overwhelmingly, that a magic pony will not fall from the sky. You believe, overwhelmingly, that the probability (in underlying reality, divorced from the map and its limitations) is zero. It is only grudgingly that you inch even a tiny morsel of probability into the other hypothesis (that the universe is structured in such a way as to make the probability non-zero).
In the second case, you also believe, overwhelmingly, that you will not see the event in question (a run of heads). However, you don't believe that the probability is zero. You believe it's 1/2^100. You believe that, through only the lawful operation of the universe that actually exists, you could be surprised, even if it's not likely. You believe that if you ran the experiment in question enough times, you would probably, eventually, see a run of one hundred heads. This is not true for the first case. No matter how many times somebody pulls the pony trick, a rational agent is never going to get their hopes up.
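One way to make this difference concrete (the mixture model and all the numbers here are my own assumptions, not the post's) is to compare the chance of eventually seeing the event under repetition:

```python
from math import log1p, expm1

P_RUN = 2.0 ** -100  # chance of 100 heads in one attempt (a frequency)
EPS = 2.0 ** -100    # credence that pony-magic is possible at all (a mixture weight)

def p_heads_run_eventually(trials):
    """P(at least one run of 100 heads in `trials` attempts), computed stably."""
    # 1 - (1 - P_RUN)**trials without catastrophic cancellation
    return -expm1(trials * log1p(-P_RUN))

def p_pony_eventually(trials):
    """P(ever seeing the pony): if magic is impossible, repetition never helps.

    For simplicity, assume the pony appears for sure if magic is possible,
    so the answer is just the mixture weight, independent of `trials`."""
    return EPS * 1.0

big = 2 ** 110  # run the coin experiment an enormous number of times
print(p_heads_run_eventually(big) > 0.99)  # coin: eventually near-certain
print(p_pony_eventually(big) < 1e-9)       # pony: still essentially zero
```

The two single-trial numbers are identical, but the coin probability is a frequency that compounds under repetition, while the pony probability is credence in a one-shot fact about the universe, which repetition leaves untouched.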
I would like, at this point, to talk about the notion of metaconfidence. When we talk to the crazy pony man, and to the woman with the coin, what we leave with are two identical numerical probabilities. However, those numbers do not represent the sum total of the information at our disposal. In the two cases, we have differing levels of confidence in our levels of confidence. And, furthermore, this difference has actual ramifications for what a rational agent should expect to observe. In other words, even from a very conservative perspective, metaconfidence pays rent. By treating the two probabilities as identical, we are needlessly throwing away information. I'm honestly not sure if this topic has been discussed before. I am not up to date on the literature on the subject. If the subject has already been thoroughly discussed, I apologize for the waste of time.
Disclaimer aside, I'd like to propose that we push this a step further, and say that metaconfidence should play a role in how we calculate expected utility. If we have a very small probability of a large payoff (positive or negative), we should behave differently when metaconfidence is high than when it is low.
From a very superficial analysis, lying in bed, metaconfidence appears to be directional. Low metaconfidence, in the case of the pony claim, should not make us suspect that the probability of a pony dropping out of the sky is HIGHER than our initial estimate. It works the other way as well: if we have a very high degree of confidence in some event (the sun rising tomorrow), and we get some very suspect evidence to the contrary (an ancient civilization predicting the end of the world tonight), and we update our probability downward slightly, our low metaconfidence should not make us believe that the sun is less likely to rise tomorrow than we thought. Low metaconfidence should move our effective probability estimate against the direction of the evidence we have low confidence in: the pony is less likely, and the sunrise is more likely, than a naive probability estimate would suggest.
So a claim like the pony claim (or Pascal's mugging), where we have a very low estimated probability and a very low metaconfidence, should be treated as dramatically less likely to actually happen, in the real world, than a case in which we have a low estimated probability but very high confidence in that probability. See the pony versus the coins. Rationally, we can only mathematically justify so low a confidence in the crazy pony man's claims. However, in the territory, you can add enough coins that the two probabilities are mathematically equal, and you are still more likely to get a run of heads than you are to have a pony magically drop out of the sky. I am proposing metaconfidence weighting as a way to get around this issue, and to allow our map to more accurately reflect the underlying territory. It's not perfect, since metaconfidence is still, ultimately, calculated from our map of the territory, but it seems to me, based on my extremely brief analysis, that it is at least an improvement on the current model.
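One crude way to operationalize the directional adjustment described above is to shrink the evidence-driven update toward the pre-evidence baseline. The function below, including its name and the interpolation scheme, is my own sketch of the proposal, not something from the post itself:

```python
def effective_probability(baseline, updated, metaconfidence):
    """Discount an evidence-driven update by our confidence in that evidence.

    baseline       -- probability estimate before the suspect evidence arrived
    updated        -- naive posterior, taking the evidence at face value
    metaconfidence -- in [0, 1]; 1.0 means full trust in the update
    """
    return baseline + metaconfidence * (updated - baseline)

# Pony claim: the mugger's story nudged a ~0 baseline upward; low
# metaconfidence pulls the effective estimate back down toward zero.
print(effective_probability(0.0, 1e-6, 0.01))       # 1e-08

# Sunrise: the prophecy nudged a ~1 baseline downward; low
# metaconfidence pulls the effective estimate back up toward one.
print(effective_probability(0.999999, 0.99, 0.01))
```

Note that this automatically has the directional property: it always moves the estimate against the suspect evidence, toward whatever we believed before hearing it.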
Essentially, this idea is based on the understanding that the numbers that we generate and call probability do not, in fact, correspond to the actual rules of the territory. They are approximations, and they are perturbed by observation, and our finite data set limits the resolution of the probability intervals we can draw. This causes systematic distortions at the extreme ends of the probability spectrum, and especially at the small end, where the scale of the distortion rises dramatically as a function of the actual probability. I believe that the apparently absurd behavior demonstrated by an expected-utility agent exposed to Pascal's mugging, is a result of these distortions. I am proposing we attempt to compensate by filling in the missing information at the extreme ends of the bell curve with data from our model about our sources of evidence, and about the underlying nature of the territory. In other words, this is simply a way to use our available evidence more efficiently, and I suspect that, in practice, it eliminates many of the Pascal's-mugging-style problems we encounter currently.
I apologize for not having worked the math out completely. I would like to reiterate that it is six thirty in the morning, and I've only been thinking about the subject for about a hundred minutes. That said, I'm not likely to get any sleep either way, so I thought I'd jot the idea down and see what you folks thought. Having outside eyes is very helpful, when you've just had a Brilliant New Idea.
The Creating Bob the Jerk problem. Is it a real problem in decision theory?
I was recently reading about the Transparent Newcomb with your Existence at Stake problem, which, to make a long story short, states that you were created by Prometheus, who foresaw that you would one-box on Newcomb's problem and wouldn't have created you if he had foreseen otherwise. The implication is that you might need to one-box just to exist. It's a disturbing problem, and as I read it another even more disturbing problem started to form in my head. However, I'm not sure it's logically coherent (I'm really hoping it's not) and wanted to know what the rest of you thought. The problem goes:
One day you start thinking about a hypothetical nonexistent person named Bob who is a real jerk. If he existed he would make your life utterly miserable. However, if he existed he would want to make a deal with you. If he ever found himself existing in a universe where you have never existed he would create you, on the condition that if you found yourself existing in a universe where he had never existed you would create him. Hypothetical Bob is very good at predicting the behavior of other people, not quite Omega quality, but pretty darn good. Assume for the sake of the argument that you like your life and enjoy existing.
At first you dismiss the problem because of technical difficulties. Science hasn't advanced to the point where we can make people with such precision. Plus, there is a near-infinite number of far nicer hypothetical people who would make the same deal; when science reaches that point, you should give creating them priority.
But then you see Omega drive by in its pickup truck. A large complicated machine falls off the back of the truck as it passes you by. Written on it, in Omega's handwriting, is a note that says "This is the machine that will create Bob the Jerk, a hypothetical person that [insert your name here] has been thinking about recently, if one presses the big red button on the side." You know Omega never lies, not even in notes to itself.
Do Timeless Decision Theory and Updateless Decision Theory say you have a counterfactual obligation to create Bob the Jerk, the same way you have an obligation to pay Omega in the Counterfactual Mugging, and the same way you might (I'm still not sure about this) have an obligation to one-box when dealing with Prometheus? Does this in turn mean that when we develop the ability to create people from scratch we should tile the universe with people who would make the counterfactual deal? Obviously it's that last implication that disturbs me.
I can think of multiple reasons why it might not be rational to create Bob the Jerk:
- It might not be logically coherent to not update to acknowledge the fact of your own existence, even in UDT (this also implies one should two-box when dealing with Prometheus).
- An essential part of who you are is the fact that you were created by your parents, not by Bob the Jerk, so the counterfactual deal isn't logically coherent. Someone he creates wouldn't be you, it would be someone else. At his very best he could create someone with a very similar personality who has falsified memories, which would be rather horrifying.
- An essential part of who Bob the Jerk is is that he was created by you, with some help from Omega. He can't exist in a universe where you don't, so the hypothetical bargain he offered you isn't logically coherent.
- Prometheus will exist no matter what you do in his problem, Bob the Jerk won't. This makes these two problems qualitatively different in some way I don't quite understand.
- You have a moral duty to not inflict Bob the Jerk on others, even if it means you don't exist in some other possibility.
- You have a moral duty to not overpopulate the world, even if it means you might not exist in some other possibility, and the end result of the logic of this problem implies overpopulating the world.
- Bob the Jerk already exists because we live in a Big World, so you have no need to fulfill your part of the bargain because he's already out there somewhere.
- Making these sorts of counterfactual deals is individually rational, but collectively harmful in the same way that paying a ransom is. If you create Bob the Jerk some civic-minded vigilante decision theorist might see the implications and find some way to punish you.
- While it is possible to want to keep on existing if you already exist, it isn't logically possible to "want to exist" if you don't already, this defeats the problem in some way.
- After some thought you spend some time thinking about a hypothetical individual called Bizarro-Bob. Bizarro-Bob doesn't want Bob the Jerk to be created and is just as good at modeling your behavior as Bob the Jerk is. He has vowed that if he ends up existing in a universe where you'll end up creating Bob the Jerk he'll kill you. As you stand by Omega's machine you start looking around anxiously for the glint of light off a gun barrel.
- I don't understand UDT or TDT properly, they don't imply I should create Bob the Jerk for some other reason I haven't thought of because of my lack of understanding.
Are any of these objections valid, or am I just grasping at straws? I find the problem extremely disturbing because of its wider implications, so I'd appreciate it if someone with a better grasp of UDT and TDT analyzed it. I'd very much like to be refuted.
List of Problems That Motivated UDT
I noticed that recently I wrote several comments of the form "UDT can be seen as a step towards solving X" and thought it might be a good idea to list in one place all of the problems that helped motivate UDT1 (not including problems that came up subsequent to that post).
- decision making for minds that can copy themselves
- Doomsday Argument
- Sleeping Beauty
- Absent-Minded Driver
- Presumptuous Philosopher
- anthropic reasoning for non-sentient AIs
- Simulation Argument
- indexical uncertainty in general
- wireheading/Cartesianism (how to formulate something like AIXI that cares about an external world instead of just its sensory inputs)
- How to make decisions if all possible worlds exist? (a la Tegmark or Schmidhuber, or just in the MWI)
- Quantum Immortality/Suicide
- Logical Uncertainty (how to formulate something like Godel machine that can make reasonable decisions involving P=NP)
- uncertainty about hypercomputation (how to avoid assuming we must be living in a computable universe)
- What are probabilities?
- What are decisions and what kind of consequences should be considered when making decisions?
- Newcomb's Problem
- Smoking Lesion
- Prisoner's Dilemma
- Counterfactual Mugging
- FAI
Consequentialist Formal Systems
This post describes a different (less agent-centric) way of looking at UDT-like decision theories that resolves some aspects of the long-standing technical problem of spurious moral arguments. It's only a half-baked idea, so there are currently a lot of loose ends.
On spurious arguments
UDT agents are usually considered as having a disinterested inference system (a "mathematical intuition module" in UDT and first order proof search in ADT) that plays a purely epistemic role, and preference-dependent decision rules that look for statements that characterize possible actions in terms of the utility value that the agent optimizes.
The statements (supplied by the inference system) used by agent's decision rules (to pick one of the many variants) have the form [(A=A1 => U=U1) and U<=U1]. Here, A is a symbol defined to be the actual action chosen by the agent, U is a similar symbol defined to be the actual value of world's utility, and A1 and U1 are some particular possible action and possible utility value. If the agent finds that this statement is provable, it performs action A1, thereby making A1 the actual action.
The use of this statement introduces the problem of spurious arguments: if A1 is a bad action, but for some reason it's still chosen, then [(A=A1 => U=U1) and U<=U1] is true, since utility value U will in that case be in fact U1, which justifies (by the decision rule) choosing the bad action A1. In usual cases, this problem results in the difficulty of proving that an agent will behave in the expected manner (i.e. won't choose a bad action), which is resolved by adding various complicated clauses to its decision algorithm. But even worse, it turns out that if an agent is hapless enough to take seriously a (formally correct) proof of such a statement supplied by an enemy (or if its own inference system is malicious), it can be persuaded to take any action at all, irrespective of the agent's own preferences.
An example of self-fulfilling spurious proofs in UDT
Benja Fallenstein was the first to point out that spurious proofs pose a problem for UDT. Vladimir Nesov and orthonormal asked for a formalization of that intuition. In this post I will give an example of a UDT-ish agent that fails due to having a malicious proof searcher, which feeds the agent a spurious but valid proof.
The basic idea is to have an agent A that receives a proof P as input, and checks P for validity. If P is a valid proof that a certain action a is best in the current situation, then A outputs a, otherwise A tries to solve the current situation by its own means. Here's a first naive formalization, where U is the world program that returns a utility value, A is the agent program that returns an action, and P is the proof given to A:
def U():
    if A(P)==1:
        return 5
    else:
        return 10

def A(P):
    if P is a valid proof that A(P)==a implies U()==u, and A(P)!=a implies U()<=u:
        return a
    else:
        do whatever
This formalization cannot work because a proof P can never be long enough to contain statements about A(P) inside itself. To fix that problem, let's introduce a function Q that generates the proof P:
def U():
    if A(Q())==1:
        return 5
    else:
        return 10

def A(P):
    if P is a valid proof that A(Q())==a implies U()==u, and A(Q())!=a implies U()<=u:
        return a
    else:
        do whatever
In this case it's possible to write a function Q that returns a proof that makes A return the suboptimal action 1, which leads to utility 5 instead of 10. Here's how:
Let X be the statement "A(Q())==1 implies U()==5, and A(Q())!=1 implies U()<=5". Let Q be the program that enumerates all possible proofs trying to find a proof of X, and returns that proof if found. (The definitions of X and Q are mutually quined.) If X is provable at all, then Q will find that proof, and X will become true (by inspection of U and A). That reasoning is formalizable in our proof system, so the statement "if X is provable, then X" is provable. Therefore, by Löb's theorem, X is provable. So Q will find a proof of X, and A will return 1.
One possible conclusion is that a UDT agent cannot use just any proof searcher or "mathematical intuition module" that's guaranteed to return valid mathematical arguments, because valid mathematical arguments can make the agent choose arbitrary actions. The proof searchers from some previous posts were well-behaved by construction, but not all of them are.
The troubling thing is that you may end up with a badly behaved proof searcher by accident. For example, consider a variation of U that adds some long and complicated computation to the "else" branch of U, before returning 10. That increases the length of the "natural" proof that a=2 is optimal, but the spurious proof for a=1 stays about the same length as it was, because the spurious proof can just ignore the "else" branch of U. This way the spurious proof can become much shorter than the natural proof. So if (for example) your math intuition module made the innocuous design decision of first looking at actions that are likely to have shorter proofs, you may end up with a spurious proof. And as a further plot twist, if we make U return 0 rather than 10 in the long-to-compute branch, you might choose the correct action due to a spurious proof instead of the natural one.
The limited predictor problem
This post requires some knowledge of logic, computability theory, and K-complexity. Much of the credit goes to Wei Dai. The four sections of the post can be read almost independently.
The limited predictor problem (LPP) is a version of Newcomb's Problem where the predictor has limited computing resources. To predict the agent's action, the predictor simulates the agent for N steps. If the agent doesn't finish in N steps, the predictor assumes that the agent will two-box. LPP is similar to the ASP problem, but with simulation instead of theorem proving.
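A toy simulation of the setup may help fix intuitions. The generator-based step counting, the agent's deliberation length, and the choice of "two-box" as the timeout default are my own simplifications, not part of the original problem statement:

```python
def slow_agent():
    """An agent that 'deliberates' for 50 steps before deciding to one-box."""
    for _ in range(50):
        yield None          # still thinking
    yield "one-box"         # final decision

def limited_predictor(agent_factory, N):
    """Simulate the agent for at most N steps; assume two-boxing on timeout."""
    steps = 0
    for output in agent_factory():
        steps += 1
        if steps > N:
            break           # simulation budget exhausted
        if output is not None:
            return output   # agent finished within the budget
    return "two-box"

print(limited_predictor(slow_agent, 100))  # one-box: agent finishes in time
print(limited_predictor(slow_agent, 10))   # two-box: budget runs out first
```

The interesting cases below are exactly the ones where the agent's own reasoning threatens to blow past the predictor's budget N.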
1. Solving the problem when the agent has a halting oracle
Consider the agent defined in "A model of UDT with a halting oracle", and a predictor that can run the agent's code step-by-step, with oracle calls and all. Turns out that this agent solves LPP correctly if N is high enough. To understand why, note that the agent offloads all interesting work to oracles that return instantly, so the agent's own runtime is provably bounded. If that bound is below N, the agent's oracle will prove that the predictor predicts the agent correctly, so the agent will one-box.
2. Failing to solve the problem when N is algorithmically random
Consider a setting without oracles, with only Turing-computable programs. Maybe the agent should successively search for proofs somehow?
Unfortunately you can't solve most LPPs this way, for a simple but surprising reason. Assume that the predictor's time limit N is a large and algorithmically random number. Then the predictor's source code is >log(N) bits long, because N must be defined in the source code. Then any proof about the world program must also have length >log(N), because the proof needs to at least quote the world program itself. Finding a proof by exhaustive search takes exponential time, so the agent will need >N steps. But the predictor simulates the agent for only N steps. Whoops!
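The counting argument is easy to check with concrete numbers. The specific values below (a power of two standing in for an incompressible N, and a 64-symbol proof alphabet) are purely illustrative:

```python
# Stand-in for a huge time limit. A truly algorithmically random N would be
# incompressible, which is the property the argument actually needs.
N = 2**1000

# The predictor's source code must spell out N, so it is >= ~log2(N) bits long.
bits_to_state_N = N.bit_length()   # about 1000

# Any proof about the world program quotes the predictor's source, so honest
# proofs have length > bits_to_state_N. Exhaustive search over a 64-symbol
# alphabet up to that length costs on the order of:
search_steps = 64 ** bits_to_state_N

print(search_steps > N)  # True: far more than the N simulation steps allowed
```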
3. Solving the problem when N is large but has a short definition
As usual, let U be the world program that returns a utility value, and A be the agent program that returns an action and has access to the world's source code. Consider the following algorithm for A:
1. From L=1 to infinity, search for proofs up to length L of the form "if A()=a and runtime(A)<g(L), then U()=u", where g(L) is an upper bound on runtime(A) if A stops the search at length L. Upon finding at least one proof for each possible a, go to step 2.
2. Search for proofs up to length f(L) of the form "if runtime(A)<g(L), then A()≠a", where f(L) is some suitably fast-growing function like 10^L. If such a proof is found, return a.
3. If we're still here, return the best a found on step 1.
This algorithm is very similar to the one described in "A model of UDT without proof limits", but with the added complication that A is aware of its own runtime via the function g(L). By an analogous argument, A will find the "intended" proof that the predictor predicts A correctly if runtime(A) is small enough, as long as the "intended" proof exists and isn't too long relative to the predictor's time limit N. More concretely, A will solve all instances of LPP in which N is larger than g(L), where L is the length of the "intended" proof. For example, if f(L)=10^L, then g(L) is doubly exponential, so A will successfully solve LPPs where the predictor's source code defines N using triple exponentials or some more compact notation.
4. A broader view
TDT and UDT were originally designed for solving "decision-determined" problems. The agent figures out how the resulting utility logically depends on the agent's action, then returns the action with the highest utility, thus making the premise true.
But a cleverly coded decision program can also control other facts about itself. For example, the program may figure out how the resulting utility depends on the program's return value and running time, then choose the best return value and choose how long to keep running, thus making both premises true. This idea is a natural extension of quining (you carefully write a program that can correctly judge its own runtime so far) and can be generalized to memory consumption and other properties of programs.
With enough cleverness we could write a program that would sometimes decide to waste time, or run for an even number of clock cycles, etc. We did not need so much cleverness in this post because LPP lies in a smaller class that we may call "LPP-like problems", where utility depends only on the agent's return value and runtime, and the dependence on runtime is monotonic - it never hurts to return the same value earlier. That class also includes all the usual decision-determined problems like Newcomb's Problem, and our A also fares well on those.
I was surprised to find so many new ideas by digging into such a trivial-looking problem as LPP. This makes me suspect that advanced problems like ASP may conceal even more riches, if only we have enough patience to approach them properly...
A model of UDT without proof limits
This post requires some knowledge of decision theory math. Part of the credit goes to Vladimir Nesov.
Let the universe be a computer program U that returns a utility value, and the agent is a subprogram A within U that knows the source code of both A and U. (The same setting was used in the reduction of "could" post.) Here's a very simple decision problem:
def U():
    if A() == 1:
        return 5
    else:
        return 10
The algorithm for A will be as follows:
1. Search for proofs of statements of the form "A()=a implies U()=u". Upon finding at least one proof for each possible a, go to step 2.
2. Let L be the maximum length of proofs found on step 1, and let f(L) be some suitably fast-growing function like 10^L. Search for proofs shorter than f(L) of the form "A()≠a". If such a proof is found, return a.
3. If we're still here, return the best a found on step 1.
The usual problem with such proof-searching agents is that they might stumble upon "spurious" proofs, e.g. a proof that A()==2 implies U()==0. If A finds such a proof and returns 1 as a result, the statement A()==2 becomes false, and thus provably false under any formal system; and a false statement implies anything, making the original "spurious" proof correct. The reason for constructing A this particular way is to have a shot at proving that A won't stumble on a "spurious" proof before finding the "intended" ones. The proof goes as follows:
Assume that A finds a "spurious" proof on step 1, e.g. that A()=2 implies U()=0. We have a lower bound on L, the length of that proof: it's likely larger than the length of U's source code, because a proof needs to at least state what's being proved. Then in this simple case 10^L steps is clearly enough to also find the "intended" proof that A()=2 implies U()=10, which combined with the previous proof leads to a similarly short proof that A()≠2, so the agent returns 2. But that can't happen if A's proof system is sound, therefore A will find only "intended" proofs rather than "spurious" ones in the first place.
Quote from Nesov that explains what's going on:
With this algorithm, you're not just passively gauging the proof length, instead you take the first moral argument you come across, and then actively defend it against any close competition
By analogy we can see that A coded with f(L)=10^L will correctly solve all our simple problems like Newcomb's Problem, the symmetric Prisoner's Dilemma, etc. The proof of correctness will rely on the syntactic form of each problem, so the proof may break when you replace U with a logically equivalent program. But that's okay, because "logically equivalent" for programs simply means "returns the same value", and we don't want all world programs that return the same value to be decision-theoretically equivalent.
A will fail on problems where "spurious" proofs are exponentially shorter than "intended" proofs (or even shorter, if f(L) is chosen to grow faster). We can probably construct malicious examples of decision-determined problems that would make A fail, but I haven't found any yet.
Predictability of Decisions and the Diagonal Method
This post collects a few situations where agents might want to make their decisions either predictable or unpredictable to certain methods of prediction, and considers a method of making a decision unpredictable by "diagonalizing" a hypothetical prediction of that decision. The last section takes a stab at applying this tool to the ASP problem.
The diagonal step
To start off, consider the halting problem, interpreted in terms of agents and predictors. Suppose that there is a Universal Predictor, an algorithm that is able to decide whether any given program halts or runs forever. Then, it's easy for a program (agent) to evade its gaze by including a diagonal step in its decision procedure: the agent checks (by simulation) if Universal Predictor comes to some decision about the agent, and if it does, the agent acts contrary to the Predictor's decision. This makes the prediction wrong, and Universal Predictors impossible.
The same trick can be performed against something that can exist: ordinary, non-universal Predictors. This allows an agent to make itself immune to their predictions. In particular, the ability of other agents to infer our agent's decisions may be thought of as a kind of prediction that our agent might want to hinder. This is possible so long as the predictors in question can be simulated in enough detail: that is, it is known what they do (and what they know), and our agent has enough computational resources to anticipate their hypothetical conclusions. (If an agent does perform the diagonal step with respect to other agents, the predictions of those agents don't necessarily become wrong, as they could be formally correct by construction, but they cease to be possible, which could mean that the predictions won't be made at all.)
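The diagonal step against a simulable, non-universal predictor can be sketched directly. The crude keyword-matching predictor below is a stand-in of my own devising; the point is only that the agent can run it and invert its verdict:

```python
def naive_predictor(agent_description):
    """A simulable, non-universal predictor: guesses from a crude heuristic."""
    return "cooperate" if "nice" in agent_description else "defect"

def diagonal_agent(my_description):
    # The diagonal step: simulate the predictor on our own description,
    # then act contrary to whatever it concludes.
    prediction = naive_predictor(my_description)
    return "defect" if prediction == "cooperate" else "cooperate"

desc = "a nice agent"
print(naive_predictor(desc))  # cooperate
print(diagonal_agent(desc))   # defect: the prediction is guaranteed wrong
```

The agent needs to know the predictor's code and have the resources to run it; a Universal Predictor would have to survive this move against every program, which is exactly what the halting-problem argument rules out.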
preferences:decision theory :: data:code
I'd like to present a couple of thoughts. While I am somewhat confident in my reasoning, my conclusions strongly contradict what I perceive (possibly incorrectly) to be the consensus around decision theory on LessWrong. This consensus has been formed by people who have spent more time than I have thinking about it, and who are more intelligent than I am. I am aware that this is strong evidence that I am mistaken, or stating the obvious. I nonetheless believe the argument I'm about to make is valuable and should be heard.
It is argued that the key difference between Newcomb's problem and Solomon's problem is that precommitment is useful in the former and useless in the latter. I agree that the problems are indeed different, but I do not think that is the fundamental reason. The devil is in the details.
Solomon's problem states that
- There is a gene that causes people to chew gum and to develop throat cancer
- Chewing gum benefits everyone
It is generally claimed that EDT would decide not to chew gum, because doing so would place the agent in a state where its expected utility is reduced. This seems incorrect to me. The ambiguity is in what is meant by "causes people to chew gum". If the gene really causes people to chew gum, then that gene by definition affects the agent's decision theory, and the hypothesis that the agent is also following EDT is contradictory. What is generally meant is that having the gene induces a preference for chewing gum, which is then acted upon by whatever decision algorithm is used. An EDT agent must be fully aware of its own preferences; otherwise it could not calculate its own utility. Therefore, the expected utility of chewing gum must be calculated conditional on having, or not having, a preexisting taste for gum. In a nutshell, an EDT agent updates not on its action of chewing gum, but on its desire to do so.
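The point about conditioning on the desire rather than the action can be illustrated with made-up numbers (all probabilities below are invented for illustration; only the causal structure matches the problem):

```python
# Invented joint model: the gene raises both gum-craving and cancer risk.
p_gene = 0.1
p_cancer_given_gene = 0.5
p_cancer_given_no_gene = 0.01
p_crave_given_gene = 0.9
p_crave_given_no_gene = 0.1

def p_gene_given_crave(crave):
    # Bayes: the craving, not the chewing, carries the evidence about the gene.
    num = p_gene * (p_crave_given_gene if crave else 1 - p_crave_given_gene)
    den = num + (1 - p_gene) * (
        p_crave_given_no_gene if crave else 1 - p_crave_given_no_gene)
    return num / den

# Once the agent has conditioned on its own craving, the act of chewing
# provides no further evidence about cancer, so EDT chews just like CDT.
pg = p_gene_given_crave(True)
p_cancer = pg * p_cancer_given_gene + (1 - pg) * p_cancer_given_no_gene
print(round(pg, 3), round(p_cancer, 3))
```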
I've established here a distinction between preferences and decision theory. In fact, the two are interchangeable. It is always possible to hard code preferences in the decision theory, and vice versa. The distinction is very similar to the one drawn between code and data. It is an arbitrary but useful distinction. Intuitively, I believe hard coding preferences in the decision algorithm is poor design, though I do not have a clear argument why that is.
If we insist on preferences being part of the decision algorithm, the best decision algorithm for Solomon's problem is the one that doesn't have the cancer-causing gene. If the algorithm is EDT, then liking gum is a preference, and EDT makes the same decision as CDT.
Let's now look at Newcomb's problem. Omega's decision is clearly not based on a subjective preference for one box or two box (let's say an aesthetic preference for example). Omega's decision is based on our decision algorithm itself. This is the key difference between the two problems, and this is why precommitment works for Newcomb's and not Solomon's.
Solomon's problem is equivalent to this problem, which is not Newcomb's:
- If Omega thinks you were born loving Beige, he puts $1,000 in box Beige and nothing in box Aquamarine.
- Otherwise, he puts $1,000 in box Beige and $1,000,000 in box Aquamarine.
In this problem, both CDT and EDT (correctly) two-box. Again, this is because EDT knows that it loves Beige.
Now for the real Newcomb's problem. I argue that an EDT agent should treat its own decision as evidence.
- If EDT's decision is to two-box, then Omega's prediction is that EDT two-boxes, and EDT should indeed two-box.
- If EDT's decision is to one-box, then Omega's prediction is that EDT one-boxes, and EDT should two-box.
Since EDT reflects on its own decision, it can only settle on the fixed point, which is to two-box.
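The fixed-point claim can be checked mechanically. The payoff amounts below are the standard Newcomb figures, assumed here rather than stated in the post:

```python
PAYOFFS = {  # (decision, prediction) -> payoff, standard Newcomb amounts
    ("one-box", "one-box"): 1_000_000,
    ("one-box", "two-box"): 0,
    ("two-box", "one-box"): 1_001_000,
    ("two-box", "two-box"): 1_000,
}

DECISIONS = ("one-box", "two-box")

def best_response(prediction):
    """The decision with the highest payoff, given Omega's prediction."""
    return max(DECISIONS, key=lambda d: PAYOFFS[(d, prediction)])

# A decision is a fixed point if, given that Omega predicts it,
# it remains the best response to that very prediction.
fixed_points = [d for d in DECISIONS if best_response(d) == d]
print(fixed_points)  # ['two-box']
```

Two-boxing dominates given either prediction, so it is the unique fixed point for an agent that updates on its own decision this way.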
Both CDT and EDT decide to chew gum and to two-box.
If we're out shopping for decision algorithms (TDT, UDT...), we might as well shop for a set of preferences, since the two are interchangeable. It is clear that some preferences allow winning when variable-sum games are involved. This has been implemented by evolution as moral preferences, not as decision algorithms. One useful preference is the preference to keep one's word. Such a preference allows one to pay Parfit's hitchhiker without involving any preference reversal. Once you're safe, you do not try to avoid paying, because you genuinely prefer not breaking your promise to keeping the money. Yes, you could have preferences to two-box, but there is no reason why you should cater in advance to crazy cosmic entities rewarding certain algorithms or preferences. Omega is no more likely than the TDT and UDT minimizer, an evil entity known for torturing TDT and UDT practitioners.
Edit: meant to write EDT two-boxes, which is the only fixed point.
Punishing future crimes
Here's an edited version of a puzzle from the book "Chuck Klosterman IV" by Chuck Klosterman.
It is 1933. Somehow you find yourself in a position where you can effortlessly steal Adolf Hitler's wallet. The theft will not affect his rise to power, the nature of WW2, or the Holocaust. There is no important identification in the wallet, but the act will cost Hitler forty dollars and completely ruin his evening. You don't need the money. The odds that you will be caught committing the crime are negligible. Do you do it?
When should you punish someone for a crime they will commit in the future? Discuss.