Last August or so, Eliezer asked Steve Rayhawk and myself to attempt to solve Newcomb’s problem together. This project served a couple of purposes:
a. Get an indication as to our FAI research abilities.
b. Train our reduction-muscles.
c. Check whether Eliezer’s (unseen by us) timeless decision theory is a point that outside folks tend to arrive at independently (at least if starting from the rather substantial clues on OB/LW), and whether anything interestingly new came out of an independent attempt.
Steve and I (and, briefly but helpfully, Liron Shapira) took our swing at Newcomb. We wrote a great mass of notes that have been sitting on our hard drives, but hadn’t stitched them together into a single document. I’d like to attempt a Less Wrong sequence on that subject now. Most of this content is stuff that Eliezer, Nesov, and/or Dai developed independently and have been referring to in their posts, but I’ll try to present it more fully and clearly. I learned a bit of this from Eliezer/Nesov/Dai’s recent posts.
Here’s the outline, to be followed up with slower, clearer blog posts if all goes well:
0. Prelude: “Should” depends on counterfactuals. Newcomb's problem -- the problem of what Joe "should" do, to earn most money -- is the problem of which type of counterfactuals best cash out the question "Should Joe take one box or two?". Disagreement about Newcomb's problem is disagreement about what sort of counterfactuals we should consider, when we try to figure out what action Joe should take.
1. My goal in this sequence is to reduce “should” as thoroughly as I can. More specifically, I’ll make an (incomplete, but still useful) attempt to:
- Make it even more clear that our naive conceptions of “could” and “should” are conceptual inventions, and are not Physically Irreducible Existent Things. (Written here.)
- Consider why one might design an agent that uses concepts like “could” and “should” (hereafter a “Could/Should Agent”, or “CSA”), rather than designing an agent that acts in some other way. Consider what specific concepts of “could” and “should” are what specific kinds of useful. (This is meant as a more thorough investigation of the issues treated by Eliezer in “Possibility and Couldness”.)
- Consider why evolution ended up creating us as approximate CSAs. Consider what kinds of CSAs are likely to be how common across the multiverse.
2. A non-vicious regress. Suppose we’re designing Joe, and we want to maximize his expected winnings. What notion of “should” should we design Joe to use? There’s a regress here, in that creator-agents with different starting decision theories will design agents that have different starting decision theories. But it is a non-vicious regress. We can gain understanding by making this regress explicit, and asking under what circumstances agents with decision theory X will design future agents with decision theory Y, for different values of X and Y.
3a. When will a CDT-er build agents that use “could” and “should”? Suppose again that you’re designing Joe, and that Joe will go out in a world and win utilons on your behalf. What kind of Joe-design will maximize your expected utilons?
If we assume nothing about Joe’s world, we might find that your best option was to design Joe to act as a bundle of wires which happens to have advantageous physical effects, and which doesn’t act like an agent at all.
But suppose Joe’s world has the following handy property: suppose Joe’s actions have effects, and Joe’s “policy”, or the actions he “would have taken” in response to alternative inputs also have effects, but the details of Joe’s internal wiring doesn’t otherwise matter. (I'll call this the "policy-equivalence assumption"). Since Joe’s wiring doesn’t matter, you can, without penalty, insert whatever computation you like into Joe’s insides. And so, if you yourself can think through what action Joe “should” take, you can build wiring that sits inside Joe, carries out the same computation you would have used to figure out what action Joe “should” take, and then prompts that action.
Joe then inherits his counterfactuals from you: Joe’s model of what “would” happen “if he acts on policy X” is your model of what “would” happen if you design an agent, Joe, who acts according to policy X. The result is “act according to the policy my creator would have chosen” decision theory, now-A.K.A. “Updateless Decision Theory” (UDT). UDT one-boxes on Newcomb’s problem and pays the $100 in the counterfactual mugging problem.
3b. But it is only when computational limitations are thrown in that designing Joe to be a CSA leaves you better off than designing Joe to be your top-pick hard-coded policy. So, to understand where CSAs really come from, we’ll need eventually to consider how agents can use limited computation.
3c. When will a UDT-er build agents that use “could” and “should”? The answer is similar to that for a CDT-er.
3d. CSAs are only useful in a limited domain. In our derivations above, CSAs' usefulness depends on the policy-equivalence assumption. Therefore, if agents’ computation has important effects apart from its effects on the agents’ actions, the creator agent may be ill-advised to create any sort of CSA.* For example, if the heat produced by agents’ physical computation has effects that are as significant as the agent’s “chosen actions”, CSAs may not be useful. This limitation suggests that CSAs may not be useful in a post-singularity world, since in such a world matter may be organized to optimize for computation in a manner far closer to physical efficiency limits, and so the physical side-effects of computation may have more relative significance compared to the computation’s output.
4. What kinds of CSAs make sense? More specifically, what kinds of counterfactual “coulds” make sense as a basis for a CSA?
In part 4, we noted that when Joe’s policy is all that matters, you can stick your “What policy should Joe have?” computer inside Joe, without disrupting Joe’s payoffs. Thus, you can build Joe to be a “carry out the policy my creator would think best” CSA.
It turns out this trick can be extended.
Suppose you aren't a CDT-er. Suppose you are more like one of Eliezer's "timeless" agents. When you think about what you “could” and “should” do, you do your counterfactuals, not over what you alone will do, but over what you and a whole set of other agents “running the same algorithm you are running” will simultaneously do. For example, you may (in your model) be choosing what algorithm you and Clippy will both send into a one-shot prisoner’s dilemma.
Much as was the case with CDT-ers, so long as your utility estimate depends only on the algorithm’s outputs and not its details you can choose the algorithm you’re creating to be an “updateless”, “act according to the policy your creator would have chosen” CSA.
5. Which types of CSAs will create which other types of CSAs under what circumstances? I go through the list above.
6. A partial list of remaining problems, and of threads that may be useful to pull on.
6.a. Why design CSAs at all, rather than look-up tables or non-agent-like jumbles of wires? Computational limitations are part of the answer: if I design a CSA to play chess with me, it knows what move it has to respond to, and so can focus its computation on that specific situation. Does CSAs’ usefulness in focussing computation shed light on what type of CSAs to design?
6.b. More generally, how did evolution come to build us humans as approximate CSAs? And what kinds of decision theory should other agent-design processes, in other parts of the multiverse, be expected to create?
6.c. What kind of a CSA are you? What are you really asking, when you ask what you “should” do in Newcomb’s problem? What algorithm do you actually run, and want to run, there?
6.d. Two-player games: avoiding paradoxes and infinite types. I used a simplification above: I assumed that agents took in inputs from a finite list, and produced outputs from a finite list. This simplification does not allow for two general agents to, say, play one another in prisoner’s dilemma while seeing one another’s policy. If I can choose any policy that is a function from the set of you policy options to {C, D}, and you can choose any policy that is a function from the set of my policy options to {C, D}, each of us must have more policy-options than the other.
Some other formalism is needed for two-player games. (Eliezer lists this problem, and the entangled problem 6.e, in his Timeless decision theory: problems I can’t solve.)
6.e. What do real agents do in the situations Eliezer has been calling “problems of logical priority before time”? Also, what are the natural alternative decision theories to use for such problems, and is there one which is so much more natural than others that we might expect it to populate the universe, just as Eliezer hopes his “timeless decision theory” might accurately describe the bulk of decision agents in the universe?
Note that this question is related to, but harder than, the more limited question in 7.b. It is harder because we are now asking our CSAs to produce actions/outputs in more complicated situations.
6.f. Technical machinery for dealing with timeless decision theories. [Steve Rayhawk added this item.] As noted in 5 above, and as Eliezer noted in Ingredients of Timeless Decision Theory, we may wish to use a decision theory where we are “choosing” the answer to a particular math problem, or the output of a particular algorithm. Since the output of an algorithm is a logical question, this requires reasoning under uncertainty about the answers to logical questions. Setting this up usefully, without paradoxes or inconsistencies, requires some work. Steve has a gimmick for treating part of this problem. (The gimmick starts from the hierarchical Bayesian modeling idea of a hyperprior, used in its most general form: a prior belief about conditional probability tables for other variables.)
*More precisely: CSAs’ usefulness breaks down if the creator’s world-model includes important effects from its choice of which agent it creates, apart from the effects of that agent’s policy.
I assumed as much, but my problem with this reasoning starts here:
Normally, I'd agree, but as I said, Eliezer_Yudkowsky claims that a pebble contains the laws of physics, which are nothing but a network of counterfactuals. So there necessarily is an isomorphism between a planet and "multiple possible consequences".
This is why I say there must be a stronger sense in which you mean that the agent has computations that can be interpreted as evaluating multiple choices/consequences, because all of the universe is doing a sort of efficient version of that. And I don't yet know what this stronger sense is.