Notice that if I can't move the planet away, it does in fact collide with Earth, but that doesn't mean that it should collide with Earth according to my preference.
Right, your preference is different from the planet's. That was your error in your last response.
My preference is over everything, the planet included. By saying "the planet shouldn't collide with Earth" I mean that I should make the planet not collide with Earth. I'm not talking about the preference of the planet in this sentence; I only talk about my preference.
Likewise, you can't assert that according to the planet's preference, it should collide with Earth merely from the fact that it does: maybe the planet wants to be a whale instead, but can't.
The planet doesn't want to be a whale; that wouldn't minimize its Gibbs Free Energy in its local domain of attraction.
That the planet wants to be a whale is a hypothetical. Accept it in reading what depends on accepting it. If the planet does in fact want to be a whale, it can still be unable to make that happen, and you may still observe it moving along its orbit. You can't assert that it doesn't want to be a whale from extensionally observing how it actually moves.
You are confusing the variational formulation of the laws of physics with the preference of optimization processes (probably because in both cases you maximize or minimize something). An optimization process actually optimizes stuff over time (at least at simpler stages, e.g. humans), while the variational form of the laws of physics just says that the true solution (the one that describes what will actually happen) can be represented as the maximum or minimum of a certain function, given the constraints. This is just a convenient form for finding approximate solutions and for understanding the system's properties.
The same factual outcome can be written as the maximum of many different functions under different constraints. One of the functions for which you can seek an extremum given constraints describes the behavior of the system on the level of physics (for example, using the principle of least action; I forgot my physics, but it doesn't look like Gibbs free energy applies to the motion of planets). A completely different function for which you can seek an extremum given constraints describes its behavior on the level of preference -- that's utility. Both these accounts give the same solution stating what will actually happen, but the functions are different.
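To make the point about one outcome and many objective functions concrete, here is a minimal sketch. The two functions and the grid of candidate outcomes are invented stand-ins (neither is a real action functional or a real utility function); the only claim is that both pick out the same point.

```python
# Candidate outcomes: a small grid from 0.0 to 6.0 (an arbitrary, assumed domain).
candidates = [x / 10 for x in range(0, 61)]

def physics_like(x):
    # Stand-in for a "physical" objective; purely illustrative.
    return -(x - 3.0) ** 2

def preference_like(x):
    # Stand-in for a "utility" objective; a different function entirely.
    return -abs(x - 3.0)

# Both objectives select the same outcome even though they disagree elsewhere.
best_physics = max(candidates, key=physics_like)
best_preference = max(candidates, key=preference_like)
```

Observing only the selected outcome therefore does not tell you which of the two functions was being maximized.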
"Shouldness" refers to a particular very specific way of presenting the system's behavior, and it's not free energy. Notice that you can describe AI's or man's behavior with physical variational principles as well, but that will have nothing to do with their preference.
"Shouldness" refers to a particular very specific way of presenting the system's >behavior, and it's not free energy. Notice that you can describe AI's or man's behavior >with physical variational principles as well, but that will have nothing to do with their >preference.
It seems to me that what SilasBarta is asking for here is a definition of shouldness such that the above statement holds. Why is it invalid to think that the system "wants" its physics? All you are indicating is that such is not what's intended (which I'm sure SilasBarta knows)...
Last August or so, Eliezer asked Steve Rayhawk and myself to attempt to solve Newcomb’s problem together. This project served a couple of purposes:
a. Get an indication as to our FAI research abilities.
b. Train our reduction-muscles.
c. Check whether Eliezer’s (unseen by us) timeless decision theory is a point that outside folks tend to arrive at independently (at least if starting from the rather substantial clues on OB/LW), and whether anything interestingly new came out of an independent attempt.
Steve and I (and, briefly but helpfully, Liron Shapira) took our swing at Newcomb. We wrote a great mass of notes that have been sitting on our hard drives, but hadn’t stitched them together into a single document. I’d like to attempt a Less Wrong sequence on that subject now. Most of this content is stuff that Eliezer, Nesov, and/or Dai developed independently and have been referring to in their posts, but I’ll try to present it more fully and clearly. I learned a bit of this from Eliezer/Nesov/Dai’s recent posts.
Here’s the outline, to be followed up with slower, clearer blog posts if all goes well:
0. Prelude: “Should” depends on counterfactuals. Newcomb's problem -- the problem of what Joe "should" do, to earn the most money -- is the problem of which type of counterfactuals best cashes out the question "Should Joe take one box or two?". Disagreement about Newcomb's problem is disagreement about what sort of counterfactuals we should consider, when we try to figure out what action Joe should take.
1. My goal in this sequence is to reduce “should” as thoroughly as I can. More specifically, I’ll make an (incomplete, but still useful) attempt to:
2. A non-vicious regress. Suppose we’re designing Joe, and we want to maximize his expected winnings. What notion of “should” should we design Joe to use? There’s a regress here, in that creator-agents with different starting decision theories will design agents that have different starting decision theories. But it is a non-vicious regress. We can gain understanding by making this regress explicit, and asking under what circumstances agents with decision theory X will design future agents with decision theory Y, for different values of X and Y.
3a. When will a CDT-er build agents that use “could” and “should”? Suppose again that you’re designing Joe, and that Joe will go out in a world and win utilons on your behalf. What kind of Joe-design will maximize your expected utilons?
If we assume nothing about Joe’s world, we might find that your best option was to design Joe to act as a bundle of wires which happens to have advantageous physical effects, and which doesn’t act like an agent at all.
But suppose Joe’s world has the following handy property: suppose Joe’s actions have effects, and Joe’s “policy”, or the actions he “would have taken” in response to alternative inputs, also has effects, but the details of Joe’s internal wiring don’t otherwise matter. (I'll call this the "policy-equivalence assumption".) Since Joe’s wiring doesn’t matter, you can, without penalty, insert whatever computation you like into Joe’s insides. And so, if you yourself can think through what action Joe “should” take, you can build wiring that sits inside Joe, carries out the same computation you would have used to figure out what action Joe “should” take, and then prompts that action.
Joe then inherits his counterfactuals from you: Joe’s model of what “would” happen “if he acts on policy X” is your model of what “would” happen if you design an agent, Joe, who acts according to policy X. The result is “act according to the policy my creator would have chosen” decision theory, now-A.K.A. “Updateless Decision Theory” (UDT). UDT one-boxes on Newcomb’s problem and pays the $100 in the counterfactual mugging problem.
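A rough numerical sketch of the two kinds of counterfactual on Newcomb's problem may help here. Everything in it is an illustrative assumption rather than part of the outline: the dollar amounts, the 99% predictor accuracy, and the function names are all made up for the example.

```python
# Hypothetical Newcomb setup; all numbers and names are illustrative assumptions.
BOX_A = 1_000        # transparent box, always contains this amount
BOX_B = 1_000_000    # opaque box, filled only if one-boxing was predicted
ACCURACY = 0.99      # assumed predictor accuracy

def payoff(action, predicted):
    """Money Joe receives given his action and the predictor's guess."""
    opaque = BOX_B if predicted == "one-box" else 0
    return opaque if action == "one-box" else opaque + BOX_A

def policy_value(policy):
    """Creator's-eye counterfactual: the prediction tracks the policy Joe is built with."""
    other = "two-box" if policy == "one-box" else "one-box"
    return ACCURACY * payoff(policy, policy) + (1 - ACCURACY) * payoff(policy, other)

def causal_value(action, p_predicted_one_box):
    """Counterfactual over the action alone: the prediction is held fixed."""
    return (p_predicted_one_box * payoff(action, "one-box")
            + (1 - p_predicted_one_box) * payoff(action, "two-box"))

POLICIES = ["one-box", "two-box"]
best_policy = max(POLICIES, key=policy_value)
best_causal = max(POLICIES, key=lambda a: causal_value(a, 0.5))
```

Under these assumptions the creator's policy-level comparison favors one-boxing, while the comparison that holds the prediction fixed favors two-boxing whatever probability is plugged in, since two-boxing always adds the contents of the transparent box.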
3b. But it is only when computational limitations are thrown in that designing Joe to be a CSA (a could/should agent, i.e. an agent that uses “could” and “should”) leaves you better off than designing Joe to be your top-pick hard-coded policy. So, to understand where CSAs really come from, we’ll need eventually to consider how agents can use limited computation.
3c. When will a UDT-er build agents that use “could” and “should”? The answer is similar to that for a CDT-er.
3d. CSAs are only useful in a limited domain. In our derivations above, CSAs' usefulness depends on the policy-equivalence assumption. Therefore, if agents’ computation has important effects apart from its effects on the agents’ actions, the creator agent may be ill-advised to create any sort of CSA.* For example, if the heat produced by agents’ physical computation has effects that are as significant as the agent’s “chosen actions”, CSAs may not be useful. This limitation suggests that CSAs may not be useful in a post-singularity world, since in such a world matter may be organized to optimize for computation in a manner far closer to physical efficiency limits, and so the physical side-effects of computation may have more relative significance compared to the computation’s output.
4. What kinds of CSAs make sense? More specifically, what kinds of counterfactual “coulds” make sense as a basis for a CSA?
In part 3a, we noted that when Joe’s policy is all that matters, you can stick your “What policy should Joe have?” computer inside Joe, without disrupting Joe’s payoffs. Thus, you can build Joe to be a “carry out the policy my creator would think best” CSA.
It turns out this trick can be extended.
Suppose you aren't a CDT-er. Suppose you are more like one of Eliezer's "timeless" agents. When you think about what you “could” and “should” do, you do your counterfactuals, not over what you alone will do, but over what you and a whole set of other agents “running the same algorithm you are running” will simultaneously do. For example, you may (in your model) be choosing what algorithm you and Clippy will both send into a one-shot prisoner’s dilemma.
Much as was the case with CDT-ers, so long as your utility estimate depends only on the algorithm’s outputs and not its details you can choose the algorithm you’re creating to be an “updateless”, “act according to the policy your creator would have chosen” CSA.
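As a small sketch of the difference between counterfactuals over "what I alone do" and over "what everyone running my algorithm does", consider the one-shot prisoner's dilemma with Clippy. The payoff numbers and function names below are assumptions chosen for illustration; only the structure of the comparison matters.

```python
# Illustrative payoffs as (my utility, Clippy's utility); the values are assumptions.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def my_utility(mine, clippys):
    return PAYOFF[(mine, clippys)][0]

def shared_algorithm_value(output):
    """Counterfactual over the shared algorithm: both players emit `output`."""
    return my_utility(output, output)

def lone_action_value(action, clippys_fixed):
    """Counterfactual over my action alone, with Clippy's move held fixed."""
    return my_utility(action, clippys_fixed)

# Choosing the algorithm that both players run.
best_shared = max("CD", key=shared_algorithm_value)

# Choosing my action alone, for each fixed move by Clippy.
best_alone = {
    fixed: max("CD", key=lambda a: lone_action_value(a, fixed))
    for fixed in "CD"
}
```

With these payoffs, the shared-algorithm comparison selects cooperation, whereas the lone-action comparison selects defection against either fixed move.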
5. Which types of CSAs will create which other types of CSAs under what circumstances? I go through the list above.
6. A partial list of remaining problems, and of threads that may be useful to pull on.
6.a. Why design CSAs at all, rather than look-up tables or non-agent-like jumbles of wires? Computational limitations are part of the answer: if I design a CSA to play chess with me, it knows what move it has to respond to, and so can focus its computation on that specific situation. Does CSAs’ usefulness in focussing computation shed light on what type of CSAs to design?
6.b. More generally, how did evolution come to build us humans as approximate CSAs? And what kinds of decision theory should other agent-design processes, in other parts of the multiverse, be expected to create?
6.c. What kind of a CSA are you? What are you really asking, when you ask what you “should” do in Newcomb’s problem? What algorithm do you actually run, and want to run, there?
6.d. Two-player games: avoiding paradoxes and infinite types. I used a simplification above: I assumed that agents took in inputs from a finite list, and produced outputs from a finite list. This simplification does not allow for two general agents to, say, play one another in prisoner’s dilemma while seeing one another’s policy. If I can choose any policy that is a function from the set of your policy options to {C, D}, and you can choose any policy that is a function from the set of my policy options to {C, D}, each of us must have more policy-options than the other.
Some other formalism is needed for two-player games. (Eliezer lists this problem, and the entangled problem 6.e, in his Timeless decision theory: problems I can’t solve.)
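The counting behind "each of us must have more policy-options than the other" can be spelled out as follows. The symbols $m$ (the number of my policy options) and $n$ (the number of yours) are notation introduced here for illustration, and the sketch assumes both sets are finite.

```latex
\begin{align*}
  m &= \left|\{C, D\}\right|^{n} = 2^{n}, & n &= \left|\{C, D\}\right|^{m} = 2^{m} \\
  \Rightarrow\quad m &= 2^{2^{m}} > 2^{m} > m .
\end{align*}
```

So no finite $m$ works. Dropping the finiteness assumption does not rescue the setup either, since by Cantor's theorem the set of functions from a set $X$ to $\{C, D\}$ always has strictly greater cardinality than $X$.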
6.e. What do real agents do in the situations Eliezer has been calling “problems of logical priority before time”? Also, what are the natural alternative decision theories to use for such problems, and is there one which is so much more natural than others that we might expect it to populate the universe, just as Eliezer hopes his “timeless decision theory” might accurately describe the bulk of decision agents in the universe?
Note that this question is related to, but harder than, the more limited question in 6.b. It is harder because we are now asking our CSAs to produce actions/outputs in more complicated situations.
6.f. Technical machinery for dealing with timeless decision theories. [Steve Rayhawk added this item.] As noted in 4 above, and as Eliezer noted in Ingredients of Timeless Decision Theory, we may wish to use a decision theory where we are “choosing” the answer to a particular math problem, or the output of a particular algorithm. Since the output of an algorithm is a logical question, this requires reasoning under uncertainty about the answers to logical questions. Setting this up usefully, without paradoxes or inconsistencies, requires some work. Steve has a gimmick for treating part of this problem. (The gimmick starts from the hierarchical Bayesian modeling idea of a hyperprior, used in its most general form: a prior belief about conditional probability tables for other variables.)
*More precisely: CSAs’ usefulness breaks down if the creator’s world-model includes important effects from its choice of which agent it creates, apart from the effects of that agent’s policy.