Decision theory: An outline of some upcoming posts

AnnaSalamon

25th Aug 2009

Last August or so, Eliezer asked Steve Rayhawk and me to attempt to solve Newcomb’s problem together. This project served a few purposes:
a. Get an indication as to our FAI research abilities.
b. Train our reduction-muscles.
c. Check whether Eliezer’s (unseen by us) timeless decision theory is a point that outside folks tend to arrive at independently (at least if starting from the rather substantial clues on OB/LW), and whether anything interestingly new came out of an independent attempt.

Steve and I (and, briefly but helpfully, Liron Shapira) took our swing at Newcomb. We wrote a great mass of notes that have been sitting on our hard drives, but hadn’t stitched them together into a single document. I’d like to attempt a Less Wrong sequence on that subject now. Most of this content is stuff that Eliezer, Nesov, and/or Dai developed independently and have been referring to in their posts, but I’ll try to present it more fully and clearly. I learned a bit of this from Eliezer/Nesov/Dai’s recent posts.

Here’s the outline, to be followed up with slower, clearer blog posts if all goes well:

0. Prelude: “Should” depends on counterfactuals. Newcomb's problem -- the problem of what Joe "should" do, to earn most money -- is the problem of which type of counterfactuals best cash out the question "Should Joe take one box or two?". Disagreement about Newcomb's problem is disagreement about what sort of counterfactuals we should consider, when we try to figure out what action Joe should take.
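To make the disagreement concrete, here is a minimal sketch that scores Joe's two options under two different kinds of counterfactual. It assumes the standard Newcomb payoffs ($1,000,000 in the opaque box if one-boxing was predicted, $1,000 in the transparent box) and a predictor accuracy of 0.99. Neither number comes from this post, and the function names are hypothetical. The first function roughly follows the "my choice and the prediction move together" style of counterfactual. The second roughly follows the causal (CDT) style, which holds the box contents fixed.

```python
# Illustrative numbers only: standard Newcomb payoffs and an assumed predictor accuracy.
BIG = 1_000_000   # opaque box, filled only if the predictor foresaw one-boxing
SMALL = 1_000     # transparent box, always filled
ACCURACY = 0.99   # assumed chance that the prediction matches Joe's actual choice


def payoff(action: str, predicted_one_box: bool) -> int:
    """Money Joe leaves with, given his action and what was predicted."""
    opaque = BIG if predicted_one_box else 0
    return opaque if action == "one-box" else opaque + SMALL


def value_if_prediction_tracks_choice(action: str) -> float:
    """Counterfactual where the prediction co-varies with Joe's choice."""
    p_one = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_one * payoff(action, True) + (1 - p_one) * payoff(action, False)


def value_if_boxes_held_fixed(action: str, p_one: float) -> float:
    """Counterfactual where the box contents stay fixed at a prior credence p_one."""
    return p_one * payoff(action, True) + (1 - p_one) * payoff(action, False)


for action in ("one-box", "two-box"):
    print(action,
          round(value_if_prediction_tracks_choice(action)),
          round(value_if_boxes_held_fixed(action, p_one=0.5)))
```

Under the first counterfactual, one-boxing comes out far ahead (990,000 vs 11,000). Under the second, two-boxing beats one-boxing by exactly SMALL for every credence p_one, which is the dominance argument. The same payoff table gives opposite answers depending on which counterfactual is used.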

1. My goal in this sequence is to reduce “should” as thoroughly as I can. More specifically, I’ll make an (incomplete, but still useful) attempt to:

Make it even more clear that our naive conceptions of “could” and “should” are conceptual inventions, and are not Physically Irreducible Existent Things. (Written here.)
Consider why one might design an agent that uses concepts like “could” and “should” (hereafter a “Could/Should Agent”, or “CSA”), rather than designing an agent that acts in some other way. Consider what specific concepts of “could” and “should” are what specific kinds of useful. (This is meant as a more thorough investigation of the issues treated by Eliezer in “Possibility and Couldness”.)
Consider why evolution ended up creating us as approximate CSAs. Consider how common various kinds of CSAs are likely to be across the multiverse.

2. A non-vicious regress. Suppose we’re designing Joe, and we want to maximize his expected winnings. What notion of “should” should we design Joe to use? There’s a regress here, in that creator-agents with different starting decision theories will design agents that have different starting decision theories. But it is a non-vicious regress. We can gain understanding by making this regress explicit, and asking under what circumstances agents with decision theory X will design future agents with decision theory Y, for different values of X and Y.

3a. When will a CDT-er build agents that use “could” and “should”? Suppose again that you’re designing Joe, and that Joe will go out in a world and win utilons on your behalf. What kind of Joe-design will maximize your expected utilons?

If we assume nothing about Joe’s world, we might find that your best option was to design Joe to act as a bundle of wires which happens to have advantageous physical effects, and which doesn’t act like an agent at all.

But suppose Joe’s world has the following handy property: suppose Joe’s actions have effects, and Joe’s “policy” (the actions he “would have taken” in response to alternative inputs) also has effects, but the details of Joe’s internal wiring don’t otherwise matter. (I'll call this the "policy-equivalence assumption".) Since Joe’s wiring doesn’t matter, you can, without penalty, insert whatever computation you like into Joe’s insides. And so, if you yourself can think through what action Joe “should” take, you can build wiring that sits inside Joe, carries out the same computation you would have used to figure out what action Joe “should” take, and then prompts that action.
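Here is a toy sketch of the policy-equivalence assumption. It assumes a world whose payoff is computed only from Joe's input-to-action mapping. The world, its payoffs, and both "wirings" are hypothetical. One Joe is a hard-coded lookup table. The other carries the creator's "what should Joe do?" computation inside him. Because both implement the same policy, the world cannot tell them apart.

```python
from typing import Callable, Dict

# Hypothetical world: payoff for each (observation, action) pair.
PAYOFFS: Dict[str, Dict[str, int]] = {
    "rain": {"umbrella": 5, "no umbrella": -3},
    "sun": {"umbrella": -1, "no umbrella": 2},
}


def world_payoff(act: Callable[[str], str]) -> int:
    """Total payoff; looks only at which action is returned for each observation."""
    return sum(options[act(obs)] for obs, options in PAYOFFS.items())


# Wiring 1: a hard-coded lookup table (the creator's top-pick policy).
TABLE = {"rain": "umbrella", "sun": "no umbrella"}


def lookup_joe(obs: str) -> str:
    return TABLE[obs]


# Wiring 2: the creator's own "what should Joe do?" computation, placed inside Joe.
def deliberating_joe(obs: str) -> str:
    options = PAYOFFS[obs]
    return max(options, key=lambda action: options[action])


print(world_payoff(lookup_joe), world_payoff(deliberating_joe))  # 7 7
```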

Joe then inherits his counterfactuals from you: Joe’s model of what “would” happen “if he acts on policy X” is your model of what “would” happen if you design an agent, Joe, who acts according to policy X. The result is “act according to the policy my creator would have chosen” decision theory, now-A.K.A. “Updateless Decision Theory” (UDT). UDT one-boxes on Newcomb’s problem and pays the $100 in the counterfactual mugging problem.
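Here is a minimal sketch of "act according to the policy my creator would have chosen", applied to the counterfactual mugging. It assumes the usual statement of that problem. Omega flips a fair coin. On tails it asks Joe for $100. On heads it pays Joe $10,000, but only if Joe is the kind of agent who would have paid on tails. The $10,000 figure, the observation label, and the function names are assumptions for illustration. The key move is that policies are scored from the creator's vantage point, before the coin is flipped.

```python
from itertools import product

REWARD = 10_000  # assumed heads payout from Omega
COST = 100       # tails request, as in the post

# A policy maps each possible observation to an action.
OBSERVATIONS = ["asked_to_pay"]
ACTIONS = ["pay", "refuse"]


def expected_value(policy: dict) -> float:
    """Value of a policy, scored from the creator's vantage point before the coin flip."""
    pays = policy["asked_to_pay"] == "pay"
    heads = REWARD if pays else 0   # Omega rewards agents whose policy pays
    tails = -COST if pays else 0
    return 0.5 * heads + 0.5 * tails


def best_policy() -> dict:
    """Enumerate every policy and return the one the creator would choose."""
    candidates = [dict(zip(OBSERVATIONS, acts))
                  for acts in product(ACTIONS, repeat=len(OBSERVATIONS))]
    return max(candidates, key=expected_value)


policy = best_policy()
print(policy, expected_value(policy))  # {'asked_to_pay': 'pay'} 4950.0
```

Evaluated after seeing tails, paying looks like a pure $100 loss. It is the policy-level evaluation that makes paying come out ahead.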

3b. But it is only when computational limitations are thrown in that designing Joe to be a CSA leaves you better off than designing Joe to be your top-pick hard-coded policy. So, to understand where CSAs really come from, we’ll need eventually to consider how agents can use limited computation.

3c. When will a UDT-er build agents that use “could” and “should”? The answer is similar to that for a CDT-er.

3d. CSAs are only useful in a limited domain. In our derivations above, CSAs' usefulness depends on the policy-equivalence assumption. Therefore, if agents’ computation has important effects apart from its effects on the agents’ actions, the creator agent may be ill-advised to create any sort of CSA.* For example, if the heat produced by agents’ physical computation has effects that are as significant as the agent’s “chosen actions”, CSAs may not be useful. This limitation suggests that CSAs may not be useful in a post-singularity world, since in such a world matter may be organized to optimize for computation in a manner far closer to physical efficiency limits, and so the physical side-effects of computation may have more relative significance compared to the computation’s output.
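Here is a toy sketch of how the policy-equivalence assumption can fail. Purely for illustration, it assumes a world that charges a fixed cost per unit of internal computation, standing in for heat or other physical side effects. The step counts and the cost are made up. Both Joes implement the same policy, yet they now earn different payoffs, so the creator's choice of wiring matters beyond the policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict

PAYOFFS: Dict[str, Dict[str, int]] = {
    "rain": {"umbrella": 5, "no umbrella": -3},
    "sun": {"umbrella": -1, "no umbrella": 2},
}
COST_PER_STEP = 0.5  # assumed penalty per unit of computation (e.g. waste heat)


@dataclass
class Joe:
    act: Callable[[str], str]
    steps_per_decision: int  # hypothetical count of internal computation steps


def world_payoff(joe: Joe) -> float:
    """Action payoffs minus a charge for the computation that produced them."""
    total = 0.0
    for obs, options in PAYOFFS.items():
        total += options[joe.act(obs)]
        total -= COST_PER_STEP * joe.steps_per_decision
    return total


TABLE = {"rain": "umbrella", "sun": "no umbrella"}
hard_coded = Joe(act=lambda obs: TABLE[obs], steps_per_decision=1)
deliberator = Joe(
    act=lambda obs: max(PAYOFFS[obs], key=lambda a: PAYOFFS[obs][a]),
    steps_per_decision=4,  # assumed: it models each option before choosing
)

print(world_payoff(hard_coded), world_payoff(deliberator))  # 6.0 3.0
```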

4. What kinds of CSAs make sense? More specifically, what kinds of counterfactual “coulds” make sense as a basis for a CSA?

In part 3, we noted that when Joe’s policy is all that matters, you can stick your “What policy should Joe have?” computer inside Joe, without disrupting Joe’s payoffs. Thus, you can build Joe to be a “carry out the policy my creator would think best” CSA.

It turns out this trick can be extended.

Suppose you aren't a CDT-er. Suppose you are more like one of Eliezer's "timeless" agents. When you think about what you “could” and “should” do, you do your counterfactuals, not over what you alone will do, but over what you and a whole set of other agents “running the same algorithm you are running” will simultaneously do. For example, you may (in your model) be choosing what algorithm you and Clippy will both send into a one-shot prisoner’s dilemma.

Much as was the case with CDT-ers, so long as your utility estimate depends only on the algorithm’s outputs and not its details, you can choose the algorithm you’re creating to be an “updateless”, “act according to the policy your creator would have chosen” CSA.
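Here is a minimal sketch of the timeless-style counterfactual in the Clippy example. It assumes illustrative, symmetric prisoner's-dilemma payoffs, which are not from the post. It also assumes that Clippy runs exactly the algorithm you run, so both moves are the output of one function. For contrast, a CDT-style choice that holds Clippy's move fixed is included.

```python
# Illustrative one-shot prisoner's dilemma payoffs for you: (your_move, clippy_move) -> utility.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
MOVES = ("C", "D")


def timeless_choice() -> str:
    """Choose the output of the shared algorithm: both players emit the same move."""
    return max(MOVES, key=lambda m: PAYOFF[(m, m)])


def cdt_choice(clippy_move: str) -> str:
    """Choose your move while holding Clippy's move fixed."""
    return max(MOVES, key=lambda m: PAYOFF[(m, clippy_move)])


print(timeless_choice())                 # C
print(cdt_choice("C"), cdt_choice("D"))  # D D
```

Only your own payoff is scored here. The sketch assumes Clippy faces the mirror-image payoffs, as in the usual symmetric setup.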

5. Which types of CSAs will create which other types of CSAs under what circumstances? I go through the list above.

6. A partial list of remaining problems, and of threads that may be useful to pull on.

6.a. Why design CSAs at all, rather than look-up tables or non-agent-like jumbles of wires? Computational limitations are part of the answer: if I design a CSA to play chess with me, it knows what move it has to respond to, and so can focus its computation on that specific situation. Does CSAs’ usefulness in focussing computation shed light on what type of CSAs to design?

6.b. More generally, how did evolution come to build us humans as approximate CSAs? And what kinds of decision theory should other agent-design processes, in other parts of the multiverse, be expected to create?

6.c. What kind of a CSA are you? What are you really asking, when you ask what you “should” do in Newcomb’s problem? What algorithm do you actually run, and want to run, there?

6.d. Two-player games: avoiding paradoxes and infinite types. I used a simplification above: I assumed that agents took in inputs from a finite list, and produced outputs from a finite list. This simplification does not allow for two general agents to, say, play one another in prisoner’s dilemma while seeing one another’s policy. If I can choose any policy that is a function from the set of your policy options to {C, D}, and you can choose any policy that is a function from the set of my policy options to {C, D}, then each of us must have more policy options than the other, which is impossible.
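Here is a small check of the counting argument, assuming finite policy sets. If your n policies are my possible inputs, then I have 2**n policies (functions to {C, D}). Symmetrically, you would have 2**m policies if I have m. No pair of sizes satisfies both conditions at once. The code enumerates the function space for a hypothetical three-policy opponent and then searches small sizes for a consistent pair. For infinite sets, the same contradiction follows from Cantor's theorem.

```python
from itertools import product
from typing import Dict, List, Tuple

MOVES = ("C", "D")


def all_policies(opponent_policies: List[str]) -> List[Dict[str, str]]:
    """Every function from the opponent's policy set to {C, D}."""
    return [dict(zip(opponent_policies, moves))
            for moves in product(MOVES, repeat=len(opponent_policies))]


def consistent_sizes(limit: int) -> List[Tuple[int, int]]:
    """Pairs (m, y) with m == 2**y and y == 2**m, searched below `limit`."""
    return [(m, y) for m in range(limit) for y in range(limit)
            if m == 2 ** y and y == 2 ** m]


your_policies = ["y1", "y2", "y3"]  # hypothetical opponent policy labels
mine = all_policies(your_policies)
print(len(your_policies), len(mine))  # 3 8
print(consistent_sizes(64))           # []
```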

Some other formalism is needed for two-player games. (Eliezer lists this problem, and the entangled problem 6.e, in his “Timeless Decision Theory: Problems I Can’t Solve”.)

6.e. What do real agents do in the situations Eliezer has been calling “problems of logical priority before time”? Also, what are the natural alternative decision theories to use for such problems, and is there one which is so much more natural than others that we might expect it to populate the universe, just as Eliezer hopes his “timeless decision theory” might accurately describe the bulk of decision agents in the universe?

Note that this question is related to, but harder than, the more limited question in 6.b. It is harder because we are now asking our CSAs to produce actions/outputs in more complicated situations.

6.f. Technical machinery for dealing with timeless decision theories. [Steve Rayhawk added this item.] As noted in 4 above, and as Eliezer noted in “Ingredients of Timeless Decision Theory”, we may wish to use a decision theory where we are “choosing” the answer to a particular math problem, or the output of a particular algorithm. Since the output of an algorithm is a logical question, this requires reasoning under uncertainty about the answers to logical questions. Setting this up usefully, without paradoxes or inconsistencies, requires some work. Steve has a gimmick for treating part of this problem. (The gimmick starts from the hierarchical Bayesian modeling idea of a hyperprior, used in its most general form: a prior belief about conditional probability tables for other variables.)
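The following is only a generic illustration of what "a prior belief about conditional probability tables" means. It is not Steve's gimmick, whose details are not given here. Each row of a table P(Y=1 | X=x) gets a Beta hyperprior, and observations update that belief. The variable names, the uniform Beta(1, 1) starting point, and the data are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BetaHyperprior:
    """Belief about one unknown table entry theta = P(Y=1 | X=x), as Beta(alpha, beta)."""
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, y: int) -> None:
        """Conjugate update on one observation of Y (0 or 1) in context X=x."""
        if y == 1:
            self.alpha += 1
        else:
            self.beta += 1

    def predictive(self) -> float:
        """P(next Y=1 | X=x), averaging over uncertainty about the table entry."""
        return self.alpha / (self.alpha + self.beta)


# One hyperprior per row of the (hypothetical) table P(Y | X).
cpt_belief = {"x0": BetaHyperprior(), "x1": BetaHyperprior()}
for x, y in [("x1", 1), ("x1", 1), ("x1", 0), ("x0", 0)]:
    cpt_belief[x].update(y)

print({x: b.predictive() for x, b in cpt_belief.items()})  # {'x0': 0.3333333333333333, 'x1': 0.6}
```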

*More precisely: CSAs’ usefulness breaks down if the creator’s world-model includes important effects from its choice of which agent it creates, apart from the effects of that agent’s policy.

31 comments

DS3618

Does this greater detail mean that we will see some math and some worked out problems? Are these results ever going to be published in a journal, or anywhere that is peer-reviewed?

CarlShulman

Are these results ever going to be published in a journal, or anywhere that is peer-reviewed?

Yes.

Mike Bishop

What journals might publish this stuff? I'm curious to peek at what else they are publishing.

Mitchell_Porter

If this were a discussion regarding, say, "artificial geologists", and the question was "Why do artificial geologists usually have a concept of 'rock'?", it would not be regarded as very mysterious. A geologist needs the idea of a rock because rocks exist, and are basic to their domain of expertise. The general principle might be, "if it exists and is relevant to the agent, the agent will have a representation of it".

I take it that the major puzzle here is that counterfactuals are situations which do not exist, and yet CSAs contain representations of them. That's "could" (and "would" is a form of "could"); "should" seems less of a problem, since while the counterfactuals don't exist, the representations of them do, and so a decision process in which such representations contribute to the decision does not in itself involve anything like nonexistent entities having a causal effect on existent entities.

So the basic problem remains: Why do CSAs (a felicitous naming, by the way) contain representations of things which do not exist?

I would think the basic answer is that reality and possibility are combinatorial and lawful in the same way. The situations which actually occur, and the situations which could have occurred, are made of the same sort of entities (combinatorial in the same way), and the way to reason validly about their properties is also the same (lawful in the same way). The difference between a representation of an existing situation and a representation of a merely possible situation is not intrinsically fundamental - it is contingent on external circumstances, namely, what actually happens in the world. As a matter of computer science or cognitive science, there is no significant formal difference between isa(president,obama) and isa(president,mccain), even though only one of them represents a reality.

To sum up, I think that the answers to all your other questions will be technical unfoldings of the original insight that computationally, there is no significant difference between a representation of a counterfactual and a representation of a reality.

Vladimir_Nesov

An agent must come to a decision about a specific action fast enough to actually take that action. This runtime limitation doesn't seem so different from any other influence of the choice of algorithm on the outcome, and it also doesn't seem in any way simple, so granting that no other limitation of the algorithm is present doesn't seem to significantly simplify the problem. The algorithm is part of the policy, if not all of it.

An agent can be treated as a generalization of an action, and the algorithm part as a notion of externalized computation. Using agents corresponds to distributing the computation of decisions over time and space, allowing the actions (or possible actions) to finish up the computation.

Why design CSAs at all, rather than look-up tables or non-agent-like jumbles of wires?

I don't see a salient distinction. An agent is one kind of jumble of wires. It is a highly compressed representation of what is denotationally equivalent to a look-up table. An agent with a nice decision theory is presumably a theory of how to do general-purpose design of good jumbles of wires.

Chris_Leong

I started reading through this, but sadly it seems that only the first three posts ended up being written.

SilasBarta

Consider why one might design an agent that uses concepts like “could” and “should” (hereafter a “Could/Should Agent”, or “CSA”), rather than designing an agent that acts in some other way. Consider what specific concepts of “could” and “should” are what specific kinds of useful. (This is meant as a more thorough investigation of the issues treated by Eliezer in “Possibility and Couldness”.)

I'm having some trouble with this part, because I could imagine modeling any object -- agent or otherwise -- as embodying the concepts of "could" and "should" without significant penalty to model complexity. (I thought I would change my mind after reading Possibility and Couldness but I didn't.)

For example: A planet embodies that it "should" alter its velocity and position as per the laws of motion and gravitation. Because you can (Eliezer_Yudkowsky claims) find the laws of physics inside a single pebble, the planet doesn't simply follow this regularity, but also embodies the "shouldness".

And just the same, it embodies "couldness" in terms of where it "could" go, if only other bodies had different positions and masses, etc.

So, I'm confused: since I can reframe anything as embodying couldness and shouldness, what would it even mean for an agent not to be constructed this way? I suspect there's an additional hidden assumption in here about additional requirements for something to meet "shouldness" and "couldness".

ETA: I should note here that Vladimir_Nesov already made the largely similar point that, despite AnnaSalamon's assumption of a distinction between could/should agents and "non-agent" jumbles of wires, there doesn't seem to actually be a salient distinction. Whatever AnnaSalamon believes it is, I'd like to know.

Wei Dai

I'm not sure how Anna Salamon would distinguish between a "could/should" agent and non-agent, but here's my definition. An agent is an algorithm that given an input, evaluates multiple possible outputs (and for a consequentialist agent specifically, their predicted consequences), then picks the one that best satisfies its preferences to be the actual output. So,

"could" is the set of actions that you consider during the course of your decision computation
"should" is preferences that you use to select between multiple could's

To categorize something as an agent, you need to look at its internal dynamics. A planet is not an agent because it's not doing any computation that could be interpreted as evaluating multiple possible choices, and it's certainly not predicting their consequences.
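Here is a minimal sketch of Wei Dai's definition, written as a generic function. The function name, the toy world model, and the utility are hypothetical. "could" is the collection of candidate outputs the algorithm considers. "predict" gives each candidate's predicted consequence. "should" is the preference used to rank those consequences.

```python
from typing import Callable, Iterable, TypeVar

Obs = TypeVar("Obs")
Act = TypeVar("Act")
Outcome = TypeVar("Outcome")


def could_should_agent(
    observation: Obs,
    could: Callable[[Obs], Iterable[Act]],   # the actions considered ("could")
    predict: Callable[[Obs, Act], Outcome],  # world model: predicted consequence
    should: Callable[[Outcome], float],      # preference used to rank consequences
) -> Act:
    """Evaluate each considered action's predicted consequence; output the best one."""
    return max(could(observation),
               key=lambda action: should(predict(observation, action)))


# Hypothetical toy use: pick a quantity to buy at a given price.
choice = could_should_agent(
    observation=3,
    could=lambda price: range(5),
    predict=lambda price, qty: 10 * qty - price * qty ** 2,
    should=lambda net_benefit: net_benefit,
)
print(choice)  # 2
```

On this reading, calling something an agent means finding a step like the `max` over considered alternatives inside its dynamics. The definition does not rest on its external behavior alone.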

Vladimir_Nesov

But people are made out of meat! How can that be an agent? I remain sceptical of the ability of a theory to interpret humans as agents with preference if the same theory is unable to interpret a tree or a rock the same way, perhaps with ridiculous preference, but preference nonetheless.

thomblake

In case it's not been linked to here: They're made out of meat!

Vladimir_Nesov

That was the reference. I'll also add a link to this video version.

Eliezer Yudkowsky

http://lesswrong.com/lw/tx/optimization/

http://lesswrong.com/lw/va/measuring_optimization_power/

Vladimir_Nesov

Well, one thing is power, another preference. The whole point of FAI is that there is not enough power in humans, while preference should be preserved.

It may be easier to make out what people's preference is than what a tree's preference is, but consider, for example, the situation where a person has just died and is never to be heard from again, and can't possibly make an impact on the world anymore (let's say it's 3000 BC, to exclude medical miracles). This person is extensionally no different from a tree; the difference lies primarily in the internal structure. These systems have the same power to optimize in the given situations, but they hardly have the same preference.

You could say that there are counterfactuals where the person recovers and goes on optimizing, but there are also counterfactuals that make the tree turn into an optimizer. In some sense there are far fewer situations in which a tree turns into an optimizer than situations in which a dead person does, but similarly there are far more situations in which a living person or an AGI operates as an optimizer.

Where do you draw the line? If a theory does draw this line, the position of this line should be rigorously explained, not assumed on anthropomorphic scale.

Wei Dai

The "meat" is clearly implementing a computation of the type I described, whereas a tree or rock isn't. Do you dispute that?

A person who has died is no longer running such a computation, but until his brain decays, the agent-algorithm that he was running before he died can theoretically still be retrieved from his brain.

Your point seems to be that part of FAI theory should be a general and rigorous theory of how to extract preferences from any given object. Only then could we have sufficient theoretical support for any specific procedures for extracting preferences from human beings.

You may be right (I'm not sure) but I think that's a separate question from "why one might design [a could/should] agent", which is what started this thread. For that, the informal definition of "agent" that I gave seems to be sufficient, at least to understand the question.

Vladimir_Nesov

The "meat" is clearly implementing a computation of the type I described, whereas a tree or rock isn't. Do you dispute that?

I don't so much dispute that as not know of a way to make this judgment precise.

Your point seems to be that part of FAI theory should be a general and rigorous theory of how to extract preferences from any given object. Only then could we have sufficient theoretical support for any specific procedures for extracting preferences from human beings.

Right, although I'm not sure that "objects" are the right scope for such a theory. I suspect that you also need enough subjective specification of preference to initiate the process of interpretation (preference-extraction). This will make the preference of rocks arbitrary, because the process of their interpretation can start in too many arbitrary ways and won't converge to the same result from different starting points. At the same time, the structure of humans possibly creates a strong attractor, so that you have enough freedom in choosing the initial interpretation to specify something manually, while knowing that the end result depends very little on the initial specification.

I think that's a separate question from "why one might design [a could/should] agent", which is what started this thread. For that, the informal definition of "agent" that I gave seems to be sufficient, at least to understand the question.

On the level of informal understanding, of course. When you classify systems into agents and non-agents informally, you are using your own brain to interpret the system. This is not a strong enough mechanism to extract preference, while a mechanism that can extract preference presumably would be able to see agents in configurations that people can't interpret as agents, and what those mechanisms can see as agents is a more rigorous definition of what an agent is, hence my remark.

thomblake

The "meat" is clearly implementing a computation of the type I described, whereas a tree or rock isn't. Do you dispute that?

Many would dispute that, possibly including Luciano Floridi. A tree or even a rock engages in information processing: it exchanges heat, electrons, and such with its surroundings, for starters. And there is almost certainly a decompression you can run on some of the information to fit whatever sort of pattern you're looking for.

SilasBarta

And there is almost certainly a decompression you can run on some of the information to fit whatever sort of pattern you're looking for.

I've explained before why this reasoning is misguided: to get arbitrary desired information processing out of random processes, you have to apply an ever-expanding interpretation, meaning that any model that calls e.g. a rock a computer is strictly longer than a model that doesn't because the former would have to include all of the latter, plus random data.

So a rock is not a general use computer (though it can be used to compute a result, if the computation you want to perform happens to be isomorphic to whatever e.g. heat transfer is going on right now).

Now, with that in mind, I was among those who claimed that rocks are agents as defined by AnnaSalamon et al., so how do I reconcile this with the claim that rocks aren't computers?

Well, it's like this: An agent, as defined here, has internal dynamics that could, in principle, be understood as a network of counterfactuals and preferences. A computer, OTOH, does in fact do the work of altering your beliefs about an arbitrary computation. (Generally, that just means concentrating your probability distribution onto the right answer, when before you just figured it was within some range.)

And since Eliezer_Yudkowsky claims that even a pebble embodies the laws of physics, which are nothing but a causal network containing counterfactuals and a species of preference (like energy minimization), that means the term "agent" is carving out a much huger chunk of conceptspace than I think AnnaSalamon et al intended. Which is what makes it hard for me to understand what the agent concept is supposed to be distinguished from.

SilasBarta

Okay, come on guys, give me a break here; I think this post merits an explanation of where I erred rather than (or at least on top of) a downmod. Sure, I might have said something stupid, but I clearly laid out my reasoning about an important distinction that is being made. Help me out here.

SilasBarta

I assumed as much, but my problem with this reasoning starts here:

To categorize something as an agent, you need to look at its internal dynamics. A planet is not an agent because it's not doing any computation that could be interpreted as evaluating multiple possible choices, and it's certainly not predicting their consequences.

Normally, I'd agree, but as I said, Eliezer_Yudkowsky claims that a pebble contains the laws of physics, which are nothing but a network of counterfactuals. So there necessarily is an isomorphism between a planet and "multiple possible consequences".

This is why I say there must be a stronger sense in which you mean that the agent has computations that can be interpreted as evaluating multiple choices/consequences, because all of the universe is doing a sort of efficient version of that. And I don't yet know what this stronger sense is.

Vladimir_Nesov

Where the planet actually goes has nothing to do with where it should go. Shouldness is about preference, and you said nothing of preference in your example. If the planet is on a collision course with Earth, I say that it should turn the other way (and it could if an appropriate system was placed in interaction with it).

SilasBarta

And that would be shouldness with respect to you, not the planet. I submit that you're making the mind-projection fallacy here.

In the Eliezer_Yudkowsky article "Possibility and Couldness", it (or some other article in the series) identifies "shouldness" as the algorithm's internal recognition of a state it ranks higher in wanting to bring about. So I can in fact map that concept onto the planet, in that it identifies and acts on the preference for moving as per the laws of motion and gravitation.

Vladimir_Nesov

This doesn't capture the concept of an error. Preference should also be seen as an abstract mathematical object which the algorithm doesn't necessarily maximize, but tries to set as high as it can. Of course, if I talk of shouldness, I must refer to particular preference, in this case I referred to mine. Notice that if I can't move the planet away, it in fact collides with Earth, but it doesn't mean that it should collide with Earth according to my preference. Likewise, you can't assert that according to the planet's preference, it should collide with Earth merely from the fact that it does: maybe the planet wants to be a whale instead, but can't.

SilasBarta

Preference should also be seen as an abstract mathematical object which the algorithm doesn't necessarily maximize, but tries to set as high as it can.

Right, it maximizes according to constraints. And?

Notice that if I can't move the planet away, it in fact collides with Earth, but it doesn't mean that it should collide with Earth according to my preference

Right, your preference is different from the planet's. That was your error in your last response.

Likewise, you can't assert that according to the planet's preference, it should collide with Earth merely from the fact that it does: maybe the planet wants to be a whale instead, but can't.

The planet doesn't want to be a whale; that wouldn't minimize its Gibbs Free Energy in its local domain of attraction.

Vladimir_Nesov

Notice that if I can't move the planet away, it in fact collides with Earth, but it doesn't mean that it should collide with Earth according to my preference

Right, your preference is different from the planet's. That was your error in your last response.

My preference is over everything, the planet included. By saying "the planet shouldn't collide with Earth" I mean that I should make the planet not collide with Earth, I'm not talking about the preference of the planet in this sentence, I only talk about my preference.

Likewise, you can't assert that according to the planet's preference, it should collide with Earth merely from the fact that it does: maybe the planet wants to be a whale instead, but can't.

The planet doesn't want to be a whale; that wouldn't minimize its Gibbs Free Energy in its local domain of attraction.

That the planet wants to be a whale is a hypothetical. Accept it when reading whatever depends on accepting it. If the planet does in fact want to be a whale, it can still be unable to make that happen, and you may still observe it moving along its orbit. You can't assert that it doesn't want to be a whale from extensionally observing how it actually moves.

You are confusing the variational formulation of the laws of physics with the preference of optimization processes (probably because in both cases, you maximize/minimize something). An optimization process actually optimizes stuff over time (at least at simpler stages, e.g. humans), while the variational form of the laws of physics just says that the true solution (the one that describes what will actually happen) can be represented as the maximum/minimum of a certain function, given the constraints. This is just a convenient form for finding approximate solutions and for understanding the system's properties.

The same factual outcome can be written as the maximum of many different functions under different constraints. One of the functions for which you can seek an extremum given constraints describes the behavior of the system on the level of physics (for example, using the principle of least action; I forgot my physics, but it doesn't look like Gibbs free energy applies to the motion of planets). A completely different function for which you can seek an extremum given constraints describes its behavior on the level of preference -- that's utility. Both these accounts give the same solution stating what will actually happen, but the functions are different.
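Here is a small numerical illustration of the point that one outcome can be the extremum of many different functions. The grid of candidate outcomes and both functions are made up. Minimizing is used here, which is the same as maximizing the negated function. Two unrelated-looking objectives pick out the same outcome, so observing the outcome alone does not single out which function is "the" one being optimized.

```python
# Candidate outcomes on a grid (e.g. possible positions); purely hypothetical.
candidates = [i / 10 for i in range(-50, 51)]


def f(x: float) -> float:
    """A quadratic 'energy'-style objective."""
    return (x - 1.0) ** 2


def g(x: float) -> float:
    """A different objective with the same minimizer."""
    return abs(x - 1.0) + 0.1 * (x - 1.0) ** 4


print(min(candidates, key=f), min(candidates, key=g))  # 1.0 1.0
```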

"Shouldness" refers to a particular very specific way of presenting the system's behavior, and it's not free energy. Notice that you can describe AI's or man's behavior with physical variational principles as well, but that will have nothing to do with their preference.

abramdemski

"Shouldness" refers to a particular very specific way of presenting the system's behavior, and it's not free energy. Notice that you can describe AI's or man's behavior with physical variational principles as well, but that will have nothing to do with their preference.

It seems to me that what SilasBarta is asking for here is a definition of shouldness such that the above statement holds. Why is it invalid to think that the system "wants" its physics? All you are indicating is that such is not what's intended (which I'm sure SilasBarta knows)...

Xplat

As far as variational principles go, one difference is that a physical system displays no preference among the different local extrema. (IIRC you can even come up with models where the same system will minimize (an) action for some initial conditions and maximize it for others.) This makes a Lagrangian-style physical system a pretty poor CSA even if you go out of your way to model it as one.

SilasBarta

CSAs can't escape local optima either ... unless you found your global optimum without telling us ;-)

Vladimir_Nesov

Nothing singles out a particular variational formulation of physical laws as preference, among all the other equivalent formulations. Stating that the planet wants to minimize its action or whatever is as arbitrary as saying that it wants to be a whale. Silas Barta was asserting that "free energy" is the answer, which seems to be wrong on multiple accounts.

SilasBarta

Stating that the planet wants to minimize its action or whatever is as arbitrary as saying that it wants to be a whale. Silas Barta was asserting that "free energy" is the answer

No, I wasn't, but I couldn't even follow what your point was, once you started equating your own "shouldness" with the planet's shouldness, as if that implied some kind of contradiction if they're different. So, I didn't follow up.

The point was, if indeed we are all fully deterministic, and planets are fully deterministic, and planets embody the laws of physics, the concept of "shouldness" must be equally applicable in both cases. (More generally, I can't distinguish "agent" type algorithms from "non-agent" type algorithms, so I don't know what the alternative is.)

You "could jump off that cliff, if you wanted to." But as Eliezer_Yudkowsky notes in the link above, this statement is completely consistent with "It is physically impossible that you will jump off that cliff." Because the "causal forces within physics that are you" cannot reach that state.

And there's the kicker: that situation is no different from that of a planet: whatever it "wishes", it's physically impossible to do anything but follow the path dictated by physics.

My point about free energy was just to a) do a simple "reality check" (not the only check you can do) that would justify saying "the planet doesn't want to be a whale", and b) that every system will minimize its free energy with respect to a local domain of attraction. Just like how water will flow downhill spontaneously, but it won't jump out of a basin, just because that can get it even further downhill.

Now, in the sense that people can "want the impossible", then yes, I have no evidence that a planet doesn't want to be a whale. What I perhaps should have said is, a planet has not identified being a whale as the goal or subgoal it is in pursuit of. Even taking this reasoning to the extreme, the very first steps toward becoming a whale, would immediately hit the hard limits of free energy minimization, and so the planet could never even begin such a path -- not viewed as a single entity.

Vladimir_Nesov

Now, in the sense that people can "want the impossible", then yes, I have no evidence that a planet doesn't want to be a whale.

Yup, that's the case. This concept is meaningful because sometimes unexpected opportunities appear and the predictably impossible turns into an option. Or, more constructively, this concept is required to implement external "help" that is known in advance to be welcome.

cousin_it

Very interesting. By all means write it up in more detail.
