I'm not sure if I understand torekp's comment right, but it seems to me that the issue is the difference between "physical" and "logical" counterfactuals in the following sense:
For concreteness and simplicity, let's suppose physics can be modeled by a function step(state) that takes a state of the world, and returns the state of the world one instant later. Then you can ask what would happen if the current state were slightly different ("suppose there were an apple here in the air at this point in time; then it would fall down"). Call this a "physical" counterfactual.
Before reading Eliezer's arguments, I used to think this could be used to model decision-making: Let stateA be what the state of the world will really be at the point when I've made my decision. For every choice X I could make, let stateA_X be the same as stateA, except that the future copy of me is replaced by a version that has decided to do X. Now it's well-defined what the future of every stateA_X looks like, and if I have a well-defined utility function, it's well-defined which choice(s) maximize this utility. Of course I cannot calculate this, but, I thought, the actual decision-making process could be seen as an approximation of it.
Except that this would lead to classical decision/game theory, with its unattractive consequences -- the case where Omega has just made two identical physical copies of me, and has us play a true Prisoner's Dilemma, etc. Since I can prove that the other copy of me will do the same thing in every stateA_X (because I'm only replacing myself, not the other copy), it follows that I should defect, because it gives me the higher payoff no matter what the other copy of me does.
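The dominance argument above can be made concrete with a toy computation. This is only an illustrative sketch: the payoff numbers are the standard Prisoner's Dilemma values, which the text does not specify, and the point is just that a "physical" counterfactual varies my action while holding the copy's action fixed, so defection comes out ahead in every case:

```python
# Assumed standard PD payoffs for "me" (not given in the text):
# (my_action, copy_action) -> my utility
payoff = {("C", "C"): 3, ("C", "D"): 0,
          ("D", "C"): 5, ("D", "D"): 1}

# A physical counterfactual replaces only my future copy, so the other
# copy's action is treated as fixed while I vary mine.
for other in ("C", "D"):
    best = max(("C", "D"), key=lambda me: payoff[(me, other)])
    assert best == "D"  # defection dominates either way
```

The flaw, of course, is that the two copies are physically guaranteed to act identically, so the cases the loop ranges over cannot actually come apart.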
Thus "physical" counterfactuals do not make for an acceptable decision theory, and we're forced to look for some notion not of "what would happen if the state of the world were different," but of something like "what would happen if this Turing machine returned a different result?" That would be a "logical" counterfactual. Of course, it's not obvious what the hell this is supposed to mean, or whether it can be made to mean anything coherent -- your post is an answer to that question.
Benja's got it: I'm interested in physical counterfactuals. They are the type that is involved in the everyday notion of what a person "could" do.
As for decision theory, I think that the "logical" counterfactuals should supplement, not supplant, the physical counterfactuals.
By requests from Blueberry and jimrandomh, here's an expanded repost of my comment which was itself a repost of my email sent to decision-theory-workshop.
(Wait, I gotta take a breath now.)
A note on credit: I can only claim priority for the specific formalization offered here, which builds on Vladimir Nesov's idea of "ambient control", which builds on Wei Dai's idea of UDT, which builds on Eliezer's idea of TDT. I really, really hope to not offend anyone.
(Whew!)
Imagine a purely deterministic world containing a purely deterministic agent. To make it more precise, agent() is a Python function that returns an integer encoding an action, and world() is a Python function that calls agent() and returns the resulting utility value. The source code of both world() and agent() is accessible to agent(), so there's absolutely no uncertainty involved anywhere. Now we want to write an implementation of agent() that would "force" world() to return as high a value as possible, for a variety of different worlds and without foreknowledge of what world() looks like. So this framing of decision theory makes a subprogram try to "control" the output of a bigger program it's embedded in.
For example, here's Newcomb's Problem:
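(The original code for this example isn't shown here, so what follows is a reconstruction from the payoffs stated below: agent()==2, two-boxing, yields 1000, and agent()==1, one-boxing, yields 1000000. The placeholder agent() simply one-boxes; Omega's prediction is modeled by world() calling agent() directly.)

```python
def agent():
    # Placeholder: 1 = take only the opaque box, 2 = take both boxes.
    return 1

def world():
    box1 = 1000  # the transparent box always holds $1000
    # Omega fills the opaque box iff it predicts one-boxing; in this
    # deterministic setting, "predicting" is just calling agent().
    box2 = 1000000 if agent() == 1 else 0
    if agent() == 2:
        return box1 + box2
    return box2
```

With the one-boxing placeholder, world() returns 1000000; changing agent() to return 2 makes it return 1000, matching the theorems below.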
A possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs, up to a specified max length, of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem above there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.
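The decision rule at the end of that search can be sketched as follows. This is not a real proof search -- actually finding machine-checkable proofs would require a theorem prover -- so, as a stand-in, the "proved theorems" are generated here by evaluating a parameterized copy of the Newcomb world with the action forced to A:

```python
def newcomb_world(action):
    # Parameterized stand-in for world(); payoffs as stated above.
    box1 = 1000
    box2 = 1000000 if action == 1 else 0
    return (box1 + box2) if action == 2 else box2

def agent():
    # Each entry A -> U stands in for a found proof of
    # "agent()==A implies world()==U".
    theorems = {a: newcomb_world(a) for a in (1, 2)}
    # Take the biggest found U and return the corresponding A.
    return max(theorems, key=theorems.get)
```

Here theorems comes out as {1: 1000000, 2: 1000}, so agent() returns 1, i.e., it one-boxes.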
The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that); the agent would then output the corresponding action, which would prove the Peano axioms inconsistent or something.
To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are intentional.)
This is progress. We started out with deterministic programs and ended up with a workable concept of "could".
Hopefully, results in this vein may someday remove the need for separate theories of counterfactual reasoning based on modal logics or something. This particular result only demystifies counterfactuals about yourself, not counterfactuals in general: for example, if agent A tries to reason about agent B in the same way, it will fail miserably. But maybe the approach can be followed further.