By request of Blueberry and jimrandomh, here's an expanded repost of my comment, which was itself a repost of my email to decision-theory-workshop.
(Wait, I gotta take a breath now.)
A note on credit: I can only claim priority for the specific formalization offered here, which builds on Vladimir Nesov's idea of "ambient control", which builds on Wei Dai's idea of UDT, which builds on Eliezer's idea of TDT. I really, really hope to not offend anyone.
(Whew!)
Imagine a purely deterministic world containing a purely deterministic agent. To make it more precise, agent() is a Python function that returns an integer encoding an action, and world() is a Python function that calls agent() and returns the resulting utility value. The source code of both world() and agent() is accessible to agent(), so there's absolutely no uncertainty involved anywhere. Now we want to write an implementation of agent() that would "force" world() to return as high a value as possible, for a variety of different worlds and without foreknowledge of what world() looks like. So this framing of decision theory makes a subprogram try to "control" the output of a bigger program it's embedded in.
For example, here's Newcomb's Problem:
def world():
    box1 = 1000                                   # transparent box: always contains $1000
    box2 = 0 if agent() == 2 else 1000000         # opaque box: empty iff the agent takes both boxes
    return box2 + (box1 if agent() == 2 else 0)   # action 2 means taking both boxes
A possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs, up to a specified max length, of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem above there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.
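Here's a minimal sketch of such an agent, with two caveats: proves() is a hypothetical placeholder for a bounded search for a machine-checkable proof of the given statement (nothing of the sort is implemented here), and the candidate actions and utilities are hard-coded for brevity rather than extracted from whatever proofs the search actually finds.

def proves(statement, max_proof_length=10**6):
    # Hypothetical placeholder: return True iff a machine-checkable proof of
    # `statement` (a sentence about agent() and world() in some formal theory,
    # say Peano arithmetic) exists with length at most max_proof_length.
    raise NotImplementedError

def agent():
    best_action, best_utility = 1, float("-inf")
    for action in (1, 2):                            # 1 = take one box, 2 = take both
        for utility in (0, 1000, 1000000, 1001000):  # candidate values of world()
            statement = "agent() == %d implies world() == %d" % (action, utility)
            if proves(statement) and utility > best_utility:
                best_action, best_utility = action, utility
    return best_action                               # the action with the highest proven utility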
The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because if it could, it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that) and would output the corresponding action, which would prove the Peano axioms inconsistent or something.
To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are intentional.)
This is progress. We started out with deterministic programs and ended up with a workable concept of "could".
Hopefully, results in this vein may someday remove the need for separate theories of counterfactual reasoning based on modal logics or something. This particular result only demystifies counterfactuals about yourself, not counterfactuals in general: for example, if agent A tries to reason about agent B in the same way, it will fail miserably. But maybe the approach can be followed further.
Note that world() essentially doesn't talk about what can happen; instead it talks about how to compute utility, and the computation of utility is a single fixed program without parameters (not even depending on the agent) that the agent "controls" from the inside.
To clarify, in the Newcomb example, the preference (world) program could be:
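(A sketch, with the two calls to agent() from the world() above replaced by agent1() and agent2(), two textually separate copies of the agent's source code:)

def world2():
    box1 = 1000
    box2 = 0 if agent1() == 2 else 1000000
    return box2 + (box1 if agent2() == 2 else 0)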
while the agent itself knows its own code in the form agent3(). If it can prove both agent1() and agent2() equivalent to agent3(), it can decide in exactly the same way as before, but now the preference program doesn't contain the agent even once: there is no explicit dependency of the world program (the utility of the outcome) on the agent's actions. Any dependency is logically inferred.
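Concretely (reusing the hypothetical proves() helper from the sketch above, with the candidate actions and utilities again hard-coded for brevity), the agent's side of this might look like:

def agent3():
    # Assumed provable from the source code: the copies embedded in world2()
    # compute the same value as this very function.
    if not proves("agent1() == agent3() and agent2() == agent3()"):
        return 1  # fall back to some default action
    # Now search for consequences of our own output. world2() itself never
    # mentions agent3(); the dependency is inferred, not hard-wired.
    best_action, best_utility = 1, float("-inf")
    for action in (1, 2):
        for utility in (0, 1000, 1000000, 1001000):
            statement = "agent3() == %d implies world2() == %d" % (action, utility)
            if proves(statement) and utility > best_utility:
                best_action, best_utility = action, utility
    return best_action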
And we now have a prototype of the notion of preference: a fixed program that computes utility.