What a reduction of "could" could look like

cousin_it

84 What a reduction of "could" could look like

12th Aug 2010

2 min read

84

By requests from Blueberry and jimrandomh, here's an expanded repost of my comment which was itself a repost of my email sent to decision-theory-workshop.

(Wait, I gotta take a breath now.)

A note on credit: I can only claim priority for the specific formalization offered here, which builds on Vladimir Nesov's idea of "ambient control", which builds on Wei Dai's idea of UDT, which builds on Eliezer's idea of TDT. I really, really hope to not offend anyone.

(Whew!)

Imagine a purely deterministic world containing a purely deterministic agent. To make it more precise, agent() is a Python function that returns an integer encoding an action, and world() is a Python function that calls agent() and returns the resulting utility value. The source code of both world() and agent() is accessible to agent(), so there's absolutely no uncertainty involved anywhere. Now we want to write an implementation of agent() that would "force" world() to return as high a value as possible, for a variety of different worlds and without foreknowledge of what world() looks like. So this framing of decision theory makes a subprogram try to "control" the output of a bigger program it's embedded in.

For example, here's Newcomb's Problem:

def world():

  box1 = 1000

  box2 = (agent() == 2) ? 0 : 1000000

  return box2 + ((agent() == 2) ? box1 : 0)

A possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs, up to a specified max length, of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem above there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.

The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that), therefore the agent would output the corresponding action, which would prove the Peano axioms inconsistent or something.

To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are intentional.)

This is progress. We started out with deterministic programs and ended up with a workable concept of "could".

Hopefully, results in this vein may someday remove the need for separate theories of counterfactual reasoning based on modal logics or something. This particular result only demystifies counterfactuals about yourself, not counterfactuals in general: for example, if agent A tries to reason about agent B in the same way, it will fail miserably. But maybe the approach can be followed further.

Decision theoryReductionism

Frontpage

84

New Comment

Rendering 0/111 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 3:40 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

Moderation Log

84 What a reduction of "could" could look like

by cousin_it

12th Aug 2010

2 min read

111

84

By requests from Blueberry and jimrandomh, here's an expanded repost of my comment which was itself a repost of my email sent to decision-theory-workshop.

(Wait, I gotta take a breath now.)

(Whew!)

For example, here's Newcomb's Problem:

def world():

  box1 = 1000

  box2 = (agent() == 2) ? 0 : 1000000

  return box2 + ((agent() == 2) ? box1 : 0)

This is progress. We started out with deterministic programs and ended up with a workable concept of "could".

Decision theoryReductionism

Frontpage

84

Mentioned in

70Another attempt to explain UDT

70A model of UDT with a halting oracle

48How some algorithms feel from inside

47Example decision theory problem: "Agent simulates predictor"

35Formulas of arithmetic that behave like decision agents

Load More (5/13)

New Comment

Rendering 0/111 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 3:40 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

Moderation Log

More from cousin_it

Curated and popular this week

111Comments

111

Comment Permalink

torekp16y10

"Logical" counterfactuals either mean those whose antecedents are logically impossible, or those whose consequents follow from the antecedents by logical necessity, or both. In your Newcomb's-problem-solving algorithm example, both features apply.

But there are many decisions (or sub-problems within decisions) where neither feature applies. Where the agent is a human being without access to its "source code", it commits no contradiction in supposing "if I did A... but if I did B ..." even though at most one of these lies in the future that causality is heading toward. Additionally, one may be unable to prove that the expected utility of A would be U1 and the expected utility of B would be U2, even though one may reasonably believe both of these. Even when we know most of the relevant physical laws, we lack knowledge of relevant initial or boundary conditions, and if we had those we'd still lack the time to do the calculations.

Our knowledge of physical counterfactuals isn't deductive, usually, but it's still knowledge. And we still need to know the physical consequences of various actions - so we'll have to go on using these counterfactuals. Of course, I just used another intuition there! I'm not concerned, though - decision theory and game theory will both be evaluated on a balance of intuitions. That, of course, does not mean that every intuitively appealing norm will be accepted. But those that are rejected will lose out to a more intuitively appealing (combination of) norm(s).

In a previous comment, I said that physical counterfactuals follow trivially from physical laws, giving an ideal-gas example. But now I just said that our knowledge of physical counterfactuals is usually non-deductive. Problem? No: in the previous comment I remarked on the relation between truths in attempting to show that counterfactuals needn't be metaphysical monsters. In this comment I have been focusing on the agent's epistemic situation. We can know to a moral certainty that a causal relationship holds without being able to state the laws involved and without being able to make any inference that strictly deserves the label "deduction".

See in context