JanetK comments on Two straw men fighting - Less Wrong

Post author: JanetK 09 August 2010 08:53AM


Comment author: cousin_it 10 August 2010 08:36:36AM 12 points

Um.

Some time ago I posted an idea to decision-theory-workshop that may be relevant here. Hopefully it can shed some light on the "solution to free will" generally accepted on LW, which I agree with.

Imagine the following setting for decision theory: a subprogram that wants to "control" the output of a bigger program containing it. So we have a function world() that makes calls to a function agent() (and maybe other logically equivalent copies of it), and agent() can see the source code of everything, including itself. We want to write an implementation of agent(), without foreknowledge of what world() looks like, such that it "forces" any world() to return the biggest "possible" answer (the scare quotes are intentional).

For example, Newcomb's Problem:

def world():
    box1 = 1000                                  # the small box, always containing $1000
    box2 = 0 if agent() == 2 else 1000000        # the big box is filled only if the agent one-boxes
    return box2 + (box1 if agent() == 2 else 0)  # two-boxing (returning 2) also takes the small box

Then a possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs (up to a specified max length) of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.
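A minimal runnable sketch of this idea, with the formal proof search replaced by a cruder stand-in: for each candidate action A we simply re-run world() with agent() stubbed to return the constant A, which only works because this toy world() is a pure function of the agent's output. The helper make_world() and the list candidate_actions are illustrative names, not part of the construction above.

def make_world(agent):
    # Newcomb's Problem, parameterised by the agent it contains.
    def world():
        box1 = 1000
        box2 = 0 if agent() == 2 else 1000000
        return box2 + (box1 if agent() == 2 else 0)
    return world

def agent():
    candidate_actions = [1, 2]               # the "possible" outputs of agent()
    best_action, best_utility = None, float("-inf")
    for a in candidate_actions:
        # Stand-in for finding a proof of "agent()==a implies world()==u":
        # evaluate a copy of the world whose agent returns the constant a.
        u = make_world(lambda a=a: a)()
        if u > best_utility:
            best_action, best_utility = a, u
    return best_action

print(make_world(agent)())                   # prints 1000000: the agent "chooses" to one-box

(The real construction reasons about proofs instead of re-running world(), which is what lets it handle a world() containing exact copies of the agent rather than a stub.)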

The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is logically impossible for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that), therefore the agent would output the corresponding action, which would prove the Peano axioms inconsistent or something.
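To make the key step explicit (a sketch, assuming the proof system agent() uses is sound): write A* for the value agent() actually returns, chosen because a proof of "agent()==A* implies world()==U*" was found with U* maximal among the proofs found. Then

\[
\vdash \bigl(\mathtt{agent()}=A^{*} \rightarrow \mathtt{world()}=U^{*}\bigr)
\quad\text{and}\quad
\mathtt{agent()}=A^{*}
\quad\Longrightarrow\quad
\mathtt{world()}=U^{*},
\]

so the one theorem the agent acts on is true of the actual run, while the theorems about the actions it does not take may rest on false premises without causing trouble, provided no contradiction is ever derived from them.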

To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are, again, intentional.)

Comment author: JanetK 12 August 2010 09:02:31AM 0 points

Is there any way that this applies to me or you making a decision? If it does, can you give an indication of how? Thanks.