Blueberry comments on Two straw men fighting - Less Wrong

2 Post author: JanetK 09 August 2010 08:53AM


Comment author: cousin_it 10 August 2010 08:36:36AM *  12 points [-]

Um.

Some time ago I posted an idea to decision-theory-workshop that may be relevant here. Hopefully it can shed some light on the "solution to free will" generally accepted on LW, which I agree with.

Imagine the following setting for decision theory: a subprogram that wants to "control" the output of a bigger program containing it. So we have a function world() that makes calls to a function agent() (and maybe other logically equivalent copies of it), and agent() can see the source code of everything including itself. We want to write an implementation of agent(), without foreknowledge of what world() looks like, so that it "forces" any world() to return the biggest "possible" answer (scare quotes are intentional).

For example, Newcomb's Problem:

def world():
    box1 = 1000
    box2 = 0 if agent() == 2 else 1000000
    return box2 + (box1 if agent() == 2 else 0)

Then a possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs (up to a specified max length) of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.
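The full algorithm searches for formal proofs, which is hard to demonstrate compactly. As a rough executable sketch, we can cheat: instead of proving theorems of the form "agent()==A implies world()==U", we approximate each one by directly evaluating world() with agent() stubbed to return the candidate action A. (This only works here because world() interacts with the agent solely by calling agent(); the proof-search version needs no such shortcut. The `_forced` flag is an illustrative device, not part of the original proposal.)

```python
# Toy sketch of the "imagine each action, pick the best" loop.
# When _forced is set, agent() is pretending it already decided.
_forced = None

def agent():
    global _forced
    if _forced is not None:
        return _forced
    best_action, best_utility = None, None
    for candidate in (1, 2):          # the "possible" actions
        _forced = candidate
        utility = world()             # "agent()==candidate implies world()==utility"
        _forced = None
        if best_utility is None or utility > best_utility:
            best_action, best_utility = candidate, utility
    return best_action

def world():                          # Newcomb's Problem
    box1 = 1000
    box2 = 0 if agent() == 2 else 1000000
    return box2 + (box1 if agent() == 2 else 0)

print(world())  # → 1000000: the agent one-boxes
```

Here the two counterfactuals evaluate to 1000000 (for A=1) and 1000 (for A=2), so the agent returns 1, and the world it actually inhabits pays out 1000000.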

The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is logically impossible for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that), and would therefore output the corresponding action, which would prove the Peano axioms inconsistent or something.

To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are, again, intentional.)

Comment author: Blueberry 10 August 2010 10:14:41AM 0 points [-]

This is brilliant. This needs to be a top-level post.

Comment author: cousin_it 12 August 2010 05:48:12PM *  0 points [-]

Done. I'm skeptical that it will get many upvotes, though.

Comment author: Wei_Dai 12 August 2010 07:05:13PM 0 points [-]

I'm skeptical that it will get many upvotes, though.

You seem to be either pathologically under-confident (considering that the comment your post was based on was voted up to 9, and people were explicitly asking you to make a top post out of it), or just begging for votes. :)

Comment author: cousin_it 12 August 2010 07:09:11PM 0 points [-]

It's a little bit of both, I guess.

Comment deleted 10 August 2010 09:16:50PM *  [-]
Comment author: jimrandomh 10 August 2010 09:43:33PM 1 point [-]

If the others aren't reposting for whatever reason, I don't want to go against the implied norm.

It is much more likely that people aren't posting because they haven't thought of it or can't be bothered. I too would like to see top-level posts on this topic. And I wouldn't worry about grabbing credit; as long as you put attributions or links in the expected places, you're fine.

Comment author: cousin_it 10 August 2010 09:48:36PM *  0 points [-]

Sorry for deleting my comment. I still have some unarticulated doubts, will think more.

Comment author: Vladimir_Nesov 11 August 2010 07:32:04AM *  1 point [-]

For a bit of background regarding priority from my point of view: the whole idea of ADT was "controlling the logical consequences by deciding which premise to make true", which I then realized had also been the idea behind UDT (perhaps implicitly; Wei never commented on that). Later in the summer I shifted towards thinking about general logical theories, instead of specifically equivalence of programs, as in UDT.

However, as of July, there were two outstanding problems. First, it was unclear what kinds of things are possible to prove from a premise that the agent does X, and so how feasible brute-force theories of consequences were as a model of this sort of decision algorithm. Your post showed that in a certain situation it is indeed possible to prove enough to make decisions using only this "let's try to prove what follows" principle.

Second, maybe more importantly, it was very much unclear in what way one should state (the axioms of) a possible decision. There were three candidates to my mind: (1) try to state a possible decision in a weaker way, so that the possible decisions that aren't actual don't produce inconsistent theories; (2) try to ground the concept (theory) of a possible decision in the concept of reality, where the agent was built in the first place, which would serve as a specific guideline for fulfilling (1); and (3) try to live with inconsistency. The last option seemed less and less doable, the first depended on rather arbitrary choices, and the second was frustratingly hairy.

However, in a thread on decision-theory-workshop, your comments prompted me to make the observation that consequences always appear consistent, that one can't prove absurdity from any possible action, even though consequences are actually inconsistent (which you've reposted in the comment above). This raises the chances for option (3), dealing with inconsistency, although it's still unclear what's going on.

Thus, your input substantially helped with both problems. I'm not overly enthused with the results only because they are still very much incomplete.