Blueberry comments on Two straw men fighting - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (157)
Um.
Some time ago I posted to decision-theory-workshop an idea that may be relevant here. Hopefully it can shed some light on the "solution to free will" generally accepted on LW, which I agree with.
Imagine the following setting for decision theory: a subprogram that wants to "control" the output of a bigger program containing it. So we have a function world() that makes calls to a function agent() (and maybe other logically equivalent copies of it), and agent() can see the source code of everything including itself. We want to write an implementation of agent(), without foreknowledge of what world() looks like, so that it "forces" any world() to return the biggest "possible" answer (scare quotes are intentional).
For example, Newcomb's Problem:
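A minimal Python sketch of what such a world() might look like for Newcomb's Problem, assuming the payoffs stated below (1000 for two-boxing, 1000000 for one-boxing); the stub agent() stands in for whatever decision procedure the agent actually runs:

```python
def agent():
    # Stub for the agent's deterministic choice: 1 = one-box, 2 = two-box.
    return 1

def world():
    # Newcomb's Problem: box1 always holds 1000; box2 holds 1000000
    # exactly when the (perfect) predictor expects the agent to one-box.
    box1 = 1000
    box2 = 1000000 if agent() == 1 else 0
    # One-boxers take only box2; two-boxers take both boxes.
    return box2 if agent() == 1 else box1 + box2
```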
Then a possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs (up to a specified max length) of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.
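As a toy illustration of the decision rule just described (not a real theorem prover), the proof search can be replaced by a hard-coded list of the two theorems it would find in the Newcomb setting, after which the agent simply returns the action paired with the largest utility:

```python
def agent():
    # Stand-in for the proof search: the two theorems provable in the
    # Newcomb setting, as (action A, utility U) pairs:
    #   "agent()==2 implies world()==1000"
    #   "agent()==1 implies world()==1000000"
    theorems = [(2, 1000), (1, 1000000)]
    # Take the theorem with the biggest U and return the corresponding A.
    best_action, _ = max(theorems, key=lambda t: t[1])
    return best_action
```

In a real implementation the hard-coded list would be produced by enumerating machine-checkable proofs up to a maximum length, as the comment above describes.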
The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is logically impossible for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that), therefore the agent would output the corresponding action, which would prove the Peano axioms inconsistent or something.
To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are, again, intentional.)
This is brilliant. This needs to be a top-level post.
Done. I'm skeptical that it will get many upvotes, though.
You seem to be either pathologically under-confident (considering that the comment your post was based on was voted up to 9, and people were explicitly asking you to make a top post out of it), or just begging for votes. :)
It's a little bit of both, I guess.
It is much more likely that people aren't posting because they haven't thought of it or can't be bothered. I too would like to see top-level posts on this topic. And I wouldn't worry about grabbing credit; as long as you put attributions or links in the expected places, you're fine.
Sorry for deleting my comment. I still have some unarticulated doubts, will think more.
For a bit of background regarding priority from my point of view: the whole idea of ADT was "controlling the logical consequences by deciding which premise to make true", which I later saw was also the idea behind UDT (maybe implicitly; Wei never commented on that). Later in the summer I shifted towards thinking about general logical theories, instead of specifically equivalence of programs, as in UDT.
However, as of July, there were two outstanding problems. First, it was unclear what kinds of things are possible to prove from a premise that the agent does X, and so how feasible brute-force theories of consequences were as a model of this sort of decision algorithm. Your post showed that in a certain situation it is indeed possible to prove enough to make decisions using only this "let's try to prove what follows" principle.
Second, and maybe more importantly, it was very much unclear in what way one should state (the axioms of) a possible decision. There were three candidates to my mind: (1) try to state a possible decision in a weaker way, so that the possible decisions that aren't actual don't produce inconsistent theories; (2) try to ground the concept (theory) of a possible decision in the concept of reality, where the agent was built in the first place, which would serve as a specific guideline for fulfilling (1); and (3) try to live with inconsistency. The last option seemed less and less doable, the first depended on rather arbitrary choices, and the second was frustratingly hairy.
However, in a thread on decision-theory-workshop, your comments prompted me to make the observation that consequences always appear consistent, that one can't prove absurdity from any possible action, even though consequences are actually inconsistent (which you've reposted in the comment above). This raises the chances for option (3), dealing with inconsistency, although it's still unclear what's going on.
Thus, your input substantially helped with both problems. I'm not overly enthused with the results only because they are still very much incomplete.