What a reduction of "could" could look like

53 Post author: cousin_it 12 August 2010 05:41PM

By requests from Blueberry and jimrandomh, here's an expanded repost of my comment which was itself a repost of my email sent to decision-theory-workshop.

(Wait, I gotta take a breath now.)

A note on credit: I can only claim priority for the specific formalization offered here, which builds on Vladimir Nesov's idea of "ambient control", which builds on Wei Dai's idea of UDT, which builds on Eliezer's idea of TDT. I really, really hope to not offend anyone.

(Whew!)

Imagine a purely deterministic world containing a purely deterministic agent. To make it more precise, agent() is a Python function that returns an integer encoding an action, and world() is a Python function that calls agent() and returns the resulting utility value. The source code of both world() and agent() is accessible to agent(), so there's absolutely no uncertainty involved anywhere. Now we want to write an implementation of agent() that would "force" world() to return as high a value as possible, for a variety of different worlds and without foreknowledge of what world() looks like. So this framing of decision theory makes a subprogram try to "control" the output of a bigger program it's embedded in.

For example, here's Newcomb's Problem:

def world():
 box1 = 1000
 box2 = (agent() == 2) ? 0 : 1000000
 return box2 + ((agent() == 2) ? box1 : 0)

A possible algorithm for agent() may go as follows. Look for machine-checkable mathematical proofs, up to a specified max length, of theorems of the form "agent()==A implies world()==U" for varying values of A and U. Then, after searching for some time, take the biggest found value of U and return the corresponding A. For example, in Newcomb's Problem above there are easy theorems, derivable even without looking at the source code of agent(), that agent()==2 implies world()==1000 and agent()==1 implies world()==1000000.

The reason this algorithm works is very weird, so you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A. In other words, an agent implemented like that cannot derive a contradiction from the logically inconsistent premises it uses, because then it would "imagine" it could obtain arbitrarily high utility (a contradiction implies anything, including that), therefore the agent would output the corresponding action, which would prove the Peano axioms inconsistent or something.

To recap: the above describes a perfectly deterministic algorithm, implementable today in any ordinary programming language, that "inspects" an unfamiliar world(), "imagines" itself returning different answers, "chooses" the best one according to projected consequences, and cannot ever "notice" that the other "possible" choices are logically inconsistent with determinism. Even though the other choices are in fact inconsistent, and the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants. (All scare quotes are intentional.)

This is progress. We started out with deterministic programs and ended up with a workable concept of "could".

Hopefully, results in this vein may someday remove the need for separate theories of counterfactual reasoning based on modal logics or something. This particular result only demystifies counterfactuals about yourself, not counterfactuals in general: for example, if agent A tries to reason about agent B in the same way, it will fail miserably. But maybe the approach can be followed further.

Comments (103)

Comment author: orthonormal 12 August 2010 06:13:29PM 2 points [-]

Bravo!

Comment author: Vladimir_Nesov 12 August 2010 06:39:31PM *  4 points [-]

Note that world() essentially doesn't talk about what can happen, instead it talks about how to compute utility, and computation of utility is a single fixed program without parameters (not even depending on the agent), that the agent "controls" from the inside.

To clarify, in the Newcomb example, the preference (world) program could be:

def world(): box1 = 1000 box2 = (agent1() == 2) ? 0 : 1000000 return box2 + ((agent2() == 2) ? box1 : 0)

while the agent itself knows its code in the form agent3(). If it can prove both agent1() and agent2() to be equivalent to agent3(), it can decide in exactly the same way, but now the preference program doesn't contain the agent even once, there is no explicit dependency of the world program (utility of the outcome) on the agent's actions. Any dependency is logically inferred.

And we now have a prototype of the notion of preference: a fixed program that computes utility.

Comment author: Perplexed 05 September 2010 11:28:35PM 1 point [-]

And we now have a prototype of the notion of preference: a fixed program that computes utility.

One possible problem: There is a difference between a program and the function it computes. The notion of preference is perhaps captured best by a function, not a program. Many programs could be written, all computing the same function. This wouldn't matter much if we were only going to call or invoke the program, since all versions of the program compute the same result. But, this is not what cousin_it suggests here. He wants us to examine the source of the program, to prove theorems about the program. Given finite resources, the things we can prove about one program may not match the things we can prove about another program, even if the two programs would compute the same result in all cases if agent() returned the same result.

Comment author: Vladimir_Nesov 06 September 2010 02:18:25AM *  1 point [-]

There is a difference between a program and the function it computes. The notion of preference is perhaps captured best by a function, not a program.

No, a program. What the program defines is a single constant value, not a function (remember: if we are talking about a program, it's a constant program, taking no parameters!). And of course it matters how that constant is defined, and not at all what that constant is. More generally, we can define that constant not by a program, but by a logical theory, which will be the topic of my next post.

Comment author: Perplexed 06 September 2010 02:45:08AM 1 point [-]

By a "logical theory" do you mean what logicians usually mean by a "theory"? A deductively closed set of sentences?

Wow! Well, that eliminates a lot of the arbitrary character I was objecting to in using programs to represent the world/decision problem. But there still are a lot of deductive systems to choose among. I await your next post with interest.

Comment author: Vladimir_Nesov 06 September 2010 02:59:51AM 0 points [-]

I won't settle the choice, only point out the generality of notion and how it applies, direction to look for further refinements.

Comment author: cousin_it 12 August 2010 07:03:17PM *  1 point [-]

Yep - thanks. This is an important idea that I didn't want to burden the post with. It would have brought the argument closer to UDT or ADT research, and I'd much prefer if you and Wei Dai wrote posts on these topics, not me. In matters of ethics and priority, one cannot be too careful. Besides, I want to read your posts because you have different perspectives from mine on these matters.

Comment author: Vladimir_Nesov 12 August 2010 07:36:30PM 0 points [-]

Thanks, I think I'll write up this notion of preference remark (a little more generally, as a theory not program, with program as a special case, and lack of explicit dependence in the discussion of why the program/theory is "fixed").

Comment author: gRR 18 February 2012 04:13:15AM *  1 point [-]

The setup can be translated into a purely logical framework. There would be constants A (Agent) and W (World), satisfying the axioms:

(Ax1) A=1 OR A=2

(Ax2) (A=1 AND W=1000) OR (A=2 AND W=1000000)

(Ax3) forall a1,a2,w1,w2 ((A=a1 => W=w1) AND (A=a2 => W=w2) AND (w1>w2)) => NOT (A=a2)

Then the Agent's algorithm would be equivalent to a step-by-step deduction:

(1) Ax2 |- A=1 => W=1000

(2) Ax2 |- A=2 => W=1000000

(3) Ax3 + (1) + (2) |- NOT (A=1)

(4) Ax1 + (3) |- A=2

In this form it is clear that there are no logical contradictions at any time. The system never believes A=1, it only believes (A=1 => W=1000), which is true.

In the purely logical framework, the word "could" is modeled by what Hofstadter calls the "Fantasy Rule" of propositional calculus.

Comment author: Vladimir_Nesov 18 February 2012 11:26:01AM *  0 points [-]

The direction of setting the problem up in logical setting is further pursued here and here.

The axioms as you wrote them don't fit or work for multiple reasons. Conceptually, the problem is in figuring out how decisions influence outcomes, so having something like (Ax2) defies this purpose. Statements like that are supposed to be inferred by the agent from more opaque-to-the-purpose-of-deciding definitions of A and W, and finding the statements of suitable form constitutes most of the activity in making a decision.

If the agent can figure out what it decides, then by default it becomes able to come up with spurious arguments for any decision, which makes its inference system (axioms) either inconsistent or unpredictable (if we assume consistency and don't let (Ax3) into it). The rules that say which arguments lead to which actions shouldn't be part of the theory that the agent uses to come up with the arguments. This excludes your (Ax3), which can only be assumed "externally" to the agent, a way of seeing if it's doing OK, not something it should itself be able to see.

A contradiction can be inferred from your axioms in this way: (4) says A=2, which implies [A=1 => W=0], which by (Ax3) together with [A=2 => W=1000000] implies that A=1, which is a contradiction and a decision to do whatever (I think you've also mixed up A=1 and A=2 while going from (Ax2) to (1) and (2)).

Comment author: gRR 18 February 2012 05:41:31PM *  0 points [-]

Thanks! This site is rather large :) And the most interesting (for me) stuff is rarely linked from the sequences. That is, it seems so... Is there a more efficient way to get up to date? [I would SO like to have found this site 5 years ago!]

The axioms were not proposed seriously as a complete solution, of course. The (Ax2) can be thought of as a cached consequence of a long chain of reasoning. The main point of the post was about the word "could", after all.

(Ax3) is just an axiomatic restatement of the "utility maximization" goal. I don't see why an agent can't know about itself that it's a utility maximizer.

A contradiction can be inferred from your axioms in this way: (4) says A=2, which implies [A=1 => W=0], which by (Ax3) together with [A=2 => W=1000000] implies that A=1, which is a contradiction and a decision to do whatever (I think you've also mixed up A=1 and A=2 while going from (Ax2) to (1) and (2)).

Yes, A=1 and A=2 were mixed up in the (Ax2), fixed it now. But I don't see your contradiction. (Ax3) + [A=1 => W=0] + [A=2 => W=1000000] does not imply A=1. It implies NOT(A=1)

Comment author: Vladimir_Nesov 18 February 2012 07:21:42PM *  0 points [-]

Is there a more efficient way to get up to date?

See the posts linked to from UDT, ADT, TDT, section "Decision theory" in this post.

I don't see why an agent can't know about itself that it's a utility maximizer.

It can, it's just a consideration it can't use in figuring out how the outcome depends on its decision. While looking for what is influenced by a decision, the decision itself should remain unknown (given that the decision is determined by what this dependence turns out to be, it's not exactly self-handicapping).

Yes, A=1 and A=2 were mixed up in the (Ax2), fixed it now. But I don't see your contradiction. (Ax3) + [A=1 => W=0] + [A=2 => W=1000000] does not imply A=1. It implies NOT(A=1)

It seems the mixing up of A=1 and A=2 didn't spare this argument, but it's easy enough to fix. From A=2 we have NOT(A=1), so [A=1 => W=1,000,000,000] and together with [A=2 => W=1,000,000] by (Ax3) we have A=1, contradiction.

Comment author: gRR 18 February 2012 11:56:07PM 1 point [-]

I thought about a different axiomatization, which would not have the same consistency problems. Not sure whether this is the right place, but:

Let X be a variable ranging over all possible agents.

(World Axioms)
forall X [Decides(X)=1 => Receives(X)=1000000]
forall X [Decides(X)=2 => Receives(X)=1000]

(Agent Axioms - utility maximization)
BestX = argmax{X} Receives(X)
Decision = Decides(BestX)

Then Decision=1 is easily provable, regardless of whether any specific BestX is known.
Does not seem to lead to contradictions.

Comment author: gRR 18 February 2012 08:16:52PM 1 point [-]

Yes, I see now, thanks.

If I attempted to fix this, I would try to change (Ax3) to something like:

forall a1,a2,w1,w2 ((A=a1 => W=w1) AND (A=a2 => W=w2) AND (w1>w2)) => NOT (FinalA=a2)

where FinalA is the actual decision. But then, why should the world's axiom (Ax2) be defined in terms of A and not FinalA? This seems conceptually wrong...

Ok, I'll go read up on the posts :)

Comment author: AlephNeil 14 August 2010 03:59:51AM *  3 points [-]

OK, so just spelling out the argument for my own amusement:

The agent can prove the implications "agent() == 1 implies world() == 1000000" and "agent() == 2 implies world() == 1000", both of which are true.

Let x and y in {1, 2} be such that agent() == x and agent() != y.

Let Ux be the greatest value of U for which the agent proves "agent() == x implies world() == U". Let Uy be the greatest value of U for which the agent proves "agent() == y implies world() == U".

Then Ux is greater than or equal to Uy.

If the agent's formalised reasoning is consistent then since agent() == x is true, Ux must be the only U such that "agent() == x implies world() == U" is provable.

Now, if there is more than one U such that "agent() == y implies world() == U" is provable then "agent() == y implies contradiction" is provable, and hence so is "agent() == y implies world() == U" for some value of U larger than Ux. Hence Uy would be larger than Ux, which is impossible.

Therefore, the implications "agent() == 1 implies world() == 1000000" and "agent() == 2 implies world() == 1000" are the only ones provable.

Therefore, agent() == 1.

Comment author: cousin_it 14 August 2010 08:52:31AM 0 points [-]

Yes, this looks right.

Comment author: [deleted] 13 August 2010 07:40:36PM *  1 point [-]

I enjoyed reading and thinking about this, but now I wonder if it's less interesting than it first appears. Tell me if I'm missing the point:

Since world is a function depending on the integers agent outputs, and not on agent itself, and since agent knows world's source code, agent could do the following: instead of looking for proofs shorter than some fixed length that world outputs u if agent outputs a, it could just simulate world and tabulate the values world(1), world(2), world(3),... up to some fixed value world(n). Then output the a < n + 1 for which world(a) is largest.

In other words, agent can just ask world what's the best it can get, then do that.

Granted there are situations in which it takes longer to compute world(n) than to prove that world(n) takes a certain value, but the reverse is also true (and I think more likely: the situations I can think of in which cousin it's algorithm beats mine, resource-wise, are contrived). But cousin it is considering an idealized problem in which resources don't matter, anyway.

Edit: If I'm wrong about this, I think it will be because the fact that world depends only on agent's output is somehow not obvious to agent. But I can't figure out how to state this carefully.

Comment author: cousin_it 14 August 2010 08:49:06AM *  1 point [-]

You're not completely wrong, but the devil is in the details. The world's dependence on the agent's output may indeed be non-obvious, as in Nesov's comment.

Comment author: [deleted] 15 August 2010 04:10:43AM 2 points [-]

Are you saying there's a devil in my details that makes it wrong? I don't think so. Can you tell me a tricky thing that world can do that makes my code for agent worse than yours?

About "not obvious how world depends on agent's output," here's a only very slightly more tricky thing agent can do, that is still not as tricky as looking for all proofs of length less than n. It can write a program agent-1 (is this what Nesov is getting at?) that always outputs 1, then compute world(agent-1). It can next write a program agent-2 that always outputs 2, then compute world(agent-2). On and on up to agent-en. Then have agent output the k for which world(agent-k) is largest. Since world does not depend on any agent's source code, if agent outputs k then world(agent) = world(agent-k).

Again this is just a careful way of saying "ask world what's the best agent can get, and do that."

Comment author: [deleted] 13 August 2010 03:32:49PM *  1 point [-]

Am I right that if world's output depends on the length of time agent spends thinking, then this solution breaks?

Edit: I guess "time spent thinking" is not a function of "agent", and so world(agent) cannot depend on time spent thinking. Wrong?

Edit 2: Wrong per cousin's comment. World does not even depend on agent, only on agent's output.

Comment author: cousin_it 13 August 2010 03:36:30PM *  1 point [-]

Yes. This solution is only "optimal" if world() depends only on the return value of agent(). If world() inspects the source code of agent(), or measures cycles, or anything like that, all bets are off - it becomes obvously impossible to write an agent() that works well for every possible world(), because world() could just special-case and penalize your solution.

Edit: You weren't wrong! You identified an important issue that wasn't clear enough in the post. This is the right kind of discussion we should be having: in what ways can we relax the restrictions on world() and still hope to have a general solution?

Comment author: [deleted] 13 August 2010 03:52:35PM 0 points [-]

One can interpret the phrase "world calls agent and returns utility" with different levels of obtuseness:

  1. World looks up agent, examines it, runs it, sees what it's output was and intermediate steps were, then decides what agent deserves.

  2. World looks at a sheet of paper agent has written a number on. Analyzes handwriting. Then decides what agent deserves

  3. World does not even analyze handwriting.

You mean 3, right? That's all I meant by edit 2.

Comment author: cousin_it 13 August 2010 03:53:35PM *  0 points [-]

Yes, that's right.

Comment author: Benja 13 August 2010 12:19:21AM *  6 points [-]

A subtlety: If the proof search finds "agent()==1 implies world()==5" and "agent()==2 implies world()==1000", then the argument in the post shows that it will indeed turn out that agent()==2 and world()==1000. But the argument does not show in any similar sense that 'if the agent had returned 1, then world() would equal 5' -- in fact this has no clear meaning (other than the one used inside the algorithm, i.e. that the theorem above is found in the proof search).

I do not think that this is actually a problem here, i.e. I don't think it's possible to construct a world() where we would intuitively say that the algorithm doesn't optimize [except of course by making the world so complicated that the proofs get too long], but there's a related situation where something similar does play a role:

Suppose that instead of searching only proofs, you let agent() call a function probability(logicalStatement), and then do expected utility maximization. [To be precise, let both the possible actions and the possible utility values come from a finite set; then you can call probability(...) for all possible combinations, and since different utilities for the same action are mutually exclusive, their probabilities are additive, so the expected utility calculation is straight-forward.]

You might imagine that you could score different candidates for probability(...) on how much probability they assign to false statements (at least I at one point thought this might work), but this is not sufficient: an evil probability function might assign 99% probability to "agent()==1 implies world()==1" and to "agent()==2 implies world()==5", even though

def world(): return 1000 if agent()==1 else 5

because both logical statements would turn out to be perfectly true.

Comment author: cousin_it 13 August 2010 07:42:48AM *  2 points [-]

Sorry, my previous reply was based on a misparsing of your comment, so I deleted it.

So the "mathematical intuition module" can be 100% truthful and evil at the same time, because making true statements isn't the same as making provable statements. Funny. It seems the truth vs provability distinction is actually the heart of decision theory.

Comment author: Wei_Dai 19 August 2010 03:17:12PM 3 points [-]

It seems the truth vs provability distinction is actually the heart of decision theory.

I came across this paragraph from Bruno Marchal today, which strikingly reminds me of Vinge's "Hexapodia as the key insight":

Talking about Smullyan's books, I recall that "Forever Undecided" is a recreational (but ok ... not so easy, nor really recreational) introduction to the modal logic G (the one Solovay showed to be a sound and complete theory for the Godel-Lob (Gödel, Löb, or Goedel, Loeb) provability/concistency logic. G is the key for the math in the TOE approach I am developing. The logic G is the entry for all arithmetical "Plotinian Hypostases".

Bruno is a prolific participant on the "everything-list" that I created years ago, but I've never been able to understand much of what he talked about. Now I wonder if I should have made a greater effort to learn mathematical logic. Do his writings make sense to you?

Comment author: cousin_it 19 August 2010 03:45:36PM *  5 points [-]

No, they look like madness.

But logic is still worth learning.

EDIT: it seems I was partly mistaken. What I could parse of Marchal's mathy reasoning was both plausible and interesting, but parsing is quite hard work because he expresses his ideas in a very freaky manner. And there still remain many utterly unparseable philosophical bits.

Comment author: Wei_Dai 19 August 2010 10:02:17PM 12 points [-]

Thanks for taking another look. BTW, I suggest if people change the content of their comments substantially, it's better to make a reply to yourself, otherwise those who read LW by scanning new comments will miss the new content.

Comment author: Benja 12 August 2010 09:52:37PM 4 points [-]

This has been extremely useful for me, even though I understood the basic idea already, because with the discussion it made me understand something very basic that somehow I'd always missed so far: I'd always thought of this kind of program in the context of applying classical decision theory with the possible actions being particular agent()s. But using classical decision theory already presupposes some notion of "could." Now I understand that the principle here yields (or is a real step towards) a notion of could-ness even in the real world: If you know the formula that computes/approximates the universe (assuming there is such) and can state a utility function over results of this program, you can program a computer with something like your program, which gives a notion of "could." Thank you for helping me fill my stupid gap in understanding!

Side note: the intimate way that your notion of "could" is bound up with the program you're running makes it nontrivial (or at any rate non-obvious to me) how to compare different possible programs. I don't immediately see a direct equivalent to how, given a decision-theoretic problem, you can compare different efficient heuristic programs on how close they come to the theoretical optimum.

Comment author: cousin_it 13 August 2010 06:27:05AM *  0 points [-]

Good question, but surprisingly hard to parse. Took me three attempts.

Basically, you point out that my implementation of agent() quantifies over possible choices, but sometimes it's useful for agents to quantify over programs that lead to choices. Maybe this falls magically out of the proof checker, maybe not. I cannot answer this. My brain hurts.

Comment author: Benja 14 August 2010 02:50:17PM 0 points [-]

Ouch, sorry! Too much time spent thinking about these things by myself without having to explain what I mean, probably. I think I still haven't gotten across the simple point I was trying to make; let me try again.

Given an optimization problem like the TSP, there is a well-defined notion of "what utility/score do I get if I choose to do X," independent of the algorithm you use to compute a solution -- so there is a theoretical optimum solution independent of the algorithm you use to make a choice, and you can compare different heuristic algorithms on how close they come to the theoretical optimum. It seems useful to have this sort of description of the problem when trying to find actual practical algorithms.

I used to hope that we could find an equivalent for timeless decision theories, an abstract description of the optimum solution that we could then try to approximate through heuristic algorithms, but in your work, the definition of "what would happen if I did X" is so intimately connected to the algorithm making the decision that it's at least not obvious to me how this could work.

Comment author: cousin_it 14 August 2010 03:45:08PM *  1 point [-]

Most (all?) proofs of statements like "if I did X, utility would be U" don't actually need to look at the implementation of the agent, only find the call sites in world(). So we can compare different heuristics without things blowing up. Or did you mean something else?

Do you have an example of a decision theory problem where the optimal solution is hard to find, but heuristics help? Or should I use the TSP as an example?

Comment author: Benja 15 August 2010 03:27:21PM 0 points [-]

(Sorry for the delayed reply.) I'm a little confused: it's true that we can easily compare heuristics for world()s that depend on the agent only through easily identifiable call sites, but if we're only considering this class of problems, what does your algorithm add over the simpler algorithm of replacing all call sites to agent() by a constant, running the result, and choosing the constant that gives the highest value?

(I suppose I was hoping for more direct game theoretical applications than you, so I had situations in mind where the (rest of) the world does depend on the source of the agent.)

Regarding examples, what I had in mind was: Surely your literal algorithm of enumerating all proofs is not efficient enough to run even on toy problems like your implementation of Newcomb! Also, surely for any real-world problem, it won't be feasible to find the optimal solution. So at some point, we will want to use the theory as a background for looking for practical (and thus necessarily heuristic) algorithms.

Comment author: cousin_it 15 August 2010 03:35:17PM *  0 points [-]

what does your algorithm add over the simpler algorithm of replacing all call sites to agent() by a constant, running the result, and choosing the constant that gives the highest value?

Admittedly, not much. If world() calls agent2() that provably returns the same value as agent(), agent() will notice it. But this wasn't really the point of the post. The point was to show that having perfect knowledge of yourself doesn't necessarily stop you from considering counterfactuals about yourself. Of course it's all completely impractical.

Perhaps a closer approximation to reality would be a "listener" algorithm that accepts proofs (e.g. by reading game theory books), instead of trying to find them.

Comment author: Vladimir_Nesov 12 August 2010 09:35:47PM *  4 points [-]

Note that by proving "agent()==1 implies world()==1000000", and then performing action 1, you essentially conclude that "world()==1000000" was valid in the first place. "You had the mojo all along", as Aaronson said (on Löb's theorem). Decision making is all about lack of logical transparency, inability to see the whole theory from its axioms.

Comment author: cousin_it 13 August 2010 05:15:04AM *  0 points [-]

Woah, I just noticed the weird alternative proof of Löb's theorem contained in that PDF. Can't even tell if it's correct. Thanks!

Comment author: SilasBarta 12 August 2010 06:38:19PM *  4 points [-]

I was confused on the first reading: I thought you were presenting a contrasting reduction of couldness. Then, on reading the earlier comment, I saw the context in which you were presenting this: as a way to clarify the existing free will solution EY gave. Now, it makes more sense.

(It might be helpful to include more of the motivation for this explanation at the start.)

Comment author: cousin_it 12 August 2010 06:43:36PM *  1 point [-]

It's true that I agree with Eliezer's solution and my post can be seen as a formalization of that, but I'd much rather not discuss free will here and now :-)

Comment author: Vladimir_Nesov 12 August 2010 06:58:32PM 0 points [-]

It is a formalization of UDT, but I don't see how it's a formalization of TDT.

Comment author: cousin_it 12 August 2010 07:01:28PM *  0 points [-]

I think Silas meant Eliezer's "solution to free will", not TDT.

Comment author: SilasBarta 12 August 2010 11:05:16PM 1 point [-]

Posting to confirm. This is why I said:

I saw the context in which you were presenting this: as a way to clarify the existing free will solution EY gave

Comment author: Vladimir_Nesov 12 August 2010 07:09:52PM 0 points [-]

I believe "solution to free will" was based on TDT (although if you make TDT informal, and then formalize the result, it can well be no longer a formalization of TDT, and work better too).

Comment author: [deleted] 31 October 2010 06:04:18AM *  0 points [-]

Neat. It seems like one could think of the agent as surveying mathematical objects (sets of theorems of the form "agent()==A implies world()==U") and then forcing itself to be in a particular mathematical object (that which contains a particular true statement "agent()==A implies world()==U"). If we're thinking in terms of an ensemble the agent would already be in a particular mathematical object, so perhaps it's procedure could be thought of as forcing itself into a particular mathematical sub-structure.

When I think about this stuff I get the feeling I'm confused on many levels.

Comment author: torekp 14 August 2010 01:49:57AM *  0 points [-]

you might want to read the following more than once. Even though most of the theorems proved by the agent are based on false premises (because it is obviously logically contradictory for agent() to return a value other than the one it actually returns), the one specific theorem that leads to maximum U must turn out to be correct, because the agent makes its premise true by outputting A.

You were right: reading a second time helped. I figured out what is bothering me about your post. The above seems misspoken, and may lead to an overly constricted conception of what the agent "could" do.

The theorems proved use statements having false antecedents, but there are no false premises used by the agent. Any statement beginning "If agent==2 then..." is obviously true if the "if" is a simple material conditional.

If the "if" is of the counterfactual variety, it can still be true. There's nothing mysterious about counterfactuals: they follow trivially from physical laws. For example, if we have an ideal gas, it follows from PV=nRT that doubling the temperature with constant n and V would double the pressure. Supposing that our agent can be thoroughly described by deterministic laws doesn't undermine counterfactuals -- it grounds them.

To get from "if ... would ..." to "could...", where the "could" signifies ability, I would argue, it is sufficient that rationality play a critical causal role. Roughly, if the only thing preventing the enactment of the sequence is the rationality of the agent itself, then the sequence is one of the things the agent could do. If outputting "1" were optimal, the agent would do so; if "2" were optimal, the agent would do "2"; consideration of optimality or sub-optimality is a rational process; so those are both options for the agent.

There's nothing wrong with arguing for a modest conclusion, focusing on one restricted case of what an agent could do. But there's no reason to settle for that, either.

Comment author: cousin_it 14 August 2010 08:45:58AM *  0 points [-]

I don't understand your comment at all. You seem to assume as "trivial" many things that I set out to demystify. Maybe you could formalize your argument?

Comment author: Benja 14 August 2010 02:08:11PM *  3 points [-]

I'm not sure if I understand torekp's comment right, but it seems to me that the issue is the difference between "physical" and "logical" counterfactuals in the following sense:

For concreteness and simplicity, let's suppose physics can be modeled by a function step(state) that takes a state of the world, and returns the state of the world one instant later. Then you can ask what would happen if the current state were slightly different ("suppose there were an apple here in the air at this point in time; then it would fall down"). Call this a "physical" counterfactual.

Before reading Eliezer's arguments, I used to think this could be used to model decision-making: Let stateA be what the state of the world will really be at the point I've made my decision. For every choice X I could make, let stateA_X be the same as stateA except that the future copy of me is replaced by a version that has decided to do X. Now it's well-defined what the future of every stateA_X looks like, and if I have a well-defined utility function, then it's well-defined which choice(s) this utility is maximized. Of course I cannot calculate this, but, I thought, the actual decision making process could be seen as an approximation of this.

Except that this would lead to classical decision/game theory, with its unattractive consequences -- the case where Omega has just made two identical physical copies of me, and has us play a true Prisoner's Dilemma, etc. Since I can prove that the other copy of me will do the same thing in every stateA_X (because I'm only replacing myself, not the other copy), it follows that I should defect, because it gives me the higher payoff no matter what the other copy of me does.

Thus "physical" counterfactuals do not make for an acceptable decision theory, and we're forced to look for some notion not of, "what would happen if the state of the world were different," but of something like, "what would happen if this Turing machine would return a different result?" That would be a "logical" counterfactual. Of course, it's not obvious what the hell this is supposed to mean, and if it can be made to mean anything coherent -- your post is an answer to that question.

Comment author: torekp 15 August 2010 02:18:31PM 0 points [-]

Benja's got it: I'm interested in physical counterfactuals. They are the type that is involved in the everyday notion of what a person "could" do.

As for decision theory, I think that the "logical" counterfactuals should supplement, not supplant, the physical counterfactuals.

Comment author: cousin_it 15 August 2010 02:36:09PM *  2 points [-]

As for decision theory, I think that the "logical" counterfactuals should supplement, not supplant, the physical counterfactuals.

I understand that many people share this opinion, but I don't see much in the way of justification. What arguments do you have, besides intuition? I see no reason to expect the "correct" decision theory to be intuitive. Game theory, for example, isn't.

Comment author: torekp 18 August 2010 12:06:03AM *  1 point [-]

"Logical" counterfactuals either mean those whose antecedents are logically impossible, or those whose consequents follow from the antecedents by logical necessity, or both. In your Newcomb's-problem-solving algorithm example, both features apply.

But there are many decisions (or sub-problems within decisions) where neither feature applies. Where the agent is a human being without access to its "source code", it commits no contradiction in supposing "if I did A... but if I did B ..." even though at most one of these lies in the future that causality is heading toward. Additionally, one may be unable to prove that the expected utility of A would be U1 and the expected utility of B would be U2, even though one may reasonably believe both of these. Even when we know most of the relevant physical laws, we lack knowledge of relevant initial or boundary conditions, and if we had those we'd still lack the time to do the calculations.

Our knowledge of physical counterfactuals isn't deductive, usually, but it's still knowledge. And we still need to know the physical consequences of various actions - so we'll have to go on using these counterfactuals. Of course, I just used another intuition there! I'm not concerned, though - decision theory and game theory will both be evaluated on a balance of intuitions. That, of course, does not mean that every intuitively appealing norm will be accepted. But those that are rejected will lose out to a more intuitively appealing (combination of) norm(s).

In a previous comment, I said that physical counterfactuals follow trivially from physical laws, giving an ideal-gas example. But now I just said that our knowledge of physical counterfactuals is usually non-deductive. Problem? No: in the previous comment I remarked on the relation between truths in attempting to show that counterfactuals needn't be metaphysical monsters. In this comment I have been focusing on the agent's epistemic situation. We can know to a moral certainty that a causal relationship holds without being able to state the laws involved and without being able to make any inference that strictly deserves the label "deduction".

Comment author: Perplexed 13 August 2010 05:14:23AM 0 points [-]

the agent has absolutely perfect "knowledge" of itself and the world, and as much CPU time as it wants.

I'm uncomfortable with the unbounded CPU time assumption. For example, what if world() computes the payoff in a Prisoner's Dilemma between two identical agent()s? You get non-termination due to a recursion without an escape clause.

Comment author: cousin_it 13 August 2010 05:18:04AM *  0 points [-]

The implementation of agent() given in the post is guaranteed to terminate. It enumerates proofs about itself and the world up to a specified max proof length, it doesn't call itself recursively.

Comment author: Perplexed 13 August 2010 06:00:08AM 0 points [-]

Ok, I had been mistakenly assuming that it accomplished the proofs by simulating world(), which would necessarily involve recursive calls back to agent(), at least in a multi-player game.

My bad. But I am still not convinced that agent() is guaranteed to terminate. Suppose, for example, that it cannot find any proofs of the form "agent()==A implies world()==U"?

Also, in my example in which world() is supposed to compute the results of a Prisoner's Dilemma between two identical agents, is the "knowlege" that the two agents are indeed identical supposed to reside in world() or in agent()? I suspect that the right way to extend this thought experiment to the multi-player game scenario is to keep world() in the dark as to whether agent1() and agent2() are identical algorithms. Don't hard-code this fact into the agents, either. I think you have to let the two agents discover that they are identical to each other. The thing is, this "understanding" has to be achieved by proving things about proofs. So, I think that modal logic slips back in anyways.

Comment author: cousin_it 13 August 2010 06:07:48AM *  2 points [-]

It's true that the program might fail to find any proofs at all. Then it could output a random action, I guess? Or just return zero?

Puzzled about your last paragraph. The "knowledge" that the agents are equivalent has to be derivable by inspecting the program as a whole. (This is a very strong condition, but it can be satisfied.) Proving things about proofs is something proof checkers with sufficiently strong axioms can do just fine, no modal logic needed.

But in general, I don't believe this sort of reasoning carries over to multi-player games. If we take the Prisoner's Dilemma and tweak the payoffs slightly to make them non-symmetric, the agents are back to square one. Game theory is inherently harder than individual rationality. I really should write another post on that, because it seems to be a Pons Asinorum that many intelligent people fail to cross.

Comment author: Perplexed 13 August 2010 03:12:17PM 1 point [-]

Puzzled about your last paragraph.

Seems to me you understood it and responded appropriately.

Game theory is inherently harder than individual rationality.

Yes it is. One important difference is that game-playing agents generally use mixed strategies. So we need to distinguish the reading the strategy source code of another player from reading the pseudo-random-number source code and "seed". Rational agents will want to keep secrets.

A second difference is that the "preference structure" of the agent ought to be modeled as data about the agent, rather than data about the world. And some of this is information which a rational agent will wish to conceal as well.

So, as a toy model illustrating the compatibility of determinism and free will, your example (i.e. this posting) works fine. But as a way of modeling a flavor of rationality that "solves" one-shot PD and Newcomb-like problems in a "better" way than classical game theory, I think it doesn't (yet?) work.

Comment author: cousin_it 13 August 2010 03:21:21PM *  2 points [-]

Slightly tangential: I have a semi-worked-out notion of a "fair" class of problems which includes Newcomb's Problem because the predictor makes a decision based only on your action, not the "ritual of cognition" that preceded it (to borrow Eliezer's phrase). But the one-shot PD doesn't reside in this "fair" class because you don't cooperate by just predicting the opponent's action. You need to somehow tie the knot of infinite recursion, e.g. with quining or Löbian reasoning, and all such approaches seem to inevitably require inspecting the other guy's "ritual of cognition". This direction of research leads to all sorts of clever tricks, but ultimately it might be a dead end.

As for your main point, I completely agree. Any model that cannot accommodate limited information is bound to be useless in practice. I'm just trying to solve simple problems first, see what works and what doesn't in very idealized conditions. Don't worry, I'll never neglect game theory - it was accidentally getting my hands on Ken Binmore's "Fun and Games" that brought me back to math in the first place :-)

Comment author: Benja 14 August 2010 02:30:40PM *  0 points [-]

But in general, I don't believe this sort of reasoning carries over to multi-player games. If we take the Prisoner's Dilemma and tweak the payoffs slightly to make them non-symmetric, the agents are back to square one.

I don't understand. Do you mean changing the numeric values, leaving the game the same, or changing the ordering, turning it into different game? In the first case, it seems to me that the argument from your other post should work just fine [Edit: Wrong, see cousin_it's explanation below] -- roughly:

Suppose there's a short proof that agent1()=="C" iff agent2()=="cooperate". Then there's an only slightly longer proof that agent1()=="C" implies that both agents get the C/C payoff, and that agent1()=="D" implies both get the D/D payoff. Therefore, under the supposition, agent1 will choose "C"; similarly, agent2 will choose "cooperate".

In other words: If there's a short proof that agent1()=="C" iff agent2()=="cooperate", then agent1()=="C" and agent2()=="cooperate". By the Löbian reasoning in your previous post (linked above), it follows that both agents cooperate.

Am I confused about something? Or is it just that you were in fact talking about changing the payoffs so that the game is not a PD any longer? (I think it still at least bears mentioning if your work already solves the class of problems that Eliezer has been talking about, "where the other agents' decisions can be viewed as functions of one argument, that argument being your own choice in that particular case.")

Comment author: cousin_it 14 August 2010 02:45:47PM *  1 point [-]

This post does solve the class of problems Eliezer was talking about (I call it the "fair" class), but my previous Löbian post used a different algorithm that didn't do utility maximization. That algorithm explicitly said that it cooperates if it can find any proof that agent1()==agent2(), without comparing utilities or trying to prove other statements, and that explicitness made the proof work in the first place.

Your proof fails here:

Therefore, under the supposition, agent1 will choose "C"; similarly, agent2 will choose "cooperate".

It's extremely difficult for the agent to conclude that it will make any particular choice, because any proof of that (starting from your assumptions or any other assumptions) must also prove that the agent won't stumble on any other proofs that lead to yet higher outcomes. This amounts to the agent assuming the consistency of its own proof checker, which is a no go.

As far as I can see right now, making the payoffs slightly non-symmetric breaks everything. If you think an individual utility-maximizing algorithm can work in general game-theoretic situations, you have a unique canonical solution for all equilibrium selection and bargaining problems with non-transferable utility. Judging from the literature, finding such a unique canonical solution is extremely unlikely. (The transferable utility case is easy - it is solved by the Shapley value and my Freaky Fairness algorithm.)

Comment author: Vladimir_Nesov 14 August 2010 03:06:57PM *  0 points [-]

It's extremely difficult for the agent to conclude that it will make any particular choice, because any proof of that (starting from your assumptions or any other assumptions) must also prove that the agent won't stumble on any other proofs that lead to yet higher outcomes.

When does the agent need to have that knowledge? It just acts on certain conclusions, at the moment it obtains them. It can as well act on conclusions1 about its future conclusions2, the only problem might be that it never reaches those conclusions1 (and so won't act on them)

Comment author: Benja 14 August 2010 03:29:38PM 0 points [-]

The point is that I relied on the fact that if the agent can prove "If I cooperate, I get 5" and "If I defect, I get 2", then the agent will cooperate. This is a fact, but the agent cannot prove it: if its proof system were inconsistent, then it might (within maxsteps) also be able to prove "If I defect, I get 42," and thus end up defecting.

Maybe your comment suggests that the agent should choose to cooperate as soon as it has proven the first two statements above; but as far as the agent knows, if its proof system is inconsistent, it might find the proof of "If I defect, I get 42" earlier than the proof of "If I defect, I get 2." The agent cannot prove that "I will prove C -> 5 and D -> 2" implies "I will cooperate."

Comment author: Vladimir_Nesov 14 August 2010 05:01:46PM *  1 point [-]

The agent can just cooperate as soon as it proves "If I cooperate, I get X" and "If I defect, I get Y", for X>Y. This algorithm is justified, and the agent doesn't need to write its own algorithm, we are writing it.

Comment author: cousin_it 14 August 2010 05:36:33PM *  1 point [-]

I don't completely understand your idea, but Löbian cooperation seems to require enumerating a lot of proofs without stopping early. If Bob can stop early, Alice now needs to prove that Bob will stumble on a specific proof earlier than that, rather than at all. I remember trying and failing to remove the exhaustive search for proofs from my previous post. Or maybe I'm missing something?

Maybe you could formalize your idea and give some sort of proof? If it works, it'll be definite progress.

Comment author: Benja 14 August 2010 02:55:55PM 0 points [-]

It's extremely difficult for the agent to conclude that it will make any particular choice, because any proof of that (starting from your assumptions or any other assumptions) must also prove that the agent won't stumble on any other proofs that lead to yet higher outcomes.

I.e., that the agent won't find (contradictory) proofs that the same actions will lead to different, even higher utilities. Right, thanks.

Comment author: Vladimir_Nesov 14 August 2010 02:55:16PM 0 points [-]

(The following sounds too confusing, but making it clearer doesn't seem straightforward. Posting just in case communication succeeds on some level.)

Payoffs are part of preference, preference is (aspect of) world program, and world program determines the outcome. By changing the payoffs, you change the world program, and where decision-making is based on logical inferences about it, you can thus fundamentally change the shape of possible inferences. "Nice" solutions mean parametrization of the solution by elements of the problem statement, but here parametrization is bound to talk about what can be logically inferred, so expect no nice solutions.

Comment author: Vladimir_Nesov 14 August 2010 05:05:37PM 0 points [-]

Suppose there's a short proof that agent1()=="C" iff agent2()=="cooperate". [...] Therefore, under the supposition, agent1 will choose "C"; similarly, agent2 will choose "cooperate".

Sounds right (but it's a rather strong supposition).

Comment author: Benja 14 August 2010 05:52:22PM 0 points [-]

Sorry, I wasn't being as clear as I should have been:

The above is true as a metatheorem, but uninteresting, because as you say, the supposition is rather strong.

What I was trying to argue is that in the agent's proof system (for concreteness, say PA), when assuming the supposition, the conclusion follows, so by the deduction theorem, PA proves "the supposition implies the conclusion". I wanted to use this for a proof-size-bounded Löbian argument à la cousin_it.

But this does not work because in PA, the conclusion does not follow when assuming the supposition, as pointed out by cousin_it.

Comment author: atucker 13 August 2010 01:13:50AM *  0 points [-]

Would I be right in thinking that this implies that you can't apply could to world()? i.e. "Our universe could have a green sky" wouldn't be meaningful without some sort of metaverse program that references what we would think is the normal world program?

Or have the utility computed also depend on some set of facts about the universe?

I think both of those would require that the agent have some way of determining the facts about the universe (I suppose they could figure it out from the source code, but that seems sort of illegitimate to me).

Comment author: cousin_it 13 August 2010 05:06:22AM *  0 points [-]

This post only gives a way to apply "could" to yourself. This doesn't imply you can never apply "could" to the world, we might find another way to do that someday.

Comment author: atucker 14 August 2010 05:02:41PM 1 point [-]

It seems like there are two kinds of "could" at work here, one that applies to yourself and is based on consistent action to utility relationships, and another that involves uncertainty as to what actions cause what utilities (based on counterfactuals about the universe).

Comment author: cousin_it 14 August 2010 05:26:04PM 0 points [-]

Thanks, good point about uncertainty. I'm making a mental note to see how it relates to counterfactuals.

Comment author: Vladimir_Nesov 12 August 2010 06:51:15PM *  0 points [-]

Following is a generalization to take into account logical uncertainty (inability to prove difficult statements, which can lead in particular to inability to prove any statements of the necessary kind).

Instead of proving statements of the form "agent()==A implies world()==U", let's prove statements of the form "(X and agent()==A) implies world()==U", where X are unsubstantiated assumptions. There are obviously provable statements of the new form. Now, instead of looking for a statement with highest U, let's for each A look at the sets of pairs (X,U) for proved statements. We can now use the inverse of length of X to weight the corresponding utilities and estimate the "expected utility" of performing action A by summing over all explored X, even if we are unable to complete a single proof of the statements of the original form. For the same reason as in the post, this "expected utility" calculation is valid for the action that we choose.

(This looks like a potentially dangerous algorithm.)

Edit: As cousin_it points out, it doesn't quite work as intended, because X could just be "false", which would enable arbitrarily high U and collapse the "expected utility" calculation.

Comment author: Johnicholas 13 August 2010 03:48:39AM 1 point [-]

If you sum over all explored X (and implicitly treat unexplored X as if they are zero probability) then you can be arbitrarily far from optimal Bayesian behavior.

See Fitelson's "Bayesians sometimes cannot ignore even very implausible theories."

http://fitelson.org/hiti.pdf

Comment author: cousin_it 13 August 2010 05:37:25AM *  0 points [-]

Thanks for that link. Good stuff.

Comment author: cousin_it 12 August 2010 06:56:05PM *  3 points [-]

Ouch. X could simply say "false" and the whole thing explodes in your face.

Comment author: Johnicholas 13 August 2010 03:41:36AM 1 point [-]

There are logics (relevant or paraconsistent) logics which do not have the classical "ex falso quodlibet".

Comment author: Vladimir_Nesov 12 August 2010 07:15:36PM 0 points [-]

True, didn't notice that. I still feel there must be a workaround.

Comment author: atucker 14 August 2010 07:56:10PM 0 points [-]

What if instead of assuming your Xs, you got them out of the program?

To simplify, imagine a universe with only one fact. A function fact() returns 1 if that fact is true, and 0 if its not.

Now, agent can prove statements of the form "fact()==1 and agent()==A implies world()==U". This avoids the problem of plugging in false, agent doesn't start from assuming X as a statement of its own from which it can derive things, it looks at all the possible outputs of fact(), then sees how that combines with its action to produce utility.

Trivially, if agent has access to the code of fact(), then it can just figure out what that result would be, and it would know what actions correspond to what utilities.

Otherwise, agent() could use a prior distribution or magically infer some probability distribution of if fact is true or not, then choose its action based on normal expected utility calculations.

Note: The above departs from the structure of the world() function given here in that it assumes some way of interacting with world (or its source code) to find out fact() without actually getting the code of fact. Perhaps world is called multiple times, and agent() can observe the utility it accumulates and use that (combined with its "agent()==A and fact()==1 implies world() == U" proofs) to figure out if fact is true or not?

Interestingly, while this allows agent to apply "could" to the world ("It could rain tomorrow"), it doesn't allow the agent to apply "could" to things that don't effect world's utility calculation ("People could have nonmaterial souls that can be removed without influencing their behavior in any way").