
Ingredients of Timeless Decision Theory

Post author: Eliezer_Yudkowsky 19 August 2009 01:10AM

Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory

Wei Dai asked:

"Why didn't you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about."

...

All right, fine, here's a fast summary of the most important ingredients that go into my "timeless decision theory".  This isn't so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge.  It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.

The one-sentence version is:  Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

The three-sentence version is:  Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.

To obtain the background knowledge if you don't already have it, the two main things you'd need to study are the classical debates over Newcomblike problems, and the Judea Pearl synthesis of causality.  Canonical sources would be "Paradoxes of Rationality and Cooperation" for Newcomblike problems and "Causality" for causality.

For those of you who don't condescend to buy physical books, Marion Ledwig's thesis on Newcomb's Problem is a good summary of the existing attempts at decision theories, evidential decision theory and causal decision theory.  You need to know that causal decision theories two-box on Newcomb's Problem (which loses) and that evidential decision theories refrain from smoking on the smoking lesion problem (which is even crazier).  You need to know that the expected utility formula is actually over a counterfactual on our actions, rather than an ordinary probability update on our actions.
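
To make that last distinction concrete, here is a minimal sketch of the Smoking Lesion with made-up numbers, contrasting an ordinary probability update on the action with a surgery-style counterfactual on it:

```python
# Toy Smoking Lesion numbers (made up for illustration): a hidden lesion raises
# both the urge to smoke and the cancer rate, while smoking itself does nothing.
P_LESION = 0.5
P_SMOKE_GIVEN_LESION = {True: 0.9, False: 0.1}
P_CANCER_GIVEN_LESION = {True: 0.8, False: 0.1}   # cancer depends only on the lesion

def p_cancer_given_smoking(smokes):
    """Evidential reading: an ordinary update, P(cancer | smoking)."""
    num = den = 0.0
    for lesion in (True, False):
        p_l = P_LESION if lesion else 1 - P_LESION
        p_s = P_SMOKE_GIVEN_LESION[lesion] if smokes else 1 - P_SMOKE_GIVEN_LESION[lesion]
        num += p_l * p_s * P_CANCER_GIVEN_LESION[lesion]
        den += p_l * p_s
    return num / den

def p_cancer_do_smoking(smokes):
    """Causal reading: P(cancer | do(smoking)) -- surgery severs the lesion->smoking
    arrow, so the lesion keeps its prior and smoking changes nothing downstream."""
    return sum((P_LESION if lesion else 1 - P_LESION) * P_CANCER_GIVEN_LESION[lesion]
               for lesion in (True, False))

print(p_cancer_given_smoking(True), p_cancer_given_smoking(False))  # ~0.73 vs ~0.17
print(p_cancer_do_smoking(True), p_cancer_do_smoking(False))        # 0.45 vs 0.45
```

Conditioning makes smoking look dangerous because smoking is evidence of the lesion; the surgery, which is what the expected utility formula actually wants, says it makes no difference.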

I'm not sure what you'd use for online reading on causality.  Mainly you need to know:

  • That a causal graph factorizes a correlated probability distribution into a deterministic mechanism of chained functions plus a set of uncorrelated unknowns as background factors.
  • Standard ideas about "screening off" variables (D-separation).
  • The standard way of computing counterfactuals (through surgery on causal graphs).  (A toy code sketch of these three ideas follows this list.)
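
Here is a toy sketch putting all three ideas in one place (the graph, mechanisms, and numbers are mine, purely for illustration):

```python
import random

# Toy causal graph:  U1 -> X -> Y <- U2, with X = f(U1) and Y = g(X, U2).
# The joint distribution over (X, Y) factorizes into independent background
# factors U1, U2 pushed through the deterministic mechanisms f and g.

def f(u1):            # mechanism for X
    return u1 > 0.5

def g(x, u2):         # mechanism for Y
    return x and (u2 > 0.2)

def sample(do_x=None):
    """Sample (X, Y).  Passing do_x performs counterfactual surgery: the
    mechanism f is severed and X is pinned to do_x, while the background
    factors keep their original, independent distributions."""
    u1, u2 = random.random(), random.random()
    x = f(u1) if do_x is None else do_x
    y = g(x, u2)
    return x, y

# Screening off (D-separation): given X, the only parent of Y, the value of Y
# carries no further information about U1.
# Surgery: estimate P(Y | do(X=True)) by forcing X rather than conditioning on it.
samples = [sample(do_x=True) for _ in range(10_000)]
print(sum(y for _, y in samples) / len(samples))   # about 0.8
```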

It will be helpful to have the standard Less Wrong background of defining rationality in terms of processes that systematically discover truths or achieve preferred outcomes, rather than processes that sound reasonable; understanding that you are embedded within physics; understanding that your philosophical intuitions are how some particular cognitive algorithm feels from inside; and so on.


The first lemma is that a factorized probability distribution which includes logical uncertainty - uncertainty about the unknown output of known computations - appears to need cause-like nodes corresponding to this uncertainty.

Suppose I have a calculator on Mars and a calculator on Venus.  Both calculators are set to compute 123 * 456.  Since you know their exact initial conditions - perhaps even their exact initial physical state - a standard reading of the causal graph would insist that any uncertainties we have about the output of the two calculators, should be uncorrelated.  (By standard D-separation; if you have observed all the ancestors of two nodes, but have not observed any common descendants, the two nodes should be independent.)  However, if I tell you that the calculator at Mars flashes "56,088" on its LED display screen, you will conclude that the Venus calculator's display is also flashing "56,088".  (And you will conclude this before any ray of light could communicate between the two events, too.)

If I was giving a long exposition I would go on about how if you have two envelopes originating on Earth and one goes to Mars and one goes to Venus, your conclusion about the one on Venus from observing the one on Mars does not of course indicate a faster-than-light physical event, but standard ideas about D-separation indicate that completely observing the initial state of the calculators ought to screen off any remaining uncertainty we have about their causal descendants so that the descendant nodes are uncorrelated, and the fact that they're still correlated indicates that there is a common unobserved factor, and this is our logical uncertainty about the result of the abstract computation.  I would also talk for a bit about how if there's a small random factor in the transistors, and we saw three calculators, and two showed 56,088 and one showed 56,086, we would probably treat these as likelihood messages going up from nodes descending from the "Platonic" node standing for the ideal result of the computation - in short, it looks like our uncertainty about the unknown logical results of known computations, really does behave like a standard causal node from which the physical results descend as child nodes.

But this is a short exposition, so you can fill in that sort of thing yourself, if you like.
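
As one possible filling-in, though, here is a toy version of the two-calculator example, with a made-up prior over the "Platonic" node and a made-up per-calculator error rate.  The two displays stay correlated even though their physical ancestors are fully observed, because both descend from the logical node:

```python
# Latent "Platonic" node: the ideal result of 123 * 456, which we pretend to be
# uncertain about.  Each physical display is a child node showing the ideal
# result, with a small chance of error.  (Prior and error rate are invented.)
CANDIDATES = {56088: 0.7, 56086: 0.2, 56090: 0.1}   # prior over the logical output
ERR = 0.01                                          # per-calculator error rate

def p_display(shown, ideal):
    return 1 - ERR if shown == ideal else ERR

def p_venus_shows(value, mars_observation=None):
    """P(Venus display = value), optionally conditioned on the Mars display."""
    total = norm = 0.0
    for ideal, prior in CANDIDATES.items():
        w = prior
        if mars_observation is not None:
            w *= p_display(mars_observation, ideal)
        total += w * p_display(value, ideal)
        norm += w
    return total / norm

print(p_venus_shows(56088))                          # ~0.70 before seeing Mars
print(p_venus_shows(56088, mars_observation=56088))  # ~0.99 after Mars flashes 56,088
```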

Having realized that our causal graphs contain nodes corresponding to logical uncertainties / the ideal result of Platonic computations, we next construe the counterfactuals of our expected utility formula to be counterfactuals over the logical result of the abstract computation corresponding to the expected utility calculation, rather than counterfactuals over any particular physical node.

You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.

Formally you'd use a Godelian diagonal to write:

Argmax[A in Actions] in Sum[O in Outcomes](Utility(O)*P(this computation yields A []-> O|rest of universe))

(where P( X=x []-> Y | Z ) means the counterfactual, computed on the factored causal graph P, that surgically setting node X to x leads to Y, given Z)
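
A rough rendering of that formula as code (a sketch only; the quined "this computation yields A" part is delegated to the supplied counterfactual function):

```python
def tdt_decision(actions, outcomes, utility, p_counterfactual, background):
    """Choose the A maximizing Sum over O of U(O) * P(this computation yields A []-> O | background).
    p_counterfactual(a, o, background) is assumed to evaluate the counterfactual on the
    factored causal graph by surgery on the node for this computation's logical output."""
    def expected_utility(a):
        return sum(utility(o) * p_counterfactual(a, o, background) for o in outcomes)
    return max(actions, key=expected_utility)
```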

Setting this up correctly (in accordance with standard constraints on causal graphs, like noncircularity) will solve (yield reflectively consistent, epistemically intuitive, systematically winning answers to) 95% of the Newcomblike problems in the literature I've seen, including Newcomb's Problem and other problems causing CDT to lose, the Smoking Lesion and other problems causing EDT to fail, Parfit's Hitchhiker which causes both CDT and EDT to lose, etc.
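
For instance, here is a minimal Newcomb's Problem check under this setup, assuming a perfect predictor and the usual payoffs; the only difference between the two decision theories below is which node receives the surgery:

```python
# Newcomb's Problem with surgery on the *logical* output of the decision
# computation (illustrative; Omega is assumed to predict perfectly).
BOX_A = 1_000
BOX_B = 1_000_000

def payoff(action, prediction):
    box_b = BOX_B if prediction == "one-box" else 0
    return box_b + (BOX_A if action == "two-box" else 0)

def tdt_value(action):
    # Surgery on the logical node: Omega's simulation and the physical act are
    # both children of it, so both take the surgically-set value.
    return payoff(action, prediction=action)

def cdt_value(action, base_rate_one_box=0.5):
    # CDT's surgery hits the physical act only; the prediction node keeps its
    # prior, so box B's contents look independent of the action.
    return (base_rate_one_box * payoff(action, "one-box")
            + (1 - base_rate_one_box) * payoff(action, "two-box"))

print(max(["one-box", "two-box"], key=tdt_value))  # one-box: walks away with $1,000,000
print(max(["one-box", "two-box"], key=cdt_value))  # two-box: walks away with $1,000
```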

Note that this does not solve the remaining open problems in TDT (though Nesov and Dai may have solved one such problem with their updateless decision theory).  Also, although this theory goes into much more detail about how to compute its counterfactuals than classical CDT, there are still some visible incompletenesses when it comes to generating causal graphs that include the uncertain results of computations, computations dependent on other computations, computations uncertainly correlated to other computations, computations that reason abstractly about other computations without simulating them exactly, and so on.  On the other hand, CDT just has the entire counterfactual distribution rain down on the theory as mana from heaven (e.g. James Joyce, Foundations of Causal Decision Theory), so TDT is at least an improvement; and standard classical logic and standard causal graphs offer quite a lot of pre-existing structure here.  (In general, understanding the causal structure of reality is an AI-complete problem, and so in philosophical dilemmas the causal structure of the problem is implicitly given in the story description.)

Among the many other things I am skipping over:

  • Some actual examples of where CDT loses and TDT wins, EDT loses and TDT wins, both lose and TDT wins, what I mean by "setting up the causal graph correctly" and some potential pitfalls to avoid, etc.
  • A rather huge amount of reasoning which defines reflective consistency on a problem class; explains why reflective consistency is a rather strong desideratum for self-modifying AI; why the need to make "precommitments" is an expensive retreat to second-best and shows lack of reflective consistency; explains why it is desirable to win and get lots of money rather than just be "reasonable" (that is, conform to pre-existing intuitions generated by a pre-existing algorithm); which notes that, considering the many pleas from people who want, but can't find, any good intermediate stage between CDT and EDT, it's a fascinating little fact that if you were rewriting your own source code, you'd rewrite it to one-box on Newcomb's Problem and smoke on the smoking lesion problem...
  • ...and so, having given many considerations of desirability in a decision theory, shows that the behavior of TDT corresponds to reflective consistency on a problem class in which your payoff is determined by the type of decision you make, but not sensitive to the exact algorithm you use apart from that - that TDT is the compact way of computing this desirable behavior we have previously defined in terms of reflectively consistent systematic winning.
  • Showing that classical CDT, given self-modification ability, modifies into a crippled and inelegant form of TDT.
  • Using TDT to fix the non-naturalistic behavior of Pearl's version of classical causality in which we're supposed to pretend that our actions are divorced from the rest of the universe - the counterfactual surgery, written out Pearl's way, will actually give poor predictions for some problems (like someone who two-boxes on Newcomb's Problem and believes that box B has a base-rate probability of containing a million dollars, because the counterfactual surgery says that box B's contents have to be independent of the action).  TDT not only gives the correct prediction, but explains why the counterfactual surgery can have the form it does - if you condition on the initial state of the computation, this should screen off all the information you could get about outside things that affect your decision; then your actual output can be further determined only by the Godel-diagonal formula written out above, permitting the formula to contain a counterfactual surgery that assumes its own output, so that the formula does not need to infinitely recurse on calling itself.
  • An account of some brief ad-hoc experiments I performed on IRC to show that a majority of respondents exhibited a decision pattern best explained by TDT rather than EDT or CDT.
  • A rather huge amount of exposition of what TDT decision theory actually corresponds to in terms of philosophical intuitions, especially those about "free will".  For example, this is the theory I was using as hidden background when I wrote in "Causality and Moral Responsibility" that factors like education and upbringing can be thought of as determining which person makes a decision - that you rather than someone else makes a decision - but that the decision made by that particular person is up to you.  This corresponds to conditioning on the known initial state of the computation, and performing the counterfactual surgery over its output.  I've actually done a lot of this exposition on OBLW without explicitly mentioning TDT, like Timeless Control and Thou Art Physics for reconciling determinism with choice (actually effective choice requires determinism, but this confuses humans for reasons given in Possibility and Could-ness).  But if you read the other parts of the solution to "free will", and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of "free will".
  • Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses; that rationalists defect on the Prisoner's Dilemma and hence we need a separate concept of "social rationality"; that the "reasonable" thing to do is determined by consulting pre-existing intuitions of reasonableness, rather than first looking at which agents walk away with huge heaps of money and then working out how to do it systematically; people who take their intuitions about free will at face value; assuming that counterfactuals are fixed givens raining down from the sky rather than non-observable constructs which we can construe in whatever way generates a winning decision theory; et cetera.  And celebrating the fact that rationalists can cooperate with each other, vote in elections, and do many other nice things that philosophers have claimed they can't.  And suggesting that perhaps next time one should extend "rationality" a bit more credit before sighing and nodding wisely about its limitations.
  • In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.

Those of you who've read the quantum mechanics sequence can extrapolate from past experience that I'm not bluffing.  But it's not clear to me that writing this book would be my best possible expenditure of the required time.

Comments (229)

Comment author: Gary_Drescher 20 August 2009 01:44:16PM *  23 points [-]

This is very cool, and I haven't digested it yet, but I wonder if it might be open to the criticism that you're effectively postulating the favored answer to Newcomb's Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb's Problem (and similarly in the Prisoner's Dilemma) is to justify the inference "If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)" rather than "If I choose both boxes, then the simulation doesn't necessarily match my choice (even though in fact it does)". It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it--an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie (1991)'s proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I'm not mistaken, your proposal seems to be a formalization of that idea.)

Here's an alternative proposal.

Metacircular Decision Theory (MCDT)

For purposes of this discussion, let me just stipulate that subjective probabilities will be modeled as though they were quantum under MWI--that is, we'll regard the entire distribution as part of the universe. That move will help with dual-simulation/counterfactual-mugging scenarios; but also, as I argued in Good and Real, we effectively make that move whenever we assign value to probabilistic outcomes even in nonesoteric situations (so we may as well avail ourselves of that move in the weird scenarios too, though eventually we need to justify the move).

Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, and has a preference scheme that assigns a value to the set of facts, and is wired to select an action--specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.

But without further constraint, this process often leads to a contradiction. Suppose the agent's repertoire of actions is A1, ...An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: "Suppose I choose A6. I know I'm a utility-maximizing agent, and I already know there's another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7." But that inference, while sound, contradicts the fact that A6's value is 6.

Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, we need to limit the set of facts that the agent is allowed to reason from when making inferences about a hypothetical action. But which facts do we omit? Different choices yield different preferred actions. If we omit the fact that val(A6)=6, then we can infer val(A6)>=7; if instead we omit the fact that the agent utility-maximizes, then we can infer val(A6)=6 without contradiction (or at least without the particular contradiction above).

So this is the usual full-blown problem of counterfactual inference: which things do we "hold fixed" when contemplating a counterfactual antecedent, and which do we "let vary" for consistency with that antecedent? Different choices here correspond to different decision theories. If the agent allows inferences (only) from all facts about physical law as applied to the future, and all facts about the past and present universe-state, except for facts about the agent's internal decision-making state, then we get CDT. If we leave the criteria unspecified/ambiguous, we get EDT. If we allow the agent to reason from facts about the future as well as the past and present, we get FDT (Fatalist Decision Theory: choice is futile, which most people think follows from determinism).

MCDT's proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It's ok if that's intractable or uncomputable; the agent can muddle through with some approximate algorithm.)

EDIT1: The algorithm also needs to check, when it evaluates a meta-level choice candidate, that the winning choice at the next level down is consistent with all known facts. If not, the meta-level candidate is eliminated from consideration. (Otherwise, the A6 choice could remain stable in the example above.)

EDIT2: Or rather, that consistency check can probably substitute for the additional meta-iterations.

So e.g. in Newcomb's Problem or the Prisoner's Dilemma, the agent can calculate that it does better if it retains the fact that its dispositional-state/source-code is functionally equivalent to the simulation's/other's (but omits facts about which particular choice is made by both) than if it makes the CDT choice and omits the fact about equivalence, but keeps the facts about the simulation's/other's choice (or keeps some probability distribution about the simulation's/other's choice).

In other words, metacircular consistency isn't just a test that we'd like the decision theory to pass. Metacircular consistency is the theory; it is the algorithm.
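
One way to read that meta-level selection as code (a loose sketch of my own; the brute-force enumeration and the function names are assumptions, not anything specified above):

```python
from itertools import combinations

def mcdt_choose(actions, all_facts, value_given, consistent_with):
    """Sketch of MCDT's meta-choice: try each candidate set of facts to retain,
    find the first-order action it licenses, drop candidates whose winning
    action contradicts the known facts (the EDIT1 check), and keep the
    retained-fact set whose winning action actually does best."""
    best = None
    for k in range(len(all_facts) + 1):
        for retained in combinations(all_facts, k):       # meta-choice: which facts to keep
            winner = max(actions, key=lambda a: value_given(a, retained))
            if not consistent_with(winner, all_facts):    # eliminate inconsistent candidates
                continue
            actual = value_given(winner, all_facts)       # how well that choice really does
            if best is None or actual > best[0]:
                best = (actual, retained, winner)
    return best
```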

Comment author: Gary_Drescher 20 August 2009 02:51:54PM 1 point [-]

To clarify: the agent in MCDT is a particular physical instantiation, rather than being timeless/Platonic (well, except insofar as physics itself is Platonic).

Comment author: Wei_Dai 19 August 2009 07:08:23AM *  19 points [-]

Today I finally came up with a simple example where TDT clearly loses and CDT clearly wins, and as a bonus, proves that TDT isn't reflectively consistent.

Omega comes to you and says

I'm hosting a game with 3 players. Two players are AIs I created running TDT but not capable of self-modification, one being a paperclip maximizer, the other being a staples maximizer. The last player is an AI you will design. When the game starts, my two AIs will first get the source code of your AI (which is only fair since you know the design of my AIs). Then 2 of the 3 players will be chosen randomly to play a one-shot true PD, without knowing who they are facing. What AI do you submit?

Say the payoffs of the PD are

            C        D
      C    5/5      0/6
      D    6/0      1/1
  (row: your move; column: opponent's move; entries are your payoff / theirs)

Suppose you submit an AI running CDT. Then, Omega's AIs will reason as follows: "I have 1/2 chance of playing against a TDT, and 1/2 chance of playing against a CDT. If I play C, then my opponent will play C if it's a TDT, and D if it's a CDT, therefore my expected payoff is 5/2+0/2=2.5. If I play D, then my opponent will play D, so my payoff is 1. Therefore I should play C." Your AI then gets a payoff of 6, since it will play D.

Suppose you submit an AI running TDT instead. Then everyone will play C, so your AI will get a payoff of 5.
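
Spelling out that arithmetic (just a check of the reasoning above, using the payoff matrix as given):

```python
# (my payoff, opponent's payoff) for each pair of moves in the one-shot PD
PAYOFF = {("C", "C"): (5, 5), ("C", "D"): (0, 6),
          ("D", "C"): (6, 0), ("D", "D"): (1, 1)}

def omega_tdt_expected(move, submitted):
    """Expected payoff for one of Omega's TDT AIs: its opponent is the other
    TDT with probability 1/2 and the submitted AI with probability 1/2.  The
    other TDT mirrors this move; the submitted AI plays C only if it is a TDT."""
    opponent_if_tdt = move
    opponent_if_submitted = "C" if submitted == "TDT" else "D"
    return (0.5 * PAYOFF[(move, opponent_if_tdt)][0]
            + 0.5 * PAYOFF[(move, opponent_if_submitted)][0])

# Submitting CDT: Omega's TDTs compare 2.5 (for C) with 1.0 (for D) and play C,
# so your CDT plays D against a C and collects 6.
print(omega_tdt_expected("C", "CDT"), omega_tdt_expected("D", "CDT"))  # 2.5 1.0
# Submitting TDT: everyone plays C and your AI collects 5.
```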

So you submit a CDT, whether you are running CDT or TDT. That's because explicitly giving the source code of your submitted AI to the other AIs makes the consequences of your decision the same under CDT and under TDT.

Suppose you have to play this game yourself instead of delegating it, you can self-modify, and the payoffs are large enough. Then you'd modify yourself from running TDT to running some other DT that plays D in this game! (Notice that I specified that Omega's AIs can't self-modify, so your decision to self-modify won't have the logical consequence that they also self-modify.)

It seems that I've given a counter-example to the claim that

the behavior of TDT corresponds to reflective consistency on a problem class in which your payoff is determined by the type of decision you make, but not sensitive to the exact algorithm you use apart from that

Or does my example fall outside of the specified problem class?

Comment author: Eliezer_Yudkowsky 19 August 2009 08:01:41AM 10 points [-]

Or does my example fall outside of the specified problem class?

If I wanted to defend the original thesis, I would say yes, because TDT doesn't cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision (which was one of the open problems I listed - the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior). So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the "problem class" under which the original TDT is automatically reflectively consistent.

Nonetheless it's certainly an interesting dilemma.

Your comment here is actually pre-empting a comment that I'd planned to make after providing some of the background for the content of TDT. I'd thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner's Dilemma and predictably get away with it, provided you did so for unusual reasons. But the conditions on "unusual reasons" are much more difficult than your posts seem to imply. We can't all act on unusual reasons and end up doing the same thing, after all. How is it that these two TDT AIs got here, if not by act of Omega, if the sensible thing to do is always to submit a CDT AI?

To introduce yet another complication: What if the TDTs that you're playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player? Given that your reason for submitting a CDT player involves your expectation about how the TDT players will respond, and that you can "get away with it"? It's the TDT's responses that make them "exploitable" by your decision to submit a CDT player - so what if they employ a different strategy instead? (This is another open problem - "who acts first" in timeless negotiations.)

There might be a certain sense in which being in a "small subgroup internally correlated but not correlated with larger groups" could possibly act as a sort of resource for getting away with defection in the true PD, because if you're in a large group then defecting shifts the probability of an opponent likewise defecting by a lot, but if you're in a small subgroup then it shifts the probability of the opponent defecting by a little, so there's a lower penalty for defection, so in marginal cases a small subgroup might play defection while a large subgroup plays cooperate. (But again, the conditions on this are difficult. If all small subgroups reason this way, then all small subgroups form a large correlated group!)

Anyway - you can't end up in a small subgroup if you start out in a large one, because if you decide to deliberately condition on noise in order to decrease the size of your subgroup, that itself is a correlated sort of decision with a clear line of reasoning and motive, and others in your correlated group will try doing the same thing, with predictable results. So to the extent that lots of AI designers in distant parts of Reality are discussing this same issue with the same logic, we are already in a group of a certain minimum size.

But this does lead to an argument for CEV (values extrapolating / Friendly AI) algorithms that don't automatically, inherently correlate us with larger groups than we already started out being in. If uncorrelation is a nonrenewable resource then FAI programmers should at least be careful not to wantonly burn it. You can't deliberately add noise, but you might be able to preserve existing uncorrelation.

Also, other TDTs can potentially set their "minimum cooperator frequency threshold" at just the right level that if any group of noticeable size chooses to defect, all the TDTs start defecting - though this itself is a possibility I am highly unsure of, and once again it has to do with "who goes first" in timeless strategies, which is an open problem.

But these are issues in which my understanding is still shaky, and it very rapidly gets us into very dangerous territory like trying to throw the steering wheel out the window while playing chicken.

So far as evolved biological organisms go, I suspect that the ones who create successful Friendly AIs (instead of losing control and dying at the hands of paperclip maximizers), would hardly start out seeing only the view from CDT - most of them/us would be making the decision "Should I build TDT, knowing that the decisions of other biological civilizations are correlated to this one?" and not "Should I build TDT, having never thought of that?" In other words, we may already be part of a large correlated subgroup - though I sometimes suspect that most of the AIs out there are paperclip maximizers born of experimental accidents, and in that case, if there is no way of verifying source code, nor of telling the difference between SIs containing bio-values-preserving civs and SIs containing paperclip maximizers, then we might be able to exploit the relative smallness of the "successful biological designer" group...

...but a lot of this presently has the quality of "No fucking way would I try that in real life", at least based on my current understanding. The closest I would get might be trying for a CEV algorithm that did not inherently add correlation to decision systems with which we were not already correlated.

Comment author: Wei_Dai 19 August 2009 12:42:04PM *  7 points [-]

This is another open problem - "who acts first" in timeless negotiations.

You're right, I failed to realize that with timeless agents, we can't do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.

Here's an idea. The logical order of decisions is related to simulation ability. Suppose A can simulate B, meaning it has trustworthy information about B's source code and has sufficient computing power to fully simulate B or sufficient intelligence to analyze B using reliable shortcuts, but B can't simulate A. Then the logical order of decisions is B followed by A, because when B makes his decision, he can treat A's decision as conditional on his. But when A makes her decision, she has to take B's decision as a given.

Does that make sense?

Comment author: Eliezer_Yudkowsky 19 August 2009 03:05:15PM *  7 points [-]

Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power. Rational agents should not regret having more computing power (because they can always use less) or more knowledge (because they can always implement the same strategy they would use with less knowledge) - this sort of thing is a sure sign of reflective inconsistency.

To see why moving logically second is a disadvantage, consider that it lets an opponent playing Chicken always toss their steering wheel out the window and get away with it.

That both players desire to move "logically first" argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.

(I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole 'nother story, and of course also falls into the "Way the fuck too dangerous to try in real life" category relative to my present knowledge.)

With timeless agents, we can't do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.

BTW, thanks for this compact way of putting it.

Comment author: rwallace 19 August 2009 07:33:55PM 1 point [-]

Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so.

One category of counterexample comes from warfare, where if you know what the enemy will do and he doesn't know what you will do, you have the upper hand. (The logical versus temporal distinction is clear here: being temporally the first to reach an objective can be a big advantage.)

Another counterexample is in negotiation where a buyer and seller are both uncertain about fair market price; each may prefer the other to be first to suggest a price. (In practice this is often resolved by the party with more knowledge, or more at stake, or both - usually the seller - being first to suggest a price.)

Comment author: Wei_Dai 20 August 2009 12:18:02AM 0 points [-]

Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so.

You're right. Rock-paper-scissors is another counter-example. In these cases, the relationship between the logical order of moves and simulation ability seems pretty obvious and intuitive.

Comment author: Eliezer_Yudkowsky 20 August 2009 12:19:37AM 1 point [-]

Except that the analogy to rock-paper-scissors would be that I get to move logically first by deciding my conditional strategy "rock if you play scissors" etc., and simulating you simulating me without running into an apparently non-halting computation (that would otherwise have to be stopped by my performing counterfactual surgery on the part of you that simulates my own decision), then playing rock if I simulate you playing scissors.

At least I think that's how the analogy would work.

Comment author: Vladimir_Nesov 20 August 2009 12:36:21AM *  2 points [-]

I suspect that this kind of problem will run into computational complexity issues, not clever decision theory issues. Like with a certain variation on the St. Petersburg paradox (see the last two paragraphs), where you need to count to the greatest finite number to which you can count, and then stop.

Comment author: Wei_Dai 20 August 2009 12:29:38AM 1 point [-]

Suppose I know that's your strategy, and decide to play the move equal to (the first googolplex digits of pi mod 3), and I can actually compute that but you can't. What are you going to do?

If you can predict what I do, then your conditional strategy works, which just shows that move order is related to simulation ability.

Comment author: Eliezer_Yudkowsky 20 August 2009 03:32:21AM *  4 points [-]

In this zero-sum game, yes, it's possible that whoever has the most computing power wins, if neither can access unpredictable random or private variables. But what if both sides have exactly equal computing power? We could define a Timeless Paper-Scissors-Rock Tournament this way - standard language, no random function, each program gets access to the other's source code and exactly 100 million ticks, if you halt without outputting a move then you lose 2 points.

Comment author: Wei_Dai 20 August 2009 09:13:40AM 1 point [-]

This game is pretty easy to solve, I think. A simple equilibrium is for each side to do something like iterate x = SHA-512(x), with a random starting value, using an optimal implementation of SHA-512, until time is just about to run out, then output x mod 3. SHA-512 is easy to optimize (in the sense of writing the absolutely fastest implementation), and it seems very unlikely that there could be shortcuts to computing (SHA-512)^n until n gets so big (around 2^256 unless SHA-512 is badly designed) that the function starts to cycle.
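
A minimal sketch of that strategy (the wall-clock budget below stands in for the 100-million-tick limit, which is a simplification):

```python
import hashlib
import os
import time

def timeless_rps_move(budget_seconds=1.0):
    """Iterate x = SHA-512(x) from a random seed until time is nearly up, then
    output x mod 3 as rock/paper/scissors (0/1/2).  If both sides run the
    fastest possible SHA-512, neither can run meaningfully further down the
    other's chain to predict the final value."""
    deadline = time.monotonic() + budget_seconds
    x = os.urandom(64)                     # random starting value
    while time.monotonic() < deadline:
        x = hashlib.sha512(x).digest()
    return int.from_bytes(x, "big") % 3

print(timeless_rps_move())
```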

I think I've answered your specific question, but the answer doesn't seem that interesting, and I'm not sure why you asked it.

Comment author: Wei_Dai 19 August 2009 07:31:16PM *  1 point [-]

Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power.

But if you are TDT, you can't always use less computing power, because that might be correlated with your opponents also deciding to use less computing power, or because your opponent will distrust you when it can't simulate you.

But if you simply don't have that much computing power (and opponent knows this) then you seem to have the advantage of logically moving first.

(I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole 'nother story, and of course also falls into the "Way the fuck too dangerous to try in real life" category relative to my present knowledge.)

Lack of computing power could be considered a form of "crazy reasoning"...

Why does TDT lead to the phenomenon of "stupid winners"? If there's a way to explain this as a reasonable outcome, I'd feel a lot better. But is that like a two-boxer asking for an explanation of why, when the stupid (from their perspective) one-boxers keep winning, that's a reasonable outcome?

Comment author: Eliezer_Yudkowsky 19 August 2009 07:55:42PM *  0 points [-]

But if you are TDT, you can't always use less computing power, because that correlates with your opponents also deciding to use less computing power.

Substitute "move logically first" for "use less computing power"? Using less computing power seems like a red herring to me. TDT on simple problems (with the causal / logical structure already given) uses skeletally small amounts of computing power. "Who moves first" is a "battle"(?) over the causal / logical structure, not over who can manage to run out of computing power first. If you're visualizing this using lots of computing power for the core logic, rather than computing the 20th decimal place of some threshold or verifying large proofs, then we've got different visualizations.

The idea of "if you do this, the opponent does the same" might apply to trying to move logically first, but in my world this has nothing to do with computing power, so at this point I think it'd be pretty odd if the agents were competing to be stupider.

Besides, you don't want to respond to most logical threats, because that gives your opponent an incentive to make logical threats; you only want to respond to logical offers that you want your opponent to have an incentive to make. This gets into the scary issues I was hinting at before, like determining in advance that if you see your opponent predetermine to destroy the universe in a mutual suicide unless you pay a ransom, you'll call their bet and die with them, even if they've predetermined to ignore your decision, etcetera; but if they offer to trade you silver for gold at a Ricardian-advantageous rate, you'll predetermine to cooperate, etc. The point, though, is that "If I do X, they'll do Y" is not a blank check to decide that minds do X, because you could choose a different form of responsiveness.

But anyway, I don't see in the first place that agents should be having these sorts of contests over how little computing power to use. That doesn't seem to me like a compelling advantage to reach for.

But if you simply don't have that much computing power then you seem to have the advantage of logically moving first.

If you've got that little computing power then perhaps you can't simulate your opponent's skeletally small TDT decision, i.e., you can't use TDT at all. If you can't close the loop of "I simulate you simulating me" - which isn't infinite, and actually terminates rather quickly in the simple cases I know how to analyze at all, because we perform counterfactual surgery inside the loop - then you can't use TDT at all.

Lack of computing power could be considered a form of "crazy reasoning"...

No, I mean much crazier than that. Like "This doesn't follow, but I'm going to believe it anyway!" That's what it takes to get "unusual reasons" - the sort of madness that only strictly naturally selected biological minds would find compelling in advance of a timeless decision to be crazy. Like "I'M GOING TO THROW THE STEERING WHEEL OUT THE WINDOW AND I DON'T CARE WHAT THE OPPONENT PREDETERMINES" crazy.

Why does TDT lead to the phenomenon of "stupid winners"?

It has not been established to my satisfaction that it does. It is a central philosophical intuition driving my decision theory that increased computing power, knowledge, or self-control, should not harm a rational agent.

Comment author: Eliezer_Yudkowsky 20 August 2009 12:11:18AM 0 points [-]

That both players desire to move "logically first" argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.

...possibly employing mixed strategies, by analogy to the equilibrium of games where neither agent gets to go first and both must choose simultaneously? But I haven't done anything with this idea, yet.

Comment author: [deleted] 13 June 2014 06:52:49AM -1 points [-]

This reminds me of logical Fatalism and the Argument from Bivalence

Comment author: cousin_it 20 June 2013 03:47:52PM *  2 points [-]

What if the TDTs that you're playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player?

That's a good point, but what if the process that gives birth to CDT doesn't listen to the incentives you give it? For example, it could be evolution or random chance.

Here's an example, similar to Wei's example above. Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob's life will be much happier than Alice's, right?

We can't all act on unusual reasons and end up doing the same thing, after all.

What force will push against evolution and keep the number of Bobs small?

Comment author: Vladimir_Nesov 19 August 2009 11:22:47AM *  2 points [-]

The problem is that "source code of your AI" is not the complete story, since your decisions as the AI's programmer also depended on the Omega AIs' code, and so the source you hand over already corresponds to just one of the possible worlds, one that presupposes the behavior of Omega's AIs.

Comment author: Wei_Dai 19 August 2009 12:33:19PM 1 point [-]

Yes, I think Eliezer made a similar point:

What if the TDTs that you're playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player?

So if you run TDT, then there are at least two equilibria in this game, only one of which involves you submitting a CDT. Can you think of a way to select between these two equilibria?

If not, I can fix this by changing the game a bit. Omega will now create his TDT AIs after you design yours, and hard code the source code of your AI into it as givens. His AIs won't even know about you, the real player.

Comment author: Eliezer_Yudkowsky 19 August 2009 02:47:20PM *  5 points [-]

Omega will now create his TDT AIs after you design yours, and hard code the source code of your AI into it as givens. His AIs won't even know about you, the real player.

They might simply infer you, the real player. You might as well tell the TDT AIs that they're up against a hardcoded Defect move as the "other player", but they won't know if that player has been selected. In fact, that pretty much is what you're telling them, if you show them a CDT player. The CDT player is a red herring - the decision to defect was made by you, in the moment of submitting a CDT player. There is no law against TDT players realizing this after Omega codes them.

I should note that in matters such as these, the phrase "hard code" should act as a warning sign that you're trying to fix something that, at least in your own mind, doesn't want to be fixed. (E.g. "hard code obedience into AIs, build it into the very circuitry!") Where you are tempted to say "hard code" you may just need to accept whatever complex burden you were trying to get rid of by saying "fix it in place with codes of iron!"

Comment author: Wei_Dai 19 August 2009 09:03:06PM *  1 point [-]

By hard code, I meant code it into the TDT's probability distribution. (Even TDT isn't meta enough to say "My prior is wrong!") But that does make the example less convincing, so let me try something else.

Have Omega's AIs physically go first and you play for yourself. They get a copy of your source code, then make their moves in the 3-choose-2 PD game first. You learn their move, then make your choice. Now, if you follow CDT, you'll reason that your decision has no causal effect on the TDT's decisions, and therefore choose D. The TDTs, knowing this, will play C.

And I think I can still show that if you run TDT, you will decide to self-modify into CDT before starting this game. First, if Omega's AIs know that you run TDT at the beginning, then they can use that "play D if you self-modify" strategy to deter you from self-modifying. But you can also use "I'll self-modify anyway" to deter them from doing that. So who wins this game? (If someone moves first logically, then he wins, but what if everyone moves simultaneously in the logical sense, which seems to be the case in this game?)

Suppose it's common knowledge that Omega mostly chooses CDT agents to participate in this game; then "play D if you self-modify" isn't very "credible". That's because they only see your source code after you self-modify, so they'd have to play D if they predict that a TDT agent would self-modify, even if the actual player started with CDT. Given that, your "I'll self-modify anyway" would be highly credible.

I'm not sure how to formalize this notion of "credibility" among TDTs, but it seems to make intuitive sense.

Comment author: Eliezer_Yudkowsky 19 August 2009 09:37:37PM *  4 points [-]

And I think I can still show that if you run TDT, you will decide to self-modify into CDT before starting this game

Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that's what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute "play D" for "self-modify" in all cases above.

First, if Omega's AIs know that you run TDT at the beginning, then they can use that "play D if you self-modify" strategy to deter you from self-modifying.

E.g., "play D if you play D to deter you from playing D" seems like the same idea, the self-modification doesn't add anything.

So who wins this game? (If someone moves first logically, then he wins, but what if everyone moves simultaneously in the logical sense, which seems to be the case in this game?)

Well... it partially seems to me that, in assuming certain decisions are made without logical consequences - because you move logically first, or because the TDT agents have fixed wrong priors, etc. - you are trying to reduce the game to a Prisoner's Dilemma in which you have a certain chance of playing against a piece of cardboard with "D" written on it. Even a uniform population of TDTs may go on playing C in this case, of course, if the probability of facing cardboard is low enough. But by the same token, the fact that the cardboard sometimes "wins" does not make it smarter or more rational than the TDT agents.

Now, I want to be very careful about how I use this argument, because indeed a piece of cardboard with "only take box B" written on it, is smarter than CDT agents on Newcomb's Problem. But who writes that piece of cardboard, rather than a different one?

An authorless piece of cardboard genuinely does go logically first, but at the expense of being a piece of cardboard, which makes it unable to adapt to more complex situations. A true CDT agent goes logically first, but at the expense of losing on Newcomb's Problem. And your choice to put forth a piece of cardboard marked "D" relies on you expecting the TDT agents to make a certain response, which makes the claim that it's really just a piece of cardboard and therefore gets to go logically first, somewhat questionable.

Roughly, what I'm trying to reply is that you're reasoning about the response of the TDT agents to your choosing the CDT algorithm, which makes you TDT; but you're also trying to force your choice of the CDT algorithm to go logically first, and this is begging the question.

I would, perhaps, go so far as to agree that in an extension of TDT to cases in which certain agents magically get to go logically first, then if those agents are part of a small group uncorrelated with yet observationally indistinguishable from a large group, the small group might make a correlated decision to defect "no matter what" the large group does, knowing that the large group will decide to cooperate anyway given the payoff matrix. But the key assumption here is the ability to go logically first.

It seems to me that the incompleteness of my present theory when it comes to logical ordering is the real key issue here.

Comment author: Wei_Dai 19 August 2009 10:01:42PM *  1 point [-]

Well that should never happen. Anything that would make a TDT want to self-modify into CDT should make it just want to play D, no need for self-modification. It should give the same answer at different times, that's what makes it a timeless decision theory. If you can break that without direct explicit dependence on the algorithm apart from its decisions, then I am in trouble! But it seems to me that I can substitute "play D" for "self-modify" in all cases above.

The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega's AIs can't condition their moves on the player's type. Remember that Omega's AIs get a copy of your source code.

A true CDT agent goes logically first, but at the expense of losing on Newcomb's Problem.

But a CDT agent would self-modify into something not losing on Newcomb's problem if it expects to face that. On the other hand, if TDT doesn't self-modify into something that wins my game, isn't that worse? (Is it better to be reflectively consistent, or winning, if you had to choose one?)

It seems to me that the incompleteness of my present theory when it comes to logical ordering is the real key issue here.

Yes, I agree that's a big piece of the puzzle, but I'm guessing the solution to that won't fully solve the "stupid winner" problem.

ETA: And for TDT agents that move simultaneously, there remains the problem of "bargaining" to use Nesov's term. Lots of unsolved problems... I wish you started us working on this stuff earlier!

Comment author: Vladimir_Nesov 19 August 2009 10:19:37PM *  1 point [-]

The reason to self-modify is to make yourself indistinguishable from players who started as CDT agents, so that Omega's AIs can't condition their moves on the player's type.

Being (or performing an action) indistinguishable from X doesn't protect you from the inference that X probably resulted from such a plot. That you can decide to camouflage like this may even reduce X's own credibility (and so a lot of platonic/possible agents doing that will make the configuration unattractive). Thus, the agents need to decide among themselves what to look like: first-mover configurations are a limited resource.

(This seems like a step towards solving bargaining.)

Comment author: Wei_Dai 19 August 2009 10:25:03PM 0 points [-]

Yes, I see that your comment does seem like a step towards solving bargaining among TDT agents. But I'm still trying to argue that if we're not TDT agents yet, maybe we don't want to become them. My comment was made in that context.

Comment author: Vladimir_Nesov 19 August 2009 10:47:32PM *  1 point [-]

Let's pick up Eliezer's suggestion and distinguish the now-much-less-mysterious TDT from the different idea of "updateless decision theory", UDT, which describes the choice of a whole strategy (a function from states of knowledge to actions) rather than the choice of an action in each given state of knowledge; TDT is an example of the latter class. TDT isn't a UDT, and UDT by itself is a rather vacuous statement, as it achieves reflective consistency pretty much by definition but doesn't say much about the structure of preference or how to choose the strategy.

I don't want to become a TDT agent, since in the UDT sense TDT agents aren't reflectively consistent. They could self-modify towards a more UDT-ish look, but this is the same argument as with CDT self-modifying into a TDT.

Comment author: MichaelVassar 19 August 2009 05:08:04PM 0 points [-]

After all, for anything you can hard code, the AI can build a new AI that lacks your hard coding and sacrifice its resources to that new AI.

Comment author: RickJS 28 August 2009 03:38:11AM 0 points [-]

Wei_Dai wrote on 19 August 2009 07:08:23AM :

... Omega's AIs will reason as follows: "I have 1/2 chance of playing against a TDT, and 1/2 chance of playing against a CDT. If I play C, then my opponent will play C if it's a TDT, and D if it's a CDT ...

That seems to violate the secrecy assumptions of the Prisoner's Dilemma problem! I thought each prisoner has to commit to his action before learning what the other one did. What am I missing?

Thanks!

Comment author: Nick_Tarleton 19 August 2009 05:10:06PM 5 points [-]

Does this theory handle Drescher's example of raising my hand because I want the universe a billion years ago to be such that I would raise my hand a billion years hence?

Comment author: Eliezer_Yudkowsky 19 August 2009 08:10:40PM *  7 points [-]

Yes. That's a logical dependence.

ETA: To be exact, you have a fixed state a billion years ago, a computation which runs on that state to determine "Will you raise your hand a billion years hence?", and you can know the initial state without knowing the output of the function, but then determine that the function outputs "Yes" iff your decision diagonal outputs "Raise hand", so if your values U maximize at "Yes" of this function on that data, then you can (will) exert logical control over the value of this fixed mathematical function in which a copy of you is embedded.

That's what life is all about, actually. You could just regard the universe as a big mathematical function containing a copy of you, over which you're exerting logical control.

ETA2: You'd have to ask Gary Drescher whether he knows of anyone else who's reductionist enough to realize that you can control the output of a fixed deterministic mathematical function if that function happens to be one in which you are embedded. As far as I know, it's just Gary Drescher.

ETA3: "Logical control" and "Thou art math" is essentially the same idea as timeless control and thou art physics, it's just even more fun.

Comment author: Vladimir_Nesov 19 August 2009 08:46:00PM *  3 points [-]

Nice. A while ago I also noticed that you can control any mathematical structure if it knows about you and you know about it (i.e. there is logical dependence), which generalizes the notion of trade with other possible worlds, control of the past, etc. If that other mathematical structure is interpreted as an agent, it can be made to behave as you prefer, if in return you behave as it prefers. Thus, it's possible for us to have and realize preferences over mathematical structures, in particular by trading with them in this manner.

At the same time there are all sorts of weird limitations of what's possible to affect this way, for example you can control something faster than light (logical control), but only with info that is already in the logical dependence, which excludes the info that only one side has. For example, if you send away a perfect simulation of your mind on a spaceship, you can "control" what happens on the spaceship if neither of you receives observations from outside, as both computations will be identical. If some info from a year ago is sent to the spaceship, and both you and the simulation observe it (simultaneously), you remain synchronized, but now you learned something new. This way, streams of observations can be sent in both directions, continuously updating both copies. These observations, being identical, are added to the logical dependence between you and the simulation, and so can be used in logical control. Thus, the whole state of knowledge is shared, and the conclusions of the whole algorithm of mind can be used for control.

On the other hand, if you know something above and beyond this shared knowledge (like recent observations), you can't use this knowledge or any conclusions reached from this knowledge in logical control. You can't update on non-shared knowledge and retain the ability to handle logical dependence. This seems related to non-updating in counterfactual mugging: you need to exercise control over the other possible world, and so you can't update on the observation that is particular to your possible world and use the whole algorithm that includes this update to control the other world. You can "update" if you can factor your state of knowledge into what's dependent on what and what can be used for control of what, though.

Eliezer, does the formalism of Pearl's graphs allow one to capture this idea? So far, I'm not sure how much insight can be gained from studying it (and your TDT), so I'll leave it until after I finish learning the basics of logic.

Comment author: Eliezer_Yudkowsky 19 August 2009 09:09:30PM 1 point [-]

I think you could use a non-updated Pearl graph for your updateless decision theory, but the part where you (instead of updating) decide which computational processes are similar or dissimilar to you, would be a logical problem, I think, not the domain of causal graphs.

Comment author: Vladimir_Nesov 19 August 2009 09:24:44PM *  1 point [-]

Not-updating is the same kind of simplified denotational behemoth as a GLUT. Much of the usefulness of probabilistic graphical models comes from the fact that they compress the probability distribution into smaller representations and allow manipulation and specification of these distributions in terms of the compact representations. If I just start copying a lot of the graphical models, it won't capture the structure of the problem, so instead of being updateless, the decision theory must update what it can, or represent a lot of partially dependent states of knowledge in a single structure, allowing it to extract decisions unaffected by the knowledge that doesn't belong to them.

I suspect that expectation maximization/probability won't play an important role in this structure, as the structure of graphical models seems to capture the same objects as logical dependence must (where do you get the causal graphs from?), and so a structure that can work with logical (in)dependence may already contain the structure captured by probabilistic graphical models, subsuming the latter.

Comment author: Gary_Drescher 20 August 2009 04:25:54PM *  2 points [-]

Just as a matter of terminology, I prefer to say that we can choose (or that we have a choice about) the output, rather than that we control it. To me, control has too strong a connotation of cause.

It's tricky, of course, because the concepts of choice-about and causal-influence-over are so thoroughly conflated that most people will use the same word to refer to both without distinction. So my terminology suggestion is kind of like most materialists' choice to relinquish the word soul to refer to something extraphysical, retaining consciousness to refer to the actual physical/computational process. (Causes, unlike souls, are real, but still distinct from what they're often conflated with.)

Again, this is just terminology, nothing substantive.

EDIT: In the (usual) special case where a means-end link is causal, I agree with you that we control something that's ultimately mathematical, even in my proposed sense of the term.

Comment author: Eliezer_Yudkowsky 20 August 2009 06:23:25PM 1 point [-]

Hm. To me, "choose" sounds like invoking the idea of multiple possibilities, while "control" sounds more determinism-compatible. Of course that is a mere matter of terminology.

Though I'm not sure what you mean by "in the special case where a means-end link is causal" - my thesis was that if you are uncertain about the output of your decision computation, and you factor the universe the Pearlian way, then your logical decision will end up being, in the graph, the logical cause of box B containing a million dollars. You mean the special case where a means-end link is physical? But what is physics except math? Or are we assuming that the local causal relations in physics are more privileged as ontologically basic causes, whereas "logical causality" is just a convenient way of factoring uncertainty and a winning way to construe counterfactuals? (That last one may have some justice to it.)

Comment author: Gary_Drescher 20 August 2009 09:23:57PM 0 points [-]

I agree that "choose" connotes multiple alternatives, but they're counterfactual antecedents, and when construed as such, are not inconsistent with determinism.

I don't know about being ontologically basic, but (what I think of as) physical/causal laws have the important property that they compactly specify the entirety of space-time (together with a specification of the initial conditions).

Comment author: rwallace 19 August 2009 07:22:15PM 0 points [-]

Is there a formulation of this example that isn't purely metaphysical, i.e. where you could actually detect the difference?

Comment author: ChrisHibbert 19 August 2009 05:52:07AM 3 points [-]

This feels right to me. I can't implement it, and I'm not sure I could explain what Eli said, but I understand Pearl well enough (at an intuitive level) to say that the kind of additions Eli is talking about feel like they would clarify matters and reach the results he's talking about.

Read Pearl. It's not mathy, it's mostly words about graph manipulation.

If you're bothered by math, read Pearl anyway. He doesn't use equations or make you transform symbols. If you can think about information flows or reason visually, Pearl's calculus is for you. You'll understand what it means for something to be a cause or a possible cause or not a possible cause of something else in a deeper way than you did before Pearl.

If you're already comfortable with math, there's nothing hard about the theory, it's just using a different formalism than linear symbols to explain how events are connected causally.

Thanks Eli.

Comment author: IlyaShpitser 19 August 2009 09:13:04AM 1 point [-]

Second Chris' advice on reading Pearl.

If it helps, I am happy to help with the technical content of the book, or with general technical questions about causal inference (either over email or here).

Comment author: timtyler 20 August 2009 07:13:39AM -1 points [-]

That's "Causality: models, reasoning, and inference By Judea Pearl"...? "Not mathy"? It's jammed full of dense maths! It has integration symbols, summation symbols, logic, probability, theorems and lemmas coming out of its ears! Obviously, Pearl is showing off to impress his peers ;-)

Comment author: ChrisHibbert 20 August 2009 05:44:54PM 2 points [-]

Okay, you're right, they're in there, but Pearl uses those in the proofs, not the explanations, as I recall. I don't think you have to understand the proofs to get the idea.

If you find math oppressive, let me know if you try Pearl and find it too daunting. If that happens, I'll change the way I describe the book, I promise.

Comment author: Eliezer_Yudkowsky 22 August 2009 09:17:12PM 1 point [-]

Obviously, Pearl is showing off to impress his peers ;-)

Probably a little, but it does help you find mistakes where they exist.

(Okay, that was showing off.)

Comment author: SilasBarta 19 August 2009 07:55:39PM -1 points [-]

I've tried to read Pearl's decision theory book, but it seemed dry and boring. Guess I'll have to give it another go...

It's available online too, but don't pirate it.

Comment author: CarlShulman 19 August 2009 05:47:22AM 3 points [-]

Rolf Nelson wanted to know what everyday problems evidential decision theory produces. Newcomb's Problem can be mapped onto the Prisoner's Dilemma, but are there similarly common Smoking Lesion like problems?

Comment author: Eliezer_Yudkowsky 19 August 2009 03:19:27PM *  2 points [-]

Well, if you're using TDT, then conditioning on the initial state of your physical computation screens off most such problems. But if you don't break down your causal graph that finely, then there are all sorts of situations in which crazy people might be tempted to use EDT. I think Drescher in his book gives the case of someone who observes that people usually decide to cross the street only when it is safe to do so, who concludes that by deciding to cross the street they can make it safe.

Comment author: MichaelVassar 19 August 2009 11:17:44AM 0 points [-]

Majoritarianism may frequently be the result of the application of evidential decision theory, ignoring all of the non-naturalistic vagueness in the formulations of CDT and EDT, might it not?

Comment author: ChrisHibbert 19 August 2009 04:52:27PM 0 points [-]

Some kinds of majoritarianism, certainly. The confusion is based on mistaking correlation of votes with commonality of interests. "If we can all agree to vote for proposition X, then it must be in our favor, right?"

Comment author: cousin_it 19 August 2009 01:55:56AM *  3 points [-]

This is better than nothing, thanks and upvote. Now let's begin translating this stuff. AFAICT, a "decision theory" is supposed to have two parts:

1) A blah blah verbal algorithm for translating real-world problem descriptions into a certain kind of formal structure.

2) A mathematical algorithm that accepts that formal structure and outputs a decision.

I don't fully understand what formal structure you're proposing (a Pearl-style causal graph with additional "logical" arrows? why would this always be acyclic?), and can't understand the algorithm until the structure is clear enough.

Comment author: Eliezer_Yudkowsky 19 August 2009 03:27:36PM *  0 points [-]

why would this always be acyclic?

If the arrows are material implications, then A -> B -> C -> A collapses via iff to a single node. Can you give an example of cyclic logical uncertainty?
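
As a quick check (a hypothetical illustration, treating the arrows as material implications): only assignments on which A, B, and C agree satisfy the cycle, so the three nodes carry a single truth value between them.

    from itertools import product

    def implies(p, q):
        return (not p) or q

    for a, b, c in product([False, True], repeat=3):
        if implies(a, b) and implies(b, c) and implies(c, a):
            assert a == b == c  # only all-true and all-false survive the cycle
    print("a cycle of material implications collapses to a single node")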

Comment author: cousin_it 19 August 2009 04:45:03PM *  0 points [-]

I was thinking of some case where the cycle contains both physical and logical arrows. Logical arrows can point backwards in time, so this doesn't seem to be impossible in principle. Sorry, can't give a specific example because I don't fully understand what you mean by "logical uncertainty".

Comment author: Nick_Tarleton 19 August 2009 05:05:55PM 1 point [-]

My reading is that logical nodes can point to physical nodes, but not vice versa. (Also that it doesn't make sense to say an arrow from a logical node "points backwards in time". Logical nodes are timeless.)

Comment author: Jonathan_Graehl 19 August 2009 04:44:00AM 6 points [-]

One of the benefits of publishing a complete explanation is that some of the (valid) criticisms of it will lead to a stronger, repaired theory.

I confess that I don't follow your program yet, but the outline is much preferred to vague "I have a secret theory" teasing.

Comment author: Eliezer_Yudkowsky 19 August 2009 02:59:35PM *  0 points [-]

Yeah, I hear that claim a lot. It seems to apply to some other world than this one. At some point one must notice when an idealistic belief is failing to accumulate evidence in favor of itself.

We'll see whether publishing this outline yields any criticisms or suggestions over and above what Nesov and Dai already managed to say based on merely "I have a timeless decision theory". I'm not holding my breath. This outline actually is enough that someone versed in Newcomblike problems and causality ought to be able to make out what I'm talking about, and with a bit of intelligence work out on their own just how many classical dilemmas it solves. Nonetheless I fully expect this post to drop into the void and never be heard from again.

That's not because of an evil conspiracy, of course. It's just the default course of events in academia.

Comment author: Jonathan_Graehl 19 August 2009 07:23:50PM 3 points [-]

I feel like the ratio of words written to words read in compsci research is getting pretty awful. Conferences are happy to take whatever paper-like substance you can churn out. It's probably worse in other fields.

Comment author: Wei_Dai 19 August 2009 10:54:03PM *  1 point [-]

Nonetheless I fully expect this post to drop into the void and never be heard from again.

That's not because of an evil conspiracy, of course. It's just the default course of events in academia.

I'd be surprised too if academia were to take a blog post seriously. Why not explain the ideas to someone who has the time and motivation to write them up into academic papers (and share co-authorship or whatever)? If you found the right person, that ought to be much faster than doing it yourself. (I mean take up much less of your own time.)

Comment author: Eliezer_Yudkowsky 19 August 2009 10:57:10PM 1 point [-]

I'd still expect it to drop into the void. Maybe if I write a popular rationality book and it proves popular enough, that probable cost/benefit will change. Are you volunteering?

Comment author: Wei_Dai 19 August 2009 11:05:56PM *  5 points [-]

No, I'm not volunteering. I said earlier that I don't have the skill/experience/patience/willpower for it. You could publicly ask for volunteers, though. Perhaps there are a bunch of Ph.D. students around looking for something to write about.

I'd still expect it to drop into the void.

Why is it that Adam Elga can write about the Sleeping Beauty Problem and get 89 citations? Decision theorists are clearly looking for something to do...

ETA: Maybe it's because of his reputation/status? In that case I guess you need to convince someone high-status to co-author the papers.

Comment author: Eliezer_Yudkowsky 20 August 2009 07:00:14AM *  0 points [-]

Anyone who declines to talk about interesting material because it's in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper?

I ought to post the decision theory to a thread on /b on 4chan, then try forwarding it around to philosophers who've written on Newcomblike problems. Only the ones who really care about their work would dare to comment on it, and the net quality of discussion would go up. Publishing in a peer-reviewed journal just invites in the riffraff.

Yes, this is somewhat tongue-in-cheek, but not so tongue-in-cheek that I'm not seriously considering trying it.

Comment author: Vladimir_Nesov 20 August 2009 11:41:14AM *  7 points [-]

Ignoring non-papers claiming to have solved a problem is a good crackpot-avoiding heuristic. What isn't even written up is even less likely to be worth reading than something with only a few citations that is written up.

Comment author: Eliezer_Yudkowsky 22 August 2009 08:53:08PM *  4 points [-]

Ignoring non-papers claiming to have solved a problem is a good crackpot-avoiding heuristic. What isn't even written up is even less likely to be worth reading than something with only a few citations that is written up.

If that were really what was going on, not status games, then getting a link to the blog post from a couple of known folk of good reputation - e.g. Nick Bostrom and Gary Drescher - would be enough to tell people that here was something worth a quick glance to find out more.

Now it's worth noting that my whole cynicism here can be falsified if this post gets a couple of links from folk of good reputation, followed by genuinely somewhere-leading discussion which solves open problems or points out new genuine problems.

Comment author: DS3618 20 August 2009 05:56:00PM 2 points [-]

"Anyone who declines to talk about interesting material because it's in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously. Why should I expect them to have anything important to say if I go to the further trouble of publishing a paper?"

What?

Vladimir is right: not paying attention to a blog entry with no published work behind it is a great way to avoid crackpots. You have this all backwards; you speak as if you have all these credentials, so everyone should just take you seriously. In reality, what credentials do you have? You built up all this expectation for this grand theory, and this vague outline is the best you can do? Where is the math? Where is the theory?

I think anyone in academia would be inclined to ask the same question of you: why should they take some vague blog entry seriously when the writer controls the comments and can't be bothered to submit his work for peer review? You talk about wanting to write a PhD thesis; this won't help get you there. In fact, this vague outline should do nothing but cast doubt in everyone's mind as to whether you have a theory or not.

I have been following this TDT issue for a while and I for one would like to see some math and some worked out problems. Otherwise I would be inclined to call your bluff.

Eliezer, have you ever published a paper in a peer-reviewed journal? The way you talk about it says "naive amateur." There is huge value in doing so, especially for you, since you don't have a PhD or any successful companies or any of the other typical things that people who go the non-academic route tend to have.

Let's face the music here: your one practical AI project that I am aware of, Flare, failed, and most of your writing has never been subjected to the rigor that all science should be subjected to. It seems to me that if you want to do what you claim, you need to start publishing.

Comment author: Z_M_Davis 20 August 2009 06:34:20PM *  3 points [-]

[Has Eliezer] ever published a paper in a peer-reviewed journal?

"Levels of Organization in General Intelligence" appeared in the Springer volume Artificial General Intelligence. "Cognitive Biases Potentially Affecting Judgement of Global Risks" (PDF) and "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF) appeared in the Oxford University Press volume Global Catastrophic Risks. They're not mathy papers, though.

Comment author: DS3618 20 August 2009 09:34:16PM 3 points [-]

I am sorry, I am going to take a shortcut here and respond to a couple of posts along with yours. So fine, I partially insert my foot in my mouth... but the issue here, I think, is that the papers we need to be talking about are math papers, right? Anyone can publish non-technical ideas as long as they are well reasoned, but the art of science is the technical mastery.

As for Eliezer's comment concerning the irrelevance of Flare as pre-2003 EY work, I have to disagree. When you have no formal academic credentials and you are trying to make your mark in a technical field such as decision theory, anything technical that you have done or attempted counts.

You are essentially building your credentials via work that you have done. I am speaking from experience, since I didn't complete college; I went the business route. But I can also say that I did a lot of technical work, so I built my credentials in the field by doing novel technical things.

I am trying to help here, coming from a similar position and wanting a PhD, etc. Having various technical achievements as my prior work made all the difference in getting into a PhD program without a B.S. or M.S. It also makes all the difference in being taken seriously by the scientific community.

Which circles back to my original point, which is that a vague outline is not enough to show you really have a theory, much less a revolutionary one. Sadly, asking to be taken seriously is just not enough; you have to prove that you meet the bar of admission (for decision theory, that bar is going to be math).

If someone can show me some technical math work EY has done, that would be great, but as of now I have very little confidence that he has a real theory (if someone can, I will drop the issue). Yes, I am aware of the Bayesian Theory paper, but let's face it, that is fairly basic and far from showing that EY has the ability to revolutionize decision theory.

Comment author: Eliezer_Yudkowsky 20 August 2009 10:17:15PM 1 point [-]

having various technical achievements as my prior work made all the difference in getting in to a PhD program without a B.S. or M.S.

Where? What university?

Comment author: Eliezer_Yudkowsky 20 August 2009 08:17:59PM *  1 point [-]

Also, volume-editing isn't as (pointlessly? signallingly?) difficult as journal peer-review.

Comment author: cousin_it 20 August 2009 06:58:50PM *  0 points [-]

This vague outline is the result of Eliezer yielding to our pleas to say something - anything - about his confident solution to Newcomb's problem. Now that it's been posted as a not-obviously-formalizable text, and people are discussing it informally, I share a lot of your disappointment. But let's give the topic some days and see how it crystallizes.

What's Flare? (...looks it up...) Oh dear Cthulhu, oh no.

(Edit: I originally listed several specific users as "refusing to formalize". That was wrong.)

Comment author: Eliezer_Yudkowsky 20 August 2009 07:45:52PM 0 points [-]

What's Flare?

A legacy of pre-2003 Eliezer, of no particular importance one way or another.

Comment author: Wei_Dai 20 August 2009 07:29:52PM *  0 points [-]

refuse to formalize

What about what I wrote?

These considerations lead to the following design for the decision algorithm S. S is coded with a vector <P1, P2, P3, ...> of programs that it cares about, and a utility function on vectors of the form <E1, E2, E3, …> that defines its preferences on how those programs should run. When it receives an input X, it looks inside the programs P1, P2, P3, ..., and uses its "mathematical intuition" to form a probability distribution P_Y over the set of vectors <E1, E2, E3, …> for each choice of output string Y. Finally, it outputs a string Y* that maximizes the expected utility Sum P_Y(<E1, E2, E3, …>) U(<E1, E2, E3, …>).

Which part do you find insufficiently formal? Of course I use "mathematical intuition" as a black box without explaining how it works, but that's just like EDT using "prior" without explaining where it comes from, or CDT using "causal probability" as a black box. It's an unsolved problem, not refusal to formalize.
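
For concreteness, here is a rough sketch of that description as code (hypothetical; "math_intuition" stands in for the unsolved black box, and the argument names are invented):

    def S(X, programs, utility, candidate_outputs, math_intuition):
        """programs: the vector <P1, P2, P3, ...> the agent cares about.
        utility: a function on execution-history vectors <E1, E2, E3, ...>.
        math_intuition(programs, X, Y): the black box -- returns a list of
        (probability, execution_vector) pairs for a candidate output Y."""
        def expected_utility(Y):
            P_Y = math_intuition(programs, X, Y)
            return sum(p * utility(executions) for p, executions in P_Y)
        # Output the Y* that maximizes Sum P_Y(<E1, E2, ...>) U(<E1, E2, ...>)
        return max(candidate_outputs, key=expected_utility)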

Comment author: cousin_it 20 August 2009 07:34:53PM *  0 points [-]

Your decision theory is formal enough for me, but it seems to be different from Eliezer's, which I was talking about. If they're really the same, could you explain how?

Comment author: SilasBarta 20 August 2009 06:55:30PM 1 point [-]

Anyone who declines to talk about interesting material because it's in a blog post, or for that matter, a poem scrawled in blood on toilet paper, is not taking Science seriously.

Heh, if you find a poem scrawled in blood on toilet paper, you probably have a higher priority than Science at the moment -- like finding the psycho f---!

But anyway, you half-jest, but this is a problem I've run into myself. Stephan Kinsella has a widely-cited magnum opus opposing intellectual property rights. I have since presented a gaping hole in its logic, which he acknowledges isn't handled well, but he doesn't feel the need to resolve this hole in something he's built his reputation around, merely because I didn't get my objection published in a journal.

Yes, peer review is a good crackpot filter, but it can also shield you from having to admit your errors. [/threadjack]

Comment author: rwallace 19 August 2009 07:36:15PM 1 point [-]

Looks to me like there's a pretty lively conversation so far!

Comment author: rwallace 19 August 2009 06:05:43PM 2 points [-]

Upvoted; this is a good summary of the issue, and using the new label TDT is arguably more elegant than having to talk separately about the rationality of cultivating a disposition.

How significant are the open questions? We should not expect correct theory to work in the face of arbitrary acts of Omega. Suppose Omega says "Tomorrow I will examine your source code, and if you don't subscribe to TDT I will give you $1 million, and if you do subscribe to TDT I will make you watch the Alien movie series -- from the third one on". In this scenario it would be rational to self modify to something other than TDT; a similar counter can be constructed for any theory whatsoever.

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"

Does this correspond to a significant class of problem in the real world, in the same way Parfit's Hitchhiker does?

Comment author: Eliezer_Yudkowsky 19 August 2009 08:45:55PM *  2 points [-]

In this scenario it would be rational to self modify to something other than TDT; a similar counter can be constructed for any theory whatsoever.

Right, so the decision theories I try to construct are for classes of problems where I can identify a winning property of how the algorithm decides things or strategizes things or responds to things or whatever, a property which determines the payoff fully and screens off all other dependence on the algorithm. Then the algorithm can maximize that property of itself.

Causal decision theory then corresponds to the problem class where your physical action fully determines the result, and anything else, like logical dependence on your algorithm's disposition, is not allowed. CDT agents will successfully maximize on that problem class.

Comment author: rwallace 19 August 2009 08:57:42PM 1 point [-]

Okay, so what problem class are you aiming for with TDT? It can't be the full class of problems where the result depends on your disposition, because there will always be a counter. Do you have a slightly more restricted class in mind?

Comment author: Eliezer_Yudkowsky 19 August 2009 09:11:54PM *  2 points [-]

The TDT I actually worked out is for the class where your payoffs are fully determined by the actual output of your algorithm, but not by other outputs that your algorithm would have made under other conditions. As I described in the "open problems" post, once you allow this sort of strategy-based dependence, then I can depend on your dependence on my dependence on ... and I don't yet know how to stop the recursion. This is closely related to what Wei Dai and I are talking about in terms of the "logical order" of decisions.

If you want to use the current TDT for the Prisoner's Dilemma, you have to start by proving (or probabilistically expecting) that your opponent's decision is isomorphic to your own. Not by directly simulating the opponent's attempt to determine if you cooperate only if they cooperate. Because, as written, the counterfactual surgery that stops the recursion is just over "What if I cooperate?" not "What if I cooperate only if they cooperate?" (Look at the diagonal sentence.)

Comment author: rwallace 19 August 2009 09:51:18PM 1 point [-]

Okay...

Omega comes along and says "I ran a simulation to see if you would one-box in Newcomb. The answer was yes, so I am now going to feed you to the Ravenous Bugblatter Beast of Traal. Have a nice day."

Doesn't this problem fit within your criteria?

If you reject it on the basis of "if you had told me the relevant facts up front, I would've made the right decision", can't you likewise reject the one where Omega flips a coin before telling you about the proposed bet?

If you have reason in advance to believe that either is likely to occur, you can make an advance decision about what to do.

Does either problem have some particular quality relevant for its classification here, that the other does not?

Comment author: Eliezer_Yudkowsky 19 August 2009 09:57:39PM *  2 points [-]

Omega comes along and says "I ran a simulation to see if you would one-box in Newcomb. The answer was yes, so I am now going to feed you to the Ravenous Bugblatter Beast of Traal. Have a nice day."

That's more like a Counterfactual Mugging, which is the domain of Nesov-Dai updateless decision theory - you're being rewarded or punished based on a decision you would have made in a different state of knowledge, which is not "you" as I'm defining this problem class. (Which again may sound quite restrictive at this point, but if you look at 95% of the published Newcomblike problems...)

What you need here is for the version of you that Omega simulates facing Newcomb's Box, to know about the fact that another Omega is going to reward another version of itself (that it cares about) based on its current logical output. If the simulated decision system doesn't know/believe this, then you really are screwed, but it's more because now Omega really is an unfair bastard (i.e. doing something outside the problem class) because you're being punished based on the output of a decision system that didn't know about the dependency of that event on its output - sort of like Omega, entirely unbeknownst to you, watching you from a rooftop and sniping you if you eat a delicious sandwich.

If the version of you facing Newcomb's Problem has a prior over Omega doing things like this, even if the other you's observed reality seems incompatible with that possible world, then this is the sort of thing handled by updateless decision theory.

Comment author: rwallace 19 August 2009 10:11:48PM 1 point [-]

Right. But then if that is the (reasonable) criterion under which TDT operates, it seems to me that it does indeed handle the case of Omega's after the fact coin flip bet, in the same way that it handles (some versions of) Newcomb's problem. How do you figure that it doesn't?

Comment author: Eliezer_Yudkowsky 19 August 2009 10:25:42PM 0 points [-]

Because the decision diagonal I wrote out handles the probable consequences of "this computation" doing something, given its current state of knowledge - its current, updated P. So if it already knows the coinflip (especially a logical coinflip like a binary digit of pi) came up heads, and this coinflip has nothing counterfactually to do with its decision, then it won't care about what Omega would have done if the coin had come up tails, and the currently executing decision diagonal says "don't pay".
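
A toy numeric sketch of that point (hypothetical code, using the standard $1000 / $1,000,000 amounts): once the updated P puts all its weight on "heads", the tails branch contributes nothing, so the updated diagonal refuses to pay, whereas the same calculation before updating favors the paying disposition.

    def expected_value(action, p_heads):
        heads_payoff = -1000 if action == "pay" else 0       # hand over $1000 on heads
        tails_payoff = 1_000_000 if action == "pay" else 0   # Omega rewards the paying disposition on tails
        return p_heads * heads_payoff + (1 - p_heads) * tails_payoff

    print(expected_value("pay", p_heads=1.0), expected_value("refuse", p_heads=1.0))
    # -1000.0 vs 0.0: after updating on "heads", refusing wins
    print(expected_value("pay", p_heads=0.5), expected_value("refuse", p_heads=0.5))
    # 499500.0 vs 0.0: before updating, the paying disposition wins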

Comment author: rwallace 19 August 2009 10:57:08PM 0 points [-]

Ah! so you're defining "this" as exact bitwise match, I see. Certainly that helps make the conclusions more rigorous. I will suggest the way to handle the after-the-fact coin flip bet is to make the natural extension to sufficiently similar computations.

Note that even selfish agents must do this in order to care about themselves five minutes in the future.

To further motivate the extension, consider the variant of Newcomb where just before making your choice, you are given a piece of paper with a large number written on it; the number has been chosen to be prime or composite depending on whether the money is in the opaque box.

Comment author: Eliezer_Yudkowsky 20 August 2009 05:33:04AM *  0 points [-]

Ah! so you're defining "this" as exact bitwise match

That's not the problem. The problem is that you've already updated your probability distribution, so you just don't care about the cases where the binary digit came up 0 instead of 1 - not because your utility function isn't over them, but because they have negligible probability.

the number has been chosen to be prime or composite depending on whether the money is in the opaque box

(First read that variant in Martin Gardner.) The epistemically intuitive answer is "Once I choose to take one box, I will be able to infer that this number has always been prime". If I wanted to walk through TDT doing this, I'd draw a causal graph with Omega's choice descending from my decision diagonal, and sending a prior-message in turn to the parameters of a child node that runs a primality test over numbers and picked this number because it passed (failed), so that - knowing / having decided your logical choice - seeing this number becomes evidence that its primality test came up positive.

In terms of logical control, you don't control whether the primality test comes up positive on this fixed number, but you do control whether this number got onto the box-label by passing a primality test or a compositeness test.
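
A toy model of that last point (hypothetical code; the candidate numbers are arbitrary): the primality of the particular number you see is fixed, but which test that number had to pass to end up on the label depends on the predicted decision.

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    def number_on_box_label(predicted_one_boxing, candidates):
        # Omega labels the box with a number that passed a primality test if it
        # predicts one-boxing, and a compositeness test if it predicts two-boxing.
        test = is_prime if predicted_one_boxing else (lambda n: not is_prime(n))
        return next(n for n in candidates if test(n))

    pool = [91, 97, 119, 127]                # arbitrary candidate numbers
    print(number_on_box_label(True, pool))   # 97: chosen for passing the primality test
    print(number_on_box_label(False, pool))  # 91: chosen for passing the compositeness test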

Comment author: orthonormal 20 August 2009 05:16:18AM 0 points [-]

I will suggest the way to handle the after-the-fact coin flip bet is to make the natural extension to sufficiently similar computations.

It might be nontrivial to do this in a way that doesn't automatically lead to wireheading (using all available power to simulate many extremely fulfilled versions of itself). Or is that problem even more endemic than this?

Comment author: Vladimir_Nesov 19 August 2009 10:05:58PM 0 points [-]

"if you had told me the relevant facts up front, I would've made the right decision"

This is a statement about my global strategy, the strategy I consider winning. In this strategy, I one-box in the states of knowledge where I don't know about the monster, and two-box where I know. If Omega told me about the monster, I'd transition to a state of knowledge where I know about it, and, according to the fixed strategy above, I two-box.

In counterfactual mugging, for each instance of mugging, I give away $100 on the mugging side, and receive $10000 on the reward side. This is also a fixed global strategy that gives the actions depending on agent's state of knowledge.

Comment author: timtyler 19 August 2009 06:24:43PM *  1 point [-]

We already have Disposition-Based Decision Theory - and have had since 2002 or so. I think it's more a case of whether there is anything more to add.

Comment author: rwallace 19 August 2009 06:41:26PM 0 points [-]

Thanks for the link! I'll read the paper more thoroughly later, a quick skim suggests it is along the same lines. Are there any cases where DBDT and TDT give different answers?

Comment author: Gary_Drescher 19 August 2009 07:58:22PM 3 points [-]

I don't think DBDT gives the right answer if the predictor's snapshot of the local universe-state was taken before the agent was born (or before humans evolved, or whatever), because the "critical point", as Fisher defines it, occurs too late. But a one-box chooser can still expect a better outcome.

Comment author: Eliezer_Yudkowsky 19 August 2009 08:57:51PM *  4 points [-]

It looks to me like DBDT is working in the direction of TDT but isn't quite there yet. It looks similar to the sort of reasoning I was talking about earlier, where you try to define a problem class over payoff-determining properties of algorithms.

But this isn't the same as a reflectively consistent decision theory, because you can only maximize on the problem class from outside the system - you presume an existing decision process or ability to maximize, and then maximize the dispositions using that existing decision theory. Why not insert yet another step? What if one were to talk about dispositions to choose particular disposition-choosing algorithms as being rational? In other words, maximizing "dispositions" from outside strikes me as close kin to "precommitment" - it doesn't so much guarantee reflective consistency of viewpoints, as pick one particular viewpoint to have control.

As Drescher points out, if the base theory is a CDT, then there's still a possibility that DBDT will end up two-boxing if Omega takes a snapshot of the (classical) universe a billion years ago before DBDT places the "critical point". A base theory of TDT, of course, would one-box, but then you don't need the edifice of DBDT on top because the edifice doesn't add anything. So you could define "reflective consistency" in terms of "fixed point under precommitment or disposition-choosing steps".

TDT is validated by the sort of reasoning that goes into DBDT, but the TDT algorithm itself is a plain-vanilla non-meta decision theory which chooses well on-the-fly without needing to step back and consider its dispositions, or precommit, etc. The Buck Stops Immediately. This is what I mean by "reflective consistency". (Though I should emphasize that so far this only works on the simple cases that constitute 95% of all published Newcomblike problems, and in complex cases like Wei Dai and I are talking about, I don't know any good fixed algorithm (let alone a single-step non-meta one).)

Comment author: Gary_Drescher 19 August 2009 11:08:11PM *  4 points [-]

Exactly. Unless "cultivating a disposition" amounts to a (subsequent-choice-circumventing) precommitment, you still need a reason, when you make that subsequent choice, to act in accordance with the cultivated disposition. And there's no good explanation for why that reason should care about whether or not you previously cultivated a disposition.

Comment author: Eliezer_Yudkowsky 19 August 2009 11:09:15PM 0 points [-]

(Though I think the paper was trying to use dispositions to define "rationality" more than to implement an agent that would consistently carry out those dispositions?)

Comment author: Gary_Drescher 19 August 2009 11:34:21PM 1 point [-]

I didn't really get the purpose of the paper's analysis of "rationality talk". Ultimately, as I understood the paper, it was making a prescriptive argument about how people (as actually implemented) should behave in the scenarios presented (i.e, the "rational" way for them to behave).

Comment author: timtyler 20 August 2009 06:44:34AM *  0 points [-]

I don't know if Justin Fisher's work exactly replicates your own conclusions. However it seems to have much the same motivations, and to have reached many of the same conclusions.

FWIW, it took me about 15 minutes to find that paper in a literature search.

Another relevant paper:

"No regrets: or: Edith Piaf revamps decision theory".

That one seems to have christened what you tend to refer to as "consistency under reflection" as "desire reflection".

I don't seem to like either term very much - but currently don't have a better alternative to offer.

Comment author: Eliezer_Yudkowsky 20 August 2009 07:14:23AM *  0 points [-]

Violation of desire reflection would be a sufficient condition for violation of dynamic consistency, which in turn is a sufficient condition to violate reflective consistency. I don't see a necessity link.

Comment author: timtyler 20 August 2009 07:03:15AM *  0 points [-]

I had a look at the Wikipedia "Precommitment" article to see whether precommitment is actually as inappropriate a term as it is being portrayed here.

According to the article, the main issue seems to involve cutting off your own options.

Is a sensible one-boxing agent "precommitting" to one-boxing by "cutting off its own options" - namely the option of two-boxing?

On one hand, they still have the option and a free choice when they come to decide. On the other hand, the choice has been made for them by their own nature - and so they don't really have the option of choosing any more.

My assessment is that the word is not obviously totally inappropriate.

Does "disposition" have the same negative connotations as "precommitting" has? I would say not: "disposition" seems like a fairly appropriate word to me.

Comment author: timtyler 20 August 2009 06:39:09AM -2 points [-]

The most obvious reply to the point about dispositions to have dispositions is to take a behaviourist stance: if a disposition results in particular actions under particular circumstances, then a disposition to have a disposition (plus the ability to self-modify) is just another type of disposition, really.

Comment author: timtyler 19 August 2009 06:53:56PM *  -2 points [-]

Well, we have a lengthy description of the revised DBDT - so that should hopefully help figure out what its predicted actions are.

The author claims it gets both the Smoking-Cancer Problem and Newcomb's problem right - which seems to be a start.

Comment author: Wei_Dai 19 August 2009 11:09:51AM *  2 points [-]

The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.

I'm trying to understand the difference between this formulation and mine. Interestingly, Eliezer seems to have specified a "causal" timeless decision theory, whereas mine could be described as an "evidential" TDT. In my formulation, you'd compute the expected utility of a strategy (i.e., mapping of inputs to outputs) T by taking "S is logically equivalent to T" as a (provisional) axiom, then recomputing logical uncertainties and expected utility.

The "evidential" approach seems simpler. What advantage does the "causal" approach have? Sorry if this is obvious, but my knowledge of Pearl is very limited.

Comment author: Eliezer_Yudkowsky 19 August 2009 03:15:00PM *  1 point [-]

Parfit's Hitchhiker; in the future, after having observed that you've already been picked up and made it to safety, you'll still compute the counterfactual "If the output of my computation were to refuse to pay, then I would not have been picked up."

Since TDT screens off all info that goes into your decision-setup, using your updateless version of TDT might obliterate the difference between evidential and causal approaches entirely - no counterfactuals, no updates, just ruling out of self-copies that have received incompatible sense data. (Not sure yet if this works.)

Comment author: ChrisHibbert 19 August 2009 04:47:26PM *  0 points [-]

CTDT vs. ETDT. Hmm, that's a tough one. First, CTDT allows "screening off" of causes, which makes a big difference.

I liked EY's formulation above: "TDT doesn't cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision." It's hard to collect evidence, I think, but reasoning about a causal graph gives you the ability to find out how latent decisions affect other outcomes.

So in this case, expected-utility-based reasoning leaves you in a position where you make some decisions because they seem correlated with good outcomes, while the causal reasoning lets you sometimes see either that the actions and consequences are disconnected or that the causation runs in the opposite direction to what you desire.

ETA: EY's street crossing example is an example of causation running in the opposite direction.

Comment author: Eliezer_Yudkowsky 22 August 2009 09:06:51PM 1 point [-]

= Drescher's street crossing example, don't know if Drescher got it from somewhere else.

Comment author: Psychohistorian 19 August 2009 04:56:15AM 2 points [-]

The three-sentence version is actually a one-sentence version; it's three independent clauses, but semicolons don't separate sentences.

I'm really sorry, I couldn't help myself.

Comment author: timtyler 19 August 2009 08:38:41AM *  0 points [-]

Re: "Some concluding chiding of those philosophers who blithely decided that the "rational" course of action systematically loses"

Some of those philosophers draw a distinction between rational action and the actions of a rational agent - see here:

I conclude that the rational action for a player in the Newcomb Paradox is taking both boxes, but that rational agents will usually take only one box because they have rationally adopted the disposition to do so.

So: these folk had got the right answer, and any debate with them is over terminology.

Comment author: Eliezer_Yudkowsky 19 August 2009 03:22:16PM 1 point [-]

This is the crippleware version of TDT that pure CDT agents self-modify to. It's crippleware because if you self-modify at 7:00pm you'll two-box against an Omega who saw your code at 6:59am.

Comment author: rwallace 19 August 2009 05:48:37PM *  2 points [-]

By hypothesis, Omega on examining your code at 6:59, knows that you will self-modify at 7:00 and one-box thereafter.

Consider that every TDT agent must be derived from a non-TDT agent. There is no difference in principle between "I used to adhere to CDT but self-modified to TDT" and "I didn't understand TDT when I was a child, but I follow it now as an adult".

Correction made, thanks to Tim Tyler.

Comment author: Eliezer_Yudkowsky 22 August 2009 09:03:18PM 1 point [-]

By hypothesis, Omega on examining your code at 6:59, knows that you will self-modify at 7:00 and one-box thereafter.

CDT agents don't care. They aren't causing Omega to fill box B by changing their source code at 7pm, so they have no reason to change their source code in a way that takes only one box. The source code change only causes Omega to fill box B if Omega looks at their source code after 7pm. That is how CDT agents (unwisely) compute "causes".

Comment author: rwallace 23 August 2009 09:17:11AM 0 points [-]

Yes, but the CDT agent at seven o'clock is not being asked to choose one or two boxes. It has to choose between rewriting its algorithm to plain TDT (or DBDT or some variant that will one box), or to TDT with an exception clause "but use the old algorithm if you find out Omega's prediction was made before seven o'clock". Even by straight CDT, there is no motive for writing that exception.

Comment author: Eliezer_Yudkowsky 23 August 2009 07:34:58PM *  4 points [-]

Even by straight CDT, there is no motive for writing that exception.

This is the point at which I say "Wrong" and "Read the literature". I'm not sure how I can explain this any more clearly than I have already, barring a full-fledged sequence. At 7pm the CDT agent calculates that if it modifies its source to use the old algorithm in cases where Omega saw the code before 7pm, it will get an extra thousand dollars on Newcomb's Problem, since it will take box A which contains an additional thousand dollars, and since its decision to modify its code at 7pm has no effect on an Omega who saw the code before 7pm, hence no effect on whether box B is full. It does not reason "but Omega knows I will change my code". If it reasoned that way it would be TDT, not CDT, and would one-box to begin with.
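
A toy sketch of that calculation (hypothetical code, with the standard dollar amounts): under CDT's counterfactual surgery, box B's expected contents are held fixed across the two candidate self-modifications whenever Omega's prediction predates the modification, so the exception clause always looks worth an extra $1000.

    def cdt_compares_modifications(expected_box_b_contents):
        # Causal surgery at 7pm: a code change made now is treated as having no
        # effect on a prediction Omega already made, so box B's expected contents
        # are the same constant under both candidate modifications.
        box_b = expected_box_b_contents
        return {
            "plain one-boxing code": box_b,                          # forgoes box A
            "one-boxing code with pre-7pm exception": box_b + 1000,  # also takes box A
        }

    print(cdt_compares_modifications(expected_box_b_contents=1_000_000))
    # Whatever it expects box B to hold, CDT scores the exception clause +$1000 --
    # which is why it writes the exception, and why an Omega who saw the code at
    # 6:59 leaves box B empty.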

Comment author: rwallace 23 August 2009 11:06:26PM 0 points [-]

Actually, I will add another comment, because I can now articulate where the ambiguity comes in: how you add self-modification to CDT (which doesn't have it in the usual form). I've been assuming the original algorithm doesn't try to micromanage the new algorithm's decisions (which strikes me as the sensible way, not least because it gives better results here); you've been assuming it does (which, I suppose you could argue, is more true to the spirit of the original CDT).

Comment author: rwallace 23 August 2009 10:21:28PM 0 points [-]

I still disagree, but I agree that we have hit the limits of discussion in this comment thread; fundamentally this needs to be analyzed in a more precise language than English. We can revisit it if either of us ever gets to actually programming anything like this.

Comment author: timtyler 20 August 2009 07:07:20AM *  -1 points [-]

By hypothesis, Omega on examining your code at 6:59, knows that you will self-modify at 7:00 and two-box thereafter.

By what hypothesis? That is not how the proposed Disposition-Based Decision Theory says it works. It claims to result in agents who have the disposition to one-box.

Comment author: rwallace 20 August 2009 10:48:48PM 1 point [-]

Sure. This sub thread was about plain CDT, and how it self-modifies into some form of DBDT/TDT once it figures out the benefits of doing so -- and given the hypothesis of an omniscient Omega, then Omega will know that this will occur.

Comment author: timtyler 21 August 2009 05:43:49AM *  -1 points [-]

In that case, what I think you meant to say was:

Omega on examining your code at 6:59, knows that you will self-modify at 7:00 and ONE-box thereafter.

Comment author: rwallace 21 August 2009 07:44:28AM 1 point [-]

Doh! Thanks for the correction, editing comment.

Comment author: timtyler 19 August 2009 05:46:01PM *  0 points [-]

I don't see any reason for thinking this fellow's work represents "crippleware".

It seems to me that he agrees with you regarding actions, but differs about terminology.

Here's the CDT explanation of the terminology:

A way of reconciling the two sides of the debate about Newcomb's problem acknowledges that a rational person should prepare for the problem by cultivating a disposition to one-box. Then whenever the problem arises, the disposition will prompt a prediction of one-boxing and afterwards the act of one-boxing (still freely chosen). Causal decision theory may acknowledge the value of this preparation. It may conclude that cultivating a disposition to one-box is rational although one-boxing itself is irrational. Hence, if in Newcomb's problem an agent two-boxes, causal decision theory may concede that the agent did not rationally prepare for the problem. It nonetheless maintains that two-boxing itself is rational. Although two-boxing is not the act of a maximally rational agent, it is rational given the circumstances of Newcomb's problem.

The basic idea of forming a disposition to one-box has been around for a while. Here's another one:

Prior to entering Newcomb's Problem, it is rational to form the disposition to one-box.

  • Realistic decision theory: rules for nonideal agents ... by Paul Weirich - 2004

...and another one:

"DISPOSITION-BASED DECISION THEORY"

This stronger view employs a disposition-based conception of rationality; it holds that what should be directly assessed for ‘rationality’ is dispositions to choose rather than choices themselves. Intuitively, there is a lot to be said for the disposition to choose one-box in Newcomb’s problem – people who go into Newcomb’s problem with this disposition reliably come out much richer than people who instead go in with the disposition to choose two-boxes. Similarly, the disposition to cooperate in a psychologically-similar prisoners’ dilemma reliably fares much better in this scenario than does the disposition to defect. A disposition-based conception of rationality holds that these intuitive observations about dispositions capture an important insight into the nature of practical rationality.

Comment author: RickJS 22 September 2009 12:28:53AM 0 points [-]

In Eliezer's article on Newcomb's problem, he says, "Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. " Such evidence from previous players fails to appear in some problem descriptions, including Wikipedia's.

For me this is a "no-brainer". Take box B, deposit it, and come back for more. That's what the physical evidence says. Any philosopher who says "Taking BOTH boxes is the rational action," occurs to me as an absolute fool in the face of the evidence. (But I've never understood non-mathematical philosophy anyway, so I may be a poor judge.)

Clarifying (NOT rhetorical) questions:

Have I just cheated, so that "it's not the Newcomb Problem anymore?"

When you fellows say a certain decision theory "two-boxes", are those theory-calculations including the previous play evidence or not?

Thanks for your time and attention.

Comment author: JGWeissman 24 September 2009 06:38:38PM 0 points [-]

For me this is a "no-brainer". Take box B, deposit it, and come back for more.

There is no opportunity to come back for more. Assume that when you take box B before taking box A, box A is removed.

Comment author: RickJS 25 September 2009 03:37:46AM 0 points [-]

Yes, I read about " ... disappears in a puff of smoke." I wasn't coming back for a measly $1K, I was coming back for another million! I'll see if they'll let me play again. Omega already KNOWS I'm greedy, this won't come as a shock. He'll probably have told his team what to say when I try it.

" ... and come back for more." was meant to be funny.

Anyway, this still doesn't answer my questions about "Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars."

Someone please answer my questions! Thanks!

Comment author: Johnicholas 25 September 2009 11:23:34AM *  0 points [-]

The problem needs lots of little hypotheses about Omega. In general, you can create these hypotheses for yourself, using the principle of "Least Convenient Possible World"

http://lesswrong.com/lw/2k/the_least_convenient_possible_world/

Or, from philosophy/argumentation theory, "Principle of Charity".

http://philosophy.lander.edu/intro/charity.shtml

In your case, I think you need to add at least two helper assumptions - Omega's prediction abilities are trustworthy, and Omega's offer will never be repeated - not for you, not for anyone.

Comment author: eirenicon 22 September 2009 12:57:33AM 0 points [-]

That's what the physical evidence says.

What the physical evidence says is that the boxes are there, the money is there, and Omega is gone. So what does your choice affect, and when?

Comment author: RickJS 24 September 2009 05:10:02PM 1 point [-]

Well, I mulled that over for a while, and I can't see any way that contributes to answering my questions.

As to " ... what does your choice affect, and when?", I suppose there are common causes, starting before Omega loaded the boxes, that affect both Omega's choices and mine. For example, the machinery of my brain. No backwards-in-time causation is required.

Comment author: timtyler 26 June 2011 06:18:53PM *  -2 points [-]

This is the crippleware version of TDT that pure CDT agents self-modify to. It's crippleware because if you self-modify at 7:00pm you'll two-box against an Omega who saw your code at 6:59am.

Penalising a rational agent for its character flaws while it is under construction seems like a rather weak objection. Most systems have a construction phase during which they may behave imperfectly - so similar objections seem likely to apply to practically any system. However, this is surely no big deal: once a synthetic rational agent exists, we can copy its brain. After that, developmental mistakes would no longer be much of a factor.

It does seem as though this makes CDT essentially correct - in a sense. The main issue would then become one of terminology - of what the word "rational" means. There would be no significant difference over how agents should behave, though.

My reading of this issue is that the case goes against CDT. Its terminology is misleading. I don't think there's much of a case that it is wrong, though.

Comment author: timtyler 19 August 2009 09:03:41AM *  0 points [-]

Eric Barnes - while appreciating the benefits of taking one box - has harsh words for the "taking one box is rational" folk.

I go on to claim that although the ideal strategy is to adopt a necessitating disposition to take only one box, it is never rational to choose only one box. I defend my answer against the alternative analysis of the paradox provided by David Gauthier, and I conclude that his understanding of the orthodox theory of rationality is mistaken.

Comment author: Eliezer_Yudkowsky 22 August 2009 09:14:02PM *  4 points [-]

(Sigh.)

Yes, causal decision theorists have been saying harsh words against the winners on Newcomb's Problem since the dawn of causal decision theory. I am replying to them.

Comment author: timtyler 26 June 2011 07:35:30PM -1 points [-]

Yes, causal decision theorists have been saying harsh words against the winners on Newcomb's Problem since the dawn of causal decision theory. I am replying to them.

Note that this is the same guy who says:

that rational agents will usually take only one box because they have rationally adopted the disposition to do so.

He's drawing a distinction between a "rational action" and the actions of a "rational agent".

Comment author: SilasBarta 22 August 2009 10:35:38PM 2 points [-]

Newcomb's Problem capriciously rewards irrational people in the same way that reality capriciously rewards people who irrationally believe their choices matter.

Comment author: Eliezer_Yudkowsky 22 August 2009 09:12:38PM *  1 point [-]

(Looks over Tim Tyler's general trend in comments.)

Okay. It's helpful that you're doing a literature search. It's not helpful that every time you find something remotely related, you feel a need to claim that it is already TDT and that TDT is nothing innovative by comparison. It does not appear to me that you understand either the general background of these questions as they have been pursued within decision theory, or TDT in particular. Literature search is great, but if you're just spending 15 minutes Googling, then you have insufficient knowledge to compare the theories. Plenty of people have called for a decision theory that one-boxes on Newcomb and smokes on the smoking lesion - the question is coughing up something that seems reasonably formal. Plenty of people have advocated precommitment, but it comes with its own set of problems, and that is why a non-precommitment-based decision theory is important.

Comment author: Cyan 23 August 2009 12:21:40PM *  0 points [-]

In the spirit of dredging up references with no actual deep insight, I note this recent post on Andrew Gelman's blog.

Comment author: timtyler 31 August 2009 12:01:23PM *  -2 points [-]

Well, other people have previously taken a crack at the same problem.

If they have resolved it, then I should think that would be helpful - since then you can look at their solution. If not, their efforts to solve the problem might still be enlightening.

So: I think my contribution in this area is probably helpful.

15 minutes was how long it took me to find the cited material in the first place. Not trivial - but not that hard.

No need to beat me up for not knowing the background of your own largely unpublished theory!

...but yes, in my view, advanced decision theory is a bit of a red herring for those interested in machine intelligence. It's like: that is so not the problem. It seems like wondering whether to use butter-icing or marzipan on the top of the cake - when you don't yet have the recipe or the ingredients.

Comment author: Eliezer_Yudkowsky 31 August 2009 07:15:48PM 1 point [-]

15 minutes was how long it took me to find the cited material in the first place. Not trivial - but not that hard.

The cited material isn't much different from a lot of other material in the same field.

Comment author: timtyler 31 August 2009 07:56:26PM *  -1 points [-]

So far, "Disposition-Based Decision Theory" (and its apparently-flawed precursor) is the only thing I have seen that apparently claims to address and solve the same problem that is under discussion in this forum.

I suppose there's also a raft of CDT enthusiasts who explain why two-boxing is actually not a flaw in their system, and who say that they have no objection to the idea of agents who one-box. In their case, the debate appears to be over terminology: what does the word "rational" actually mean - is it about choosing the best action from the available options? Or does it mean something else?

Are there other attempts at a solution? Your turn for some references, I feel.

Comment author: Eliezer_Yudkowsky 31 August 2009 09:34:07PM 1 point [-]

"Paradoxes of Rationality and Cooperation" (the edited volume) will give you a feel for the basics, as will reading Marion Ledwig's thesis paper.

Comment author: Mitchell_Porter 21 August 2009 12:16:38PM 1 point [-]

I'm not keeping up here - I only peek at this site occasionally, rather than following it - but this:

"The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation."

... seems rather similar to the dictum that you should choose as if you really might be any of your subjective duplicates, from across all possible worlds. (I suppose there is a difference, in that "subjective duplicate" refers only to the properties of yourself that you can perceive, whereas "the abstract computation you implement" refers to a property that is not explicitly available to you.)

And to me that dictum sounds standardly Bayesian - with the set of all entities in all possible worlds providing the prior, and the subjectively available data (about what sort of entity you are) providing the evidence on which you condition. So it's intriguing to see the claim that starting out in this way leads to making the right choices in a number of situations where standard decision theory gets it wrong.

Comment author: Eliezer_Yudkowsky 22 August 2009 09:07:48PM 1 point [-]

Omega may not contain a copy of you which is detailed enough to be a subjective duplicate. Omega may just be reasoning abstractly about you. So you legitimately know that you are not inside Omega - but you also expect that whatever you decide, Omega will have successfully predicted.

Comment author: SforSingularity 19 August 2009 12:25:16PM 1 point [-]

But if you read the other parts of the solution to "free will", and then furthermore explicitly formulate TDT, then this is what utterly, finally, completely, and without even a tiny trace of confusion or dissatisfaction or a sense of lingering questions, kills off entirely the question of "free will".

If this is correct, then it amounts to a profound philosophical and scientific achievement.

Comment author: Eliezer_Yudkowsky 19 August 2009 03:24:46PM *  7 points [-]

Not by my standards.

Free will is about as easy as a problem can get and still be Confusing. Plenty of moderately good reductionists have refused to be confused by it. Killing off the problem entirely is more like dropping nuclear weapons to obliterate the last remnants of a dead horse than any great innovation within the field of reductionism.

There are non-reductionist philosophers who would think of reducing free will as a great and difficult achievement, but by reductionist standards it's a mostly-solved problem already.

Formal cooperation in the one-shot PD, now that should be interesting.

Comment author: PhilGoetz 21 August 2009 05:57:21PM *  0 points [-]

Here is what I don't understand about the free will problem. I know this is a simple objection, so there must be a standard reply to it; but I don't know what that reply is.

Denote F as a world in which free will exists, f as one in which it doesn't. Denote B as a world in which you believe in free will, and b as one in which you don't. Let a combination of the two, e.g., FB, denote the utility you derive from having that belief in that world. Suppose FB > Fb and fb > fB (being correct > being wrong).

The expected utility of B is FB x p(F) + fB x (1-p(F)). Expected utility of b is Fb x p(F) + fb x (1-p(F)). Choose b if Fb x p(F) + fb x (1-p(F)) > FB x p(F) + fB x (1-p(F)).

But, that's not right in this case! You shouldn't consider worlds of type f in your decision, because if you're in one of those worlds, your decision is pre-ordained. It doesn't make any sense to "choose" not to believe in free will - that belief may be correct, but if it is correct, then you can't choose it.

Over worlds of type F, the expected utility of B is FB x p(F), and the utility of b is Fb x p(F), and FB > Fb. So you always choose B.
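
A minimal sketch of this comparison in Python, assuming made-up utilities that satisfy only FB > Fb and fb > fB, and an arbitrary p(F); none of these numbers come from the comment itself:

    # Illustrative numbers only; the comment above fixes no particular values.
    FB, Fb = 10.0, 0.0   # free will exists: correct belief beats incorrect belief
    fB, fb = 0.0, 10.0   # no free will: correct belief beats incorrect belief
    p_F = 0.1            # assumed probability that free will exists

    # Standard expectation over all four world/belief combinations:
    eu_B = FB * p_F + fB * (1 - p_F)
    eu_b = Fb * p_F + fb * (1 - p_F)
    print("standard rule picks:", "B" if eu_B > eu_b else "b")   # "b" with these numbers

    # The restriction proposed above: only F-worlds count, since only there
    # is the belief actually chosen.  B then wins whenever FB > Fb.
    print("restricted rule picks:", "B" if FB * p_F > Fb * p_F else "b")  # always "B"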

Comment author: Eliezer_Yudkowsky 21 August 2009 07:48:50PM 6 points [-]

Denote F as a world in which free will exists, f as one in which it doesn't.

I am unable to attach a truth condition to these sentences - I can't imagine two different ways that reality could be which would make the statements true or alternatively false.

You shouldn't consider worlds of type f in your decision, because if you're in one of those worlds, your decision is pre-ordained.

http://wiki.lesswrong.com/wiki/Free_will_(solution)

Comment author: PhilGoetz 21 August 2009 08:04:43PM 0 points [-]

I can't imagine two different ways that reality could be which would make the statements true or alternatively false.

Do you mean that the phrases "free will exists" and "free will does not exist" are both incoherent?

Comment author: Eliezer_Yudkowsky 21 August 2009 08:13:10PM 4 points [-]

If I want to, I can assign a meaning to "free will" in which it is tautologically true of causal universes as such, and applied to agents, is true of some agents but not others. But you used the term, you tell me what it means to you.

Comment author: PhilGoetz 21 August 2009 08:37:12PM *  -1 points [-]

You used the term first. You called it a "dead horse" and "about as easy as a problem can get and still be Confusing". I would think this meant that you have a clear concept of what it means. And it can't be a tautology, because tautologies are not dead horses.

I can at least say that, to me, "Free will exists" implies "No Omega can predict with certainty whether I will one-box or two-box." (This is not an "if and only if" because I don't want to say that a random process has free will; nor that an undecidable algorithm has free will.)

I thought about saying: "Free will does not exist" if and only if "Consciousness is epiphenomenal". That sounds dangerously tautological, but closer to what I mean.

I can't think how to say anything more descriptive than what I wrote in my first comment above. I understand that saying there is free will seems to imply that I am not an algorithm; and that that seems to require some weird spiritualism or vitalism. But that is vague and fuzzy to me; whereas it is clear that it doesn't make sense to worry about what I should do in the worlds where I can't actually choose what I will do. I choose to live with the vague paradox rather than the clear-cut one.

ADDED: I should clarify that I don't believe in free will. I believe there is no such thing. But, when choosing how to act, I don't consider that possibility, because of the reasons I gave previously.

Comment author: Eliezer_Yudkowsky 21 August 2009 10:55:14PM 5 points [-]

I can at least say that, to me, "Free will exists" implies "No Omega can predict with certainty whether I will one-box or two-box."

Then you've got the naive incoherent version of "free will" stuck in your head. Read the links.

http://wiki.lesswrong.com/wiki/Free_will

http://wiki.lesswrong.com/wiki/Free_will_(solution)

Comment author: PhilGoetz 22 August 2009 09:23:34PM *  2 points [-]

All right, I read all of the non-italicized links, except for the "All posts on Less Wrong tagged Free Will", trusting that one of them would say something relevant to what I've said here. But alas, no.

All of those links are attempts to argue about the truth value of "there is free will", or about whether the concept of free will is coherent, or about what sort of mental models might cause someone to believe in free will.

None of those things are at issue here. What I am talking about is what happens when you are trying to compute something over different possible worlds, where what your computation actually does is different in these different worlds. When you must compare expected value in possible worlds in which there is no free will, to expected value in possible worlds in which there is free will, and then make a choice; what that choice actually does is not independent of what possible world you end up in. This means that you can't apply expectation-maximization in the usual way. The counterintuitive result, I think, is that you should act in the way that maximizes expected value given that there is free will, regardless of the computed expected value given that there is not free will.

As I mentioned, I don't believe in free will. But I think, based on a history of other concepts or frameworks that seemed paradoxical but were eventually worked out satisfactorily, that it's possible there's something to the naive notion of "free will".

We have a naive notion of "free will" which, so far, no one has been able to connect up with our understanding of physics in a coherent way. This is powerful evidence that it doesn't exist, or isn't even a meaningful concept. It isn't proof, however; I could say the same thing about "consciousness", which as far as I can see really shouldn't exist.

All attempts that I've seen so far to parse out what free will means, including Eliezer's careful and well-written essays linked to above, fail to noticeably reduce the probability I assign to there being naive "free will", because the probability that there is some error in the description or mapping or analogies made is always much higher than the very-low prior probability that I assign to there being "free will".

I'm not arguing in favor of free will. I'm arguing that, when considering an action to take that is conditioned on the existence of free will, you should not do the usual expected-utility calculations, because the answer to the free will question determines what it is you're actually doing when you choose an action to take, in a way that has an asymmetry such that, if there is any possibility epsilon > 0 that free will exists, you should assume it exists.

(BTW, I think a philosopher who wished to defend free will could rightfully make the blanket assertion against all of Eliezer's posts that they assume what they are trying to prove. It's pointless to start from the position that you are an algorithm in a Blocks World, and argue from there against free will. There's some good stuff in there, but it's not going to convince someone who isn't already reductionist or determinist.)

Comment author: Eliezer_Yudkowsky 22 August 2009 11:25:06PM *  7 points [-]

When you must compare expected value in possible worlds in which there is no free will, to expected value in possible worlds in which there is free will

I have stated exactly what I mean by the term "free will" and it makes this sentence nonsense; there is no world in which you do not have free will. And I see no way that your will could possibly be any freer than it already is. There is no possible amendment to reality which you can consistently describe, that would make your free will any freer than it is in our own timeless and deterministic (though branching) universe.

What do you mean by "free will" that makes your sentence non-nonsense? Don't say "if we did actually have free will", tell me how reality could be different.

Comment author: Alicorn 21 August 2009 06:05:56PM *  6 points [-]

You shouldn't consider worlds of type f in your decision, because if you're in one of those worlds, your decision is pre-ordained. It doesn't make any sense to "choose" not to believe in free will - that belief may be correct, but if it is correct, then you can't choose it.

Saying that you shouldn't do something because it's preordained whether you do it or not is a very confused way of looking at things. Christine Korsgaard, by whom I am normally unimpressed but who has a few quotables, says:

Having discovered that my conduct is predictable, will I now sit quietly in my chair, waiting to see what I will do? Then I will not do anything but sit quietly in my chair. And that had better be what you predicted, or you will have been wrong. But in any case why should I do that, if I think I ought to be working?

(From "The Authority of Reflection")

Comment author: PhilGoetz 21 August 2009 06:15:36PM *  -1 points [-]

I don't understand what that Korsgaard quote is trying to say.

Saying that you shouldn't do something because it's preordained whether you do it or not is a very confused way of looking at things.

I didn't say that. I said that, when making a choice, you shouldn't consider, in your set of possible worlds, possible worlds in which you can't make that choice.

It's certainly not as confused a way of looking at things as choosing to believe that you can't choose what to believe.

I should have said you shouldn't try to consider those worlds. If you are in f, then it may be that you will consider such possible worlds; and there's no shouldness about it.

"But", you might object, "what should you do if you are a computer program, running in a deterministic language on deterministic hardware?"

The answer is that in that case, you do what you will do. You might adopt the view that you have no free will, and you might be right.

The 2-sentence version of what I'm saying is that, if you don't believe in free will, you might be making an error that you could have avoided. But if you believe in free will, you can't be making an error that you could have avoided.

Comment author: Alicorn 21 August 2009 06:23:33PM 4 points [-]

I don't understand what that Korsgaard quote is trying to say.

In the context of the larger paper, the most charitable way of interpreting her (IMO) is that whether we have free will or not, we have the subjective impression of it, this impression is simply not going anywhere, and so it makes no sense to try to figure out how a lack of free will ought to influence our behavior, because then we'll just sit around waiting for our lack of free will to pick us up out of our chair and make us water our houseplants and that's not going to happen.

I said that, when making a choice, you shouldn't consider, in your set of possible worlds, possible worlds in which you can't make that choice.

What if we're in a possible world where we can't choose not to consider those worlds? ;)

It's certainly not as confused a way of looking at things as choosing to believe that you can't choose what to believe.

"Choosing to believe that you can't choose what to believe" is not a way of looking at things; it's a possible state of affairs, in which one has a somewhat self-undermining and false belief. Now, believing that one can choose to believe that one cannot choose what to believe is a way of looking at things, and might even be true. There is some evidence that people can choose to believe self-undermining false things, so believing that one could choose to believe a particular self-undermining false thing which happens to have recursive bearing on the choice to believe it isn't so far out.

Comment author: saturn 23 August 2009 06:27:29AM *  1 point [-]

The mistake you're making is thinking that determinism means your decisions are irrelevant. The universe doesn't swoop in and force you to decide a certain way even though you'd rather not. Determinism only means that your decisions, by being part of physical reality rather than existing outside it, result from the physical events that led to them. You aren't free to make events happen without a cause, but you can still look at evidence and come to correct conclusions.

Comment author: brian_jaress 21 August 2009 06:33:10PM *  0 points [-]

If you can't choose whether you believe, then you don't choose whether you believe. You just believe or not. The full equation still captures the correctness of your belief, however you arrived at it. There's nothing inconsistent about thinking that you are forced to not believe and that seeing the equation is (part of) what forced you.

(I avoid the phrase "free will" because there are so many different definitions. You seem to be using one that involves choice, while Eliezer uses one based on control. As I understand it, the two of you would disagree about whether a TV remote in a deterministic universe has free will.)

edit: missing word, extra word

Comment author: PhilGoetz 21 August 2009 07:57:09PM *  -1 points [-]

Brian said:

If you can't choose whether you believe, then you don't choose whether you believe. You just believe or not. The full equation still captures the correctness of your belief, however you arrived at it. There's nothing inconsistent about thinking that you are forced to not believe and that seeing the equation is (part of) what forced you.

And Alicorn said:

What if we're in a possible world where we can't choose not to consider those worlds? ;)

And before either of those, I said:

"But", you might object, "what should you do if you are a computer program, running in a deterministic language on deterministic hardware?"

The answer is that in that case, you do what you will do. You might adopt the view that you have no free will, and you might be right.

These all seem to mean the same thing. When you try to argue against what someone said by agreeing with him, someone is failing to communicate.

Brian, my objection is not based on the case fb. It's based on the cases Fb and fB. fB is a mistake that you had to make. Fb, "choosing to believe that you can't choose to believe", is a mistake you didn't have to make.

Comment author: brian_jaress 21 August 2009 11:26:29PM 0 points [-]

Yes. I started writing my reply before Alicorn said anything, took a short break, posted it, and was a bit surprised to see a whole discussion had happened under my nose.

But I don't see how what you originally said is the same as what you ended up saying.

At first, you said not to consider f because there's no point. My response was that the equation correctly includes f regardless of your ability to choose based on the solution.

Now you are saying that Fb is different from (inferior to?) fB.

Comment author: SforSingularity 19 August 2009 06:54:51PM *  0 points [-]

Free will is counted as one of the great problems of philosophy. Wikipedia lists it as a "central problem of metaphysics". SEP has a whole, long article on it, along with others on "compatibilism", "causal determinism", "free will and fatalism", "divine foreknowledge", "incompatibilist (nondeterministic) theories of free will", and "arguments for incompatibilism".

If you really have "nuked the dead donkey" here, you would cut out a lot of literature. Furthermore, religious people would no longer be able to use "free will" as a magic incantation with which to defend God.

Comment author: Eliezer_Yudkowsky 19 August 2009 07:40:30PM 3 points [-]

Dennett and others have used multi-ton high explosives on the dead donkey. Why would nuclear weapons make a further difference?

Comment author: SforSingularity 19 August 2009 09:32:45PM 0 points [-]

People respond to math more than to words.

Comment author: Eliezer_Yudkowsky 19 August 2009 09:43:16PM *  4 points [-]

Er... no they don't?

Comment author: Vladimir_Nesov 19 August 2009 09:47:53PM 2 points [-]

Some do.

Comment author: SforSingularity 19 August 2009 10:12:46PM *  1 point [-]

Rather, if one challenges a valid verbal theory one can usually find some way of convincing people that there is some "wiggle room", that it may or may not be valid, etc. But a mathematical theory has, I think, an air of respectability that will make people pay attention, even if they don't like it, and especially if they don't actually understand the mathematics.

If your theory finds applications, (which, given the robotics revolution we seem to be in the middle of is not vastly unlikely), then it will further marginalize those who stick to the old convenient confusion about free will.

Of course, given what has happened with evolution (smart Christians accept it, but find excuses to still believe in God), I suspect that it will only have an incremental impact on religiosity, even amongst the elite.

Comment author: rwallace 19 August 2009 07:40:32PM 2 points [-]

The only reason free will is regarded as a problem of philosophy is that philosophers are in the rather bizarre habit of defining it as "your actions are uncaused" - it should be no surprise that a nonsensical definition leads to problems!

When we use the correct definition - the one that corresponds to how the term is actually used - "your actions are caused by your own decisions, as opposed to by external coercion" - the problem doesn't arise.

Comment author: timtyler 19 August 2009 07:56:19PM -2 points [-]

Free will seems like a pretty boring topic to me. The main recent activity I have noticed in the area was Daniel Dennett's "Freedom Evolves" book. That book was pretty boring and mostly wrong - I thought. It was curious to see Daniel Dennett make such a mess of the subject, though.

Comment author: gwern 20 August 2009 08:06:42AM 1 point [-]

As it happens, I'm reading through Freedom Evolves right now; up to chapter 3, and while I don't quite buy his ideas on inevitability, it so far doesn't strike me as a mess?

Comment author: timtyler 20 August 2009 08:45:26AM -2 points [-]

I liked the bit on memes. Most of the rest of it was a lot of word games, IMO.

Comment author: Wei_Dai 23 August 2009 10:12:53AM 0 points [-]

I gave one example earlier of TDT agents not playing cooperate in PD against each other. Here's another, perhaps even more puzzling, example.

Consider 3 TDT agents, A, B, and C, playing a game of 3-choose-2 PD. These agents are identical, except that they have different beliefs about how they are logically related to each other. A and B both believe that A and B are 100% logically correlated (in other words, logically equivalent). A and C both believe that A and C are 0% logically correlated. B and C also believe that B and C are 0% logically correlated.

What's the outcome of this game? Well, C should clearly play defect, since it's sure that it's not correlated with either of the other players. A and B both play cooperate, since that maximizes expected utility given that they are correlated with each other but not with C (the arithmetic is the same as in my earlier 3-choose-2 PD example). Given this outcome, their initial beliefs about their logical relationships don't seem to be inconsistent.
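
A toy version of this game, sketched in Python with an assumed multi-player-PD payoff (each cooperator adds 2 to every player's payoff and pays a cost of 3) standing in for the payoffs of the earlier 3-choose-2 example; it reproduces the qualitative outcome described, not the original arithmetic:

    def payoff(i_cooperate, n_cooperators):
        """Payoff to one player, given its own move and the total number of cooperators."""
        return 2 * n_cooperators - (3 if i_cooperate else 0)

    # C treats A's and B's moves as uncorrelated with its own, i.e. as fixed;
    # defection is then dominant for C:
    for n_others in range(3):  # possible cooperators among A and B
        assert payoff(False, n_others) > payoff(True, n_others + 1)

    # A and B each expect the other's move to mirror their own, with C at D,
    # so the live comparison is "we both cooperate" vs. "we both defect":
    eu_cc = payoff(True, 2)    # A and B cooperate, C defects -> 1 each
    eu_dd = payoff(False, 0)   # everyone defects             -> 0 each
    print("A and B play:", "C" if eu_cc > eu_dd else "D")  # prints "C"
    print("C plays: D")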

Comment author: Eliezer_Yudkowsky 23 August 2009 07:42:35PM 1 point [-]

How do they end up in this situation? Clearly they cannot all have common knowledge of each other's source code, so where do they obtain their definite beliefs about each other instead?

Comment author: Wei_Dai 23 August 2009 08:44:03PM 0 points [-]

Re: "definite beliefs", the numbers don't have to be 100% and 0%. They could be any p and q, where p is above the threshold for cooperation, and q is below.

As for where the numbers come from, I don't know. Perhaps the players have different initial intuitions (from a mathematical intuition module provided by evolution or their programmers) about their logical correlations, which causes them to actually have different logical correlations (since they are actually computing different things when making decisions), which then makes those intuitions consistent upon reflection.

Comment author: Eliezer_Yudkowsky 23 August 2009 08:49:22PM 1 point [-]

Why can't A and B choose to be correlated with C by deliberately making their decision dependent on its decision? Insufficient knowledge of C's code even to make their decision dependent on "what an agent does when it thinks it's not correlated to you"? In other words, you know that C is going to follow a certain decision algorithm here - do the Dai-obvious thing and defect - but A and B don't know enough about C to defect conditional on the "obvious" thing being to defect?

Comment author: Wei_Dai 23 August 2009 10:22:38PM 1 point [-]

Why can't A and B choose to be correlated with C by deliberately making their decision dependent on its decision?

A and B don't choose this, because given their beliefs (i.e., low correlation between A and C, and between B and C), that doesn't maximize their expected utilities. So the belief is like a self-fulfilling prophecy. Intuitively, you might think "Why don't they get out of this trap by choosing to be correlated with C and simultaneously changing their beliefs?" The problem is that they don't think this will work, because they think C wouldn't respond to this.

In other words, why would A and B defect conditional on C defecting, when they know "C is going to follow a certain decision algorithm here - do the Dai-obvious thing and defect"?

Anyway, that's what I think happens under UDT1. It's quite possible (almost certain, really) that UDT1 is wrong or incomplete. But if you have a better solution, can you try to formalize it, and not just make informal arguments? Or, if you think you have an intuitively satisfactory solution that you don't know how to formalize yet, I'll stop beating this dead horse and let you work it out.

Comment author: Eliezer_Yudkowsky 23 August 2009 11:50:08PM 1 point [-]

I don't have a general solution. I'm just carrying out the reasoning by hand. I don't know how to solve the logical ordering problem.

In other words, why would A and B defect conditional on C defecting, when they know "C is going to follow a certain decision algorithm here - do the Dai-obvious thing and defect"?

Why would C choose to follow such an algorithm, if C perceives that not following such an algorithm might lead to mutual cooperation instead of mutual defection?

Essentially, I'm claiming that your belief about "logical uncorrelation" is hard to match up with your out-of-context intuitive reasoning about what all the parties are likely to do. It's another matter if C is a piece of cardboard, a random number generator, or a biological organism operating on some weird deluded decision theory; but you're reasoning as if C is calmly maximizing.

Suppose I put things to you this way: Groups of superrational agents will not occupy anything that is not at least a Pareto optimum, because they have strong motives to occupy Pareto optima and TDT lets them coordinate where such motives exist. Now the 3-choose-2 problem with two C players and one D player may be a Pareto optimum (if taken at face value without further trades being possible), but if you think of Pareto-ization as an underlying motivation - that everyone starts out in the mutual defection state, and then has a motive to figure out how to leave the mutual defection state by increasing their entanglement - then you might see why I'm a bit more skeptical about these "logical uncorrelations". Then you just end up in the all-D state, the base state, and agents have strong incentives to figure out ways to leave it if they can.

In other words, you seem to be thinking in terms of a C-equilibrium already accomplished among one group of agents locally correlated with themselves only, and looking at the incentive of other agents to locally-D; whereas my own reasoning assumes the D-equilibrium already globally accomplished, but suspects that in this case rational agents have a strong incentive to reach up to the largest reachable C-equilibria, which they can accomplish by increasing (not decreasing) various forms of entanglement.

Relations between "previously uncorrelated" groups may be viewable as analogous to relations between causally uncorrelated individuals. To assume that one subgroup has decided on interior cooperation even though it makes them vulnerable to outside defection, without that subgroup having demanded anything in return, may be like presuming unilateral cooperation on the PD.

Comment author: Wei_Dai 26 August 2009 03:32:43AM 0 points [-]

whereas my own reasoning assumes the D-equilibrium already globally accomplished, but suspects that in this case rational agents have a strong incentive to reach up to the largest reachable C-equilibria, which they can accomplish by increasing (not decreasing) various forms of entanglement.

Ok, this looks reasonable to me. But how would they actually go about doing this? So far I can see two general methods:

  1. convergence towards an "obvious" decision theory
  2. deliberate conditioning of moves between players

My current view is that neither of these methods seems very powerful as a mechanism for enabling cooperation, compared to, say, the ability to prove source code or to merge securely. To summarize my thoughts and the various examples I've given, here are the problems with each of the above methods for "increasing entanglement":

  1. Two agents with the same "obvious" decision theory may not be highly correlated, if they have different heuristics, intuitions, priors, utility functions, etc. Also, an agent may have a disincentive to unilaterally increase his correlation with a large group of already highly correlated agents.
  2. Deliberate conditioning of moves is difficult when two sides have high uncertainty about each other's source code. Which hypothetical agent(s) do you condition your move against? How would they know that you've done so, when they don't know your source code either? It's also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.

Comment author: Eliezer_Yudkowsky 26 August 2009 05:56:26AM 1 point [-]

These sound like basically reasonable worries / lines of argument to me. I'm sure life will be a lot easier for... not necessarily everyone, but at least us primitive mortal analysts... if it's easy for superintelligences to exhibit their source code to each other. Then we just have the problem of logical ordering in threats and games of Chicken. (Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.)

A possible primary remaining source of our differing guesses at this point, may have to do with the degree to which we think that decision processes are a priori (un)correlated. I take statements like "Obviously, everyone plays D at the end" to be evidence of very high a priori correlation - it's no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don't actually conclude that maybe some players play C and others play D.

It's also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.

How would that happen?

Comment author: Wei_Dai 26 August 2009 06:46:01AM *  0 points [-]

(Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.)

I think Nesov's position is that such threats don't work against updateless agents, but I'm not sure about that yet. ETA: See previous discussion of this topic.

I take statements like "Obviously, everyone plays D at the end" to be evidence of very high a priori correlation - it's no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don't actually conclude that maybe some players play C and others play D.

That doesn't make sense... Suppose nobody smokes, and nobody gets cancer. Does that mean smoking and cancer are correlated? In order to have correlation, you need to have both (C,C) and (D,D) outcomes. If all you have are (D,D) outcomes, there is no correlation.
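
A minimal illustration of this point, encoding D as 0 and C as 1 and using made-up plays:

    from statistics import pstdev

    def correlation(xs, ys):
        """Pearson correlation; undefined when either sequence has no variance."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sx, sy = pstdev(xs), pstdev(ys)
        if sx == 0 or sy == 0:
            return None  # all-identical plays: correlation is undefined, not "high"
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
        return cov / (sx * sy)

    print(correlation([0, 0, 0, 0], [0, 0, 0, 0]))  # everyone always plays D -> None
    print(correlation([1, 1, 0, 0], [1, 1, 0, 0]))  # a mix of C and D plays  -> 1.0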

How would that happen?

I'm referring to rock-paper-scissors and this example. Or were you asking something else?

Comment author: CronoDAS 19 August 2009 03:23:39PM 0 points [-]

This is indeed interesting, although it seems to be going over my head somewhat.

Comment author: thomblake 19 August 2009 03:10:46PM *  0 points [-]

In conclusion, rational agents are not incapable of cooperation, rational agents are not constantly fighting their own source code, rational agents do not go around helplessly wishing they were less rational, and finally, rational agents win.

I'm pretty sure Socrates and Aristotle already pointed much of this out in different words. I should make a post about that. Of course, they didn't do the math.

I agree with cousin_it below. It seems like you're missing some math.

But other than that, I don't see what the big deal is. I was expecting something monumental and game-changing, not "Is that it?"