Eliezer_Yudkowsky comments on Why do theists, undergrads, and Less Wrongers favor one-boxing on Newcomb? - Less Wrong

15 Post author: CarlShulman 19 June 2013 01:55AM




Comment author: Qiaochu_Yuan 19 June 2013 04:31:57AM *  18 points [-]

I've been reading a little of the philosophical literature on decision theory lately, and at least some two-boxers have an intuition that I hadn't thought about before that Newcomb's problem is "unfair." That is, for a wide range of pairs of decision theories X and Y, you could imagine a problem which essentially takes the form "Omega punishes agents who use decision theory X and rewards agents who use decision theory Y," and this is not a "fair" test of the relative merits of the two decision theories.

The idea that rationalists should win, in this context, has a specific name: it's called the Why Ain'cha Rich defense, and I think what I've said above is the intuition powering counterarguments to it.

I'm a little more sympathetic to this objection than I was before delving into the literature. A complete counterargument to it should at least attempt to define what fair means and argue that Newcomb is in fact a fair problem. (This seems related to the issue of defining what a fair opponent is in modal combat.)

Comment author: Eliezer_Yudkowsky 19 June 2013 04:28:33PM 30 points [-]

TDT's reply to this is a bit more specific.

Informally: Since Omega represents a setup which rewards agents who make a certain decision X, and reality doesn't care why or by what exact algorithm you arrive at X so long as you arrive at X, the problem is fair. Unfair would be "We'll examine your source code and punish you iff you're a CDT agent, but we won't punish another agent who two-boxes as the output of a different algorithm even though your two algorithms had the same output." The problem should not care whether you arrive at your decisions by maximizing expected utility or by picking the first option in English alphabetical order, so long as you arrive at the same decision either way.
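This fairness criterion can be sketched as a toy program (hypothetical names; a minimal illustration, not anything from the thread): Omega's payoff function sees only the agent's output, so two entirely different algorithms with the same output receive the same reward.

```python
# Toy Newcomb setup: Omega is modeled as a perfect predictor that fills
# the opaque box based only on the agent's output, never on its source.
def omega_payoff(agent):
    prediction = agent()           # perfect prediction, modeled by running the agent
    box_b = 1_000_000 if prediction == "one-box" else 0
    actual = agent()
    if actual == "one-box":
        return box_b
    return box_b + 1_000           # two-boxing adds the transparent $1000

# Two very different "algorithms" that happen to share an output:
def maximize_expected_utility():
    return "one-box"

def alphabetical_first():
    # picks the first option in English alphabetical order
    return sorted(["one-box", "two-box"])[0]

# A "fair" problem cannot tell them apart:
assert omega_payoff(maximize_expected_utility) == omega_payoff(alphabetical_first)
```

An "unfair" variant would inspect the agent's source (e.g. its `__name__` or code object) before assigning payoffs, which is exactly the case excluded above.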

More formally: TDT corresponds to maximizing on the class of problems whose payoff is determined by 'the sort of decision you make in the world that you actually encounter, having the algorithm that you do'. CDT corresponds to maximizing over a fair problem class consisting of scenarios whose payoff is determined only by your physical act, and would be a good strategy in the real world if no other agent ever had an algorithm similar to yours (you must be the only CDT-agent in the universe, so that your algorithm only acts at one physical point) and no other agent could gain any info about your algorithm except by observing your controllable physical acts (tallness being correlated with intelligence is not allowed). UDT allows for maximizing over classes of scenarios where your payoff can depend on actions you would have taken in universes you could have encountered but didn't, i.e., the Counterfactual Mugging. (Parfit's Hitchhiker is outside TDT's problem class, and in UDT's, because the car-driver asks "What will this hitchhiker do if I take them to town?", so that a dishonorable hitchhiker who is left in the desert is getting a payoff which depends on what they would have done in a situation they did not actually encounter. Likewise the transparent Newcomb's Box. We can clearly see how to maximize on the problem, but it's in UDT's class of 'fair' scenarios, not TDT's.)

If the scenario handed to the TDT algorithm is that only one copy of your algorithm exists within the scenario, acting at one physical point, and no other agent in the scenario has any knowledge of your algorithm apart from acts you can maximize over, then TDT reduces to CDT and outputs the same action as CDT. This is implied by CDT maximizing over its problem class, together with TDT's class of 'fair' problems strictly including all CDT-fair problems.

If Omega rewards having particular algorithms independently of their outputs, by examining the source code without running it, the only way to maximize is to have the most rewarded algorithm regardless of its output. But this is uninteresting.

If a setup rewards some algorithms more than others because of their different outputs, this is just life. You might as well claim that a cliff punishes people who rationally choose to jump off it.

This situation is interestingly blurred in modal combat where an algorithm may perhaps do better than another because its properties were more transparent (more provable) to another algorithm examining it. Of this I can only say that if, in real life, we end up with AIs examining each other's source code and trying to prove things about each other, calling this 'unfair' is uninteresting. Reality is always the most important domain to maximize over.

Comment author: [deleted] 19 June 2013 06:11:02PM *  7 points [-]

I'd just like to say that this comparison of CDT, TDT, and UDT was a very good explanation of the differences. Thanks for that.

Comment author: ESRogs 19 June 2013 10:12:21PM 4 points [-]

Agreed. Found the distinction between TDT and UDT especially clear here.

Comment author: ArisKatsaris 20 June 2013 01:21:07AM *  4 points [-]

This explanation makes UDT seem strictly more powerful than TDT (if UDT can handle Parfit's Hitchhiker and TDT can't).

If that's the case, then is there a point in still focusing on developing TDT? Is it meant as just a stepping stone to an even better decision theory (possibly UDT itself) down the line? Or do you believe UDT's advantages to be counterbalanced by disadvantages?

Comment author: Eliezer_Yudkowsky 20 June 2013 01:23:29AM 7 points [-]

UDT doesn't handle non-base-level maximization vantage points (previously "epistemic vantage points") for blackmail - you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn't realize you're only blackmailing it because you're simulating it being blackmailable. As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).

I expect the ultimate theory to look more like "TDT modded to handle UDT's class of problems and blackmail and anything else we end up throwing at it" than "UDT modded to be naturalistic and etc", but I could be wrong - others have different intuitions about this.

Comment author: Wei_Dai 20 June 2013 06:09:10AM 7 points [-]

As currently formulated UDT is also non-naturalistic and assumes the universe is divided into a not-you environment and a UDT algorithm in a Cartesian bubble, which is something TDT is supposed to be better at (though we don't actually have good fill-in for the general-logical-consequence algorithm TDT is supposed to call).

UDT was designed to move away from the kind of Cartesian dualism as represented in AIXI. I don't understand where it's assuming its own Cartesian bubble. Can you explain?

Comment author: Eliezer_Yudkowsky 20 June 2013 04:04:00PM 0 points [-]

The version I saw involved a Universe computation which accepts an Agent function and then computes itself, with the Agent making its choices based on its belief about the Universe? That seemed to me like a pretty clean split.

Comment author: cousin_it 21 June 2013 08:47:08AM *  5 points [-]

No, the version we've been discussing for the last several years involves an argumentless Universe function that contains the argumentless Agent function as a part. Agent knows the source code of Agent (via quining) and the source code of Universe, but does not a priori know which part of the Universe is the Agent. The code of Universe might be mixed up so it's hard to pick out copies of Agent. Then Agent tries to prove logical statements of the form "if Agent returns a certain value, then Universe returns a certain value". As you can see, that automatically takes into account the logical correlates of Agent as well.
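A heavily simplified sketch of this setup (hypothetical; it fakes the quining and the theorem prover, which are where all the real difficulty lives): the agent searches for implications of the form "Agent() == a implies Universe() == u" and picks the action with the best provable consequence.

```python
# Toy model of the logic-based UDT construction. Real versions use a
# proof searcher over quined source code; here "provability" is faked
# with a lookup table of implications about a Newcomb-like Universe.
ACTIONS = ["one-box", "two-box"]

# Stand-in for provable statements "Agent() == a  ->  Universe() == u":
PROVABLE_CONSEQUENCE = {
    "one-box": 1_000_000,
    "two-box": 1_000,
}

def agent():
    # Choose the action whose provable consequence is best. Because the
    # implications are about Agent's *output*, logical correlates of the
    # agent elsewhere in Universe are automatically accounted for.
    return max(ACTIONS, key=lambda a: PROVABLE_CONSEQUENCE[a])

assert agent() == "one-box"
```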

Comment author: ArisKatsaris 21 June 2013 10:39:55AM *  6 points [-]

I find it rather disappointing that the UDT people and the TDT people have seemingly not been communicating very efficiently with each other in the last few years...

Comment author: Wei_Dai 21 June 2013 04:30:25PM 4 points [-]

I think what has happened is that most of the LW people working on decision theory in the past few years have been working with different variations on UDT, while Eliezer hasn't participated much in the discussions due to being preoccupied with other projects. It seems understandable that he saw some ideas that somebody was playing with, and thought that everyone was assuming something similar.

Comment author: lukeprog 25 June 2013 11:46:16PM 1 point [-]

Yes. And now, MIRI is planning a decision theory workshop (for September) so that some of this can be hashed out.

Comment author: Tyrrell_McAllister 20 June 2013 07:05:01PM 2 points [-]

UDT can be modeled with a Universe computation that takes no arguments.

Comment author: Wei_Dai 20 June 2013 06:53:16PM 2 points [-]

I think you must have been looking at someone else's idea. None of the versions of UDT that I've proposed are like this. See my original UDT post for the basic setup, which all of my subsequent proposals share.

Comment author: Eliezer_Yudkowsky 20 June 2013 07:27:03PM 1 point [-]

"The answer is, we can view the physical universe as a program that runs S as a subroutine, or more generally, view it as a mathematical object which has S embedded within it." A big computation with embedded discrete copies of S seems to me like a different concept from doing logical updates on a big graph with causal and logical nodes, some of which may correlate to you even if they are not exact copies of you.

Comment author: Wei_Dai 20 June 2013 08:27:56PM 6 points [-]

The sentence you quoted was just trying to explain how "physical consequences" might be interpreted as "logical consequences" and therefore dealt with within the UDT framework (which doesn't natively have a concept of "physical consequences"). It wasn't meant to suggest that UDT only works if there are discrete copies of S in the universe.

In that same post I also wrote, "A more general class of consequences might be called logical consequences. Consider a program P’ that doesn’t call S, but a different subroutine S’ that’s logically equivalent to S. In other words, S’ always produces the same output as S when given the same input. Due to the logical relationship between S and S’, your choice of output for S must also affect the subsequent execution of P’. Another example of a logical relationship is an S' which always returns the first bit of the output of S when given the same input, or one that returns the same output as S on some subset of inputs."
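The S/S' relationship in that quote can be made concrete with a toy pair of functions (hypothetical names, just illustrating the quoted definition): distinct code, identical input-output behavior, so a program that only calls S' is nonetheless logically controlled by the choice of S's output.

```python
# S and S_prime are different pieces of code but logically equivalent:
# same output on every input.
def S(x):
    return x * 2

def S_prime(x):
    return x + x          # different source, same input-output behavior

def P_prime(x):
    # Never calls S, yet any choice of output for S constrains what
    # P_prime does, via the logical equivalence with S_prime.
    return S_prime(x) + 1

assert all(S(x) == S_prime(x) for x in range(100))
assert P_prime(5) == S(5) + 1
```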

I guess I didn't explicitly write about parts of the universe that "correlate to you" as opposed to having more exact logical relationships with you, but given how UDT is supposed to work, it was meant to just handle them naturally. At least I don't see why it wouldn't do so as well as TDT (assuming it had access to your "general-logical-consequence algorithm", which I'm guessing is the same thing as my "math intuition module").

Comment author: Benja 22 June 2013 05:11:34PM *  3 points [-]

FWIW, as far as I can remember I've always understood this the same way as Wei and cousin_it. (cousin_it was talking about the later logic-based work rather than Wei's original post, but that part of the idea is common between the two systems.) If the universe is a Game of Life automaton initialized with some simple configuration which, when run with unlimited resources and for a very long time, eventually by evolution and natural selection produces a structure that is logically equivalent to the agent's source code, that's sufficient for falling under the purview of the logic-based versions of UDT, and Wei's informal (underspecified) probabilistic version would not even require equivalence. There's nothing Cartesian about UDT.

Comment author: defectbot 26 June 2013 12:11:28AM 2 points [-]

UDT doesn't handle non-base-level maximization vantage points (previously "epistemic vantage points") for blackmail - you can blackmail a UDT agent because it assumes your strategy is fixed, and doesn't realize you're only blackmailing it because you're simulating it being blackmailable.

I'm not so sure about this one... It seems that UDT would be deciding "If blackmailed, pay or don't pay" without knowing whether it actually will be blackmailed yet. Assuming it knows the payoffs the other agent receives, it would reason "If I pay if blackmailed... I get blackmailed, whereas if I don't pay if blackmailed... I don't get blackmailed. I should therefore never pay if blackmailed", unless there's something I'm missing.
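The commenter's argument can be sketched with made-up payoffs (a hypothetical toy model, assuming the blackmailer simulates the agent's policy and only blackmails when it predicts payment): evaluating whole policies, "never pay" dominates.

```python
# Toy blackmail game with invented payoffs.
PAY_COST = 100          # cost of paying the blackmailer
THREAT_COST = 1_000     # cost of a carried-out threat

def blackmailer_acts(policy):
    # Blackmail is only worth issuing if the agent would pay.
    return policy("blackmailed") == "pay"

def outcome(policy):
    if blackmailer_acts(policy):
        return -PAY_COST if policy("blackmailed") == "pay" else -THREAT_COST
    return 0  # no blackmail ever occurs

pay = lambda observation: "pay"
refuse = lambda observation: "refuse"

assert outcome(refuse) == 0        # refuser is never blackmailed
assert outcome(pay) == -PAY_COST   # payer gets blackmailed and pays
```

Eliezer's objection upthread is that this only works when the blackmailer's strategy really is conditioned on the simulation; modeling a blackmailer whose strategy is *not* fixed is the part UDT is said to mishandle.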