Vladimir_Nesov comments on Towards a New Decision Theory - Less Wrong

50 Post author: Wei_Dai 13 August 2009 05:31AM

Comments (142)

Comment author: Vladimir_Nesov 16 August 2009 06:35:54PM *  2 points [-]

The AI's creator was running BRAINS, not a decision theory. I don't see how "what the AI's creator was running" can be a meaningful consideration in a discussion of what constitutes a good AI design. Beware the naturalistic fallacy.

Comment author: Wei_Dai 16 August 2009 06:39:10PM *  1 point [-]

One AI can create another AI, right? Does my conjecture make sense if the creator is an AI running some decision theory? If so, we can extend XDT to work with human creators, by having some procedure to approximate the human using a selection of possible DTs, priors, and utility functions. Remember that the goal in XDT is to minimize the probability that the creator would want to add an exception on top of the basic decision algorithm of the AI. If the approximation is close enough, then this probability is minimal.

ETA: I do not claim this is good AI design, merely trying to explore the implications of different ideas.

Comment author: Vladimir_Nesov 16 August 2009 07:26:37PM *  5 points [-]

The problem of finding the right decision theory is a problem of Friendliness, but for a different reason than finding a powerful inference algorithm fit for an AGI is a problem of Friendliness.

"Incompleteness" of a decision theory, such as what we see in CDT, seems to correspond to an inability of the AI to embody certain aspects of preference; in other words, the algorithm lacks expressive power for its preference parameter. Each time an agent makes a mistake, you can reinterpret it as meaning that it just prefers it this way in this particular case. Whatever preference you "feed" to an AI with a wrong decision theory, the AI is going to distort it by misinterpreting it, losing some of its aspects. Furthermore, the lack of reflective consistency effectively means that the AI continues to distort its preference as it goes along. At the same time, it can still be powerful in consequentialist reasoning, being as formidable as a complete AGI, implementing the distorted version of preference that it can embody.

The resulting process can be interpreted as an AI running "ultimate" decision theory, but with a preference not in perfect fit with what it should've been. If at any stage you have a singleton that owns the game but has a distorted preference, whether due to incorrect procedure for getting the preference instantiated, or incorrect interpretation of preference, such as a mistaken decision theory as we see here, there is no returning to better preference.

More generally, what "could" be done, what the AI "could" become, is a concept related to free will, which is a consideration of what happens to a system in isolation, not a system that is one with reality. You consider a system from the outside and see what happens to it if you perform this or that operation on it; this is what it means that you could do one operation or the other, or that events could unfold one way or the other. When you have a singleton, on the other hand, there is no external point of view on it, and so there is no possibility for change. The singleton is the new law of physics, a strategy proven true [*].

So, if you say that the AI's predecessor was running a limited decision theory, this is a damning statement about what sort of preference the next incarnation of the AI can inherit. The only significant improvement (for the fate of preference) an AGI with any decision theory can make is to become reflectively consistent, to stop losing ground. The resulting algorithm is as good as the ultimate decision theory, but with preference lacking some aspects, and thus with behavior indistinguishable from (equivalent to) what some other kinds of decision theories would produce.

__
[*] There is a fascinating interpretation of the truth of logical formulas as the property of the corresponding strategies in a certain game being the winning ones. See for example:
S. Abramsky (2007). "A Compositional Game Semantics for Multi-Agent Logics of Imperfect Information". In J. van Benthem, D. Gabbay, & B. Löwe (eds.), Interactive Logic, vol. 1 of Texts in Logic and Games, pp. 11-48. Amsterdam University Press.

Comment author: Eliezer_Yudkowsky 16 August 2009 09:50:02PM 4 points [-]

An AI running causal decision theory will lose on Newcomblike problems, be defected against in the Prisoner's Dilemma, and otherwise undergo behavior that is far more easily interpreted as "losing" than "having different preferences over final outcomes".

Comment author: Vladimir_Nesov 16 August 2009 10:39:43PM 3 points [-]

The AI that starts with CDT will immediately rewrite itself as an AI running the ultimate decision theory, but the resulting AI will have distorted preferences. This is somewhat equivalent to the decision theory it runs having special cases for the time at which the AI got rid of CDT (since code vs. data, or algorithm vs. preference, is strictly speaking an arbitrary distinction). The resulting AI won't lose on these thought experiments, provided they don't intersect the peculiar distortion of its preferences; where they do, it would indeed prefer to "lose" according to preference-as-it-should-have-been, but win according to its distorted preference.

Comment author: Eliezer_Yudkowsky 16 August 2009 10:42:11PM 4 points [-]

A TDT AI consistently acts so as to end up with a million dollars. A CDT AI acts to win a million dollars in some cases, but in other cases ends up with only a thousand. So in one case we have a compressed preference over outcomes, in the other case we have a "preference" over the exact details of the path including the decision algorithm itself. In a case like this I don't use the word "preference" so as to say that the CDT AI wants a thousand dollars on Newcomb's Problem, I just say the CDT AI is losing. I am unable to see any advantage to using the language otherwise - to say that the CDT AI wins with peculiar preference is to make "preference" and "win" so loose that we could use it to refer to the ripples in a water pond.
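
As a rough illustration of the contrast drawn above, here is a minimal sketch of Newcomb's Problem, assuming a perfectly accurate predictor and the usual $1,000 / $1,000,000 payouts. The function names (`payoff`, `cdt_choice`, `tdt_choice`, `outcome`) are hypothetical and are not taken from any formal statement of CDT or TDT; the "TDT-style" rule is only the simplification that the prediction is assumed to match the agent's actual choice.

```python
# Illustrative sketch only: Newcomb's Problem with a perfect predictor.
# All names are hypothetical; the payouts are the standard ones.

BOX_A = 1_000        # transparent box, always contains $1,000
BOX_B = 1_000_000    # opaque box, filled only if one-boxing was predicted

ACTIONS = ("one-box", "two-box")


def payoff(action: str, predicted: str) -> int:
    """Payout for taking `action` when the predictor foresaw `predicted`."""
    opaque = BOX_B if predicted == "one-box" else 0
    return opaque if action == "one-box" else opaque + BOX_A


def cdt_choice() -> str:
    """CDT-style rule: hold the box contents fixed and pick the dominant act."""
    for predicted in ACTIONS:
        # Whatever was predicted, two-boxing pays exactly BOX_A more.
        assert payoff("two-box", predicted) > payoff("one-box", predicted)
    return "two-box"


def tdt_choice() -> str:
    """TDT-style rule (simplified): assume the prediction matches the choice."""
    return max(ACTIONS, key=lambda a: payoff(a, predicted=a))


def outcome(choice: str) -> int:
    """What the agent walks away with when the predictor is perfect."""
    return payoff(choice, predicted=choice)
```

Under these assumptions `outcome(tdt_choice())` is the million and `outcome(cdt_choice())` is the thousand, which is the sense of "losing" used in the comment above.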

Comment author: Vladimir_Nesov 16 August 2009 11:12:53PM *  1 point [-]

It's the TDT AI resulting from the CDT AI's rewriting of itself that plays these strange moves on the thought experiments, not the CDT AI. The algorithm of idealized TDT is parameterized by "preference" and always gives the right answer according to that "preference". To stop reflective inconsistency, the CDT AI is going to rewrite itself with something else. That something else can be characterized in general as a TDT AI with crazy preferences, one that prefers $1000 in the Newcomb's thought experiments set before midnight October 15, 2060, or something of the sort, but works OK after that. The preference of the TDT AI to which a given AGI is going to converge can be used as the denotation of that AGI's preference, to generalize the notion of TDT preference to systems that are not even TDT AIs, and further to systems that are not even AIs, in particular to humans or humanity.

These are paperclips of preference, something that seems clearly not right as a reflection of human preference, but that is nonetheless a point in the design space that can be filled in particular by failures to start with the right decision theory.
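
The "same algorithm, crazy preference" reading can be sketched in code. This is only a toy under loud assumptions: a single fixed decision procedure (`decide`, hypothetical) that takes the preference as a parameter, a perfect-predictor Newcomb payout table, and a made-up `distorted_utility` that ranks $1000 highest for experiments set before the cutoff date mentioned in the comment. None of these names come from the thread.

```python
# Illustrative sketch only: one decision procedure, two preference parameters.
from datetime import datetime

# Cutoff taken from the comment; reading "midnight October 15" as the
# start of that day is an assumption.
CUTOFF = datetime(2060, 10, 15)

# Payouts under a perfect predictor: action -> dollars received.
OUTCOMES = {"one-box": 1_000_000, "two-box": 1_000}


def sane_utility(dollars: int, when: datetime) -> int:
    """Compact preference: more money is better, regardless of date."""
    return dollars


def distorted_utility(dollars: int, when: datetime) -> int:
    """Hypothetical distorted preference: before the cutoff, $1000 is best."""
    if when < CUTOFF:
        return 1 if dollars == 1_000 else 0
    return dollars


def decide(utility, when: datetime) -> str:
    """Fixed procedure: pick the action whose outcome the utility ranks highest."""
    return max(OUTCOMES, key=lambda action: utility(OUTCOMES[action], when))
```

With `distorted_utility`, `decide` two-boxes for dates before the cutoff and one-boxes afterwards, while the procedure itself never changes. The special case has simply moved from the code into the data.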

Comment author: Eliezer_Yudkowsky 16 August 2009 11:27:01PM 2 points [-]

I suggest that regarding crazy decision theories with compact preferences as sane decision theories with noncompact preferences is a step backward which will only confuse yourself and the readers. What is accomplished by doing so?

Comment author: Vladimir_Nesov 16 August 2009 11:41:59PM *  1 point [-]

How should we regard humans, then? They certainly don't run a compact decision algorithm, and their actions are not particularly telling of their preferences. And still, they have to be regarded as having a TDT preference, in order to extract that preference and place it in a TDT AI. As I envision it, a theory that would define what TDT preference humans have must also be capable of telling what the TDT preference is of crazy AIs, or a petunia, or the Sun.

(Btw, I'm now not sure that a CDT-generated AI will give crazy answers on questions about the past; it may just become indifferent to the past altogether, as that part of preference is already erased from its mind. CDT gave crazy answers, but by the time it constructed the TDT, it had already lost the part of preference that corresponds to giving those crazy answers, and so the TDT won't give them.)

Comment author: Eliezer_Yudkowsky 17 August 2009 12:11:56AM *  2 points [-]

If you regard humans as sane EU maximizers with crazy preferences then you end up extracting crazy preferences! This is exactly the wrong thing to do.

I can't make out what you're saying about CDT-gen AI because I don't understand this talk about "that part of preference is already erased from its mind". You might be better off visualizing Dai's GLT, which a "half timeless decision theory" is just the compact generator of.

Comment author: Wei_Dai 16 August 2009 10:00:43PM *  0 points [-]

I think an AI running CDT would immediately replace itself with an AI running XDT (or something equivalent to it). If there is no way to distinguish between an AI running XDT and an AI running TDT (prior to a one-shot PD), the XDT AI can't do worse than a TDT AI. So CDT is not losing, as far as I can tell (at least for an AI capable of self-modification).

ETA: I mean an XDT AI can't do worse than a TDT AI within the same world. But a world full of XDT AIs will do worse than a world full of TDT AIs.