Consider the following decision problem, which I call the "UDT anti-Newcomb problem". Omega puts money into the boxes using the usual algorithm, with one exception: it isn't simulating the player at all. Instead, it simulates what a UDT agent would do in the player's place.
This was one of my problematic problems for TDT. I also discussed some Sneaky Strategies which could allow TDT, UDT or similar agents to beat the problem.
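A minimal Python sketch of the payoff structure may make the problem concrete. It assumes the standard Newcomb amounts ($1,000 in the transparent box, $1,000,000 in the opaque one) and assumes that the simulated UDT agent one-boxes; the function names are hypothetical. The essential feature is that `fill_opaque_box` never consults the actual player's choice.

```python
# Assumed payoffs: the standard Newcomb amounts (not stated in the post).
SMALL = 1_000       # transparent box, always filled
LARGE = 1_000_000   # opaque box, filled only if the *simulated* agent one-boxes


def udt_choice() -> str:
    """Hypothetical stand-in for Omega's simulation of a UDT agent.

    Assumption: the simulated UDT agent one-boxes.
    """
    return "one-box"


def fill_opaque_box() -> int:
    # Omega simulates a UDT agent in the player's place, not the player.
    return LARGE if udt_choice() == "one-box" else 0


def payoff(player_choice: str) -> int:
    """Total winnings for the actual player, given the player's own choice."""
    opaque = fill_opaque_box()
    if player_choice == "one-box":
        return opaque
    # Two-boxing takes both boxes; the opaque box is already fixed.
    return opaque + SMALL
```

Under these assumptions, `payoff("one-box")` gives 1,000,000 and `payoff("two-box")` gives 1,001,000. A player who is not UDT, and whose choice Omega therefore does not track, can do strictly better than a UDT agent, which is the sense in which the problem penalizes UDT.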
Furcas, you say:
When I talked to Omohundro at the AAAI workshop where this paper was delivered, he accepted without hesitation that the Doctrine of Logical Infallibility was indeed implicit in all the types of AI that he and the others were talking about.
Your statement above is nonsensical because the idea of a DLI was invented precisely in order to summarize, in a short phrase, a range of absolutely explicit and categorical statements made by Yudkowsky and others about what the AI will do if it (a) decides to do action X, and (b) knows quite well that there is massive, converging evidence that action X is inconsistent with the goal statement Y that was supposed to justify X. Under those circumstances, the AI will ignore the massive converging evidence of inconsistency and instead enforce the 'literal' interpretation of goal statement Y.
The fact that the AI behaves in this way (sticking to the literal interpretation of the goal statement in spite of external evidence that the literal interpretation is inconsistent with everything else that is known about the connection between goal statement Y and action X) IS THE VERY DEFINITION OF THE DOCTRINE OF LOGICAL INFALLIBILITY.
I think by "logical infallibility" you really mean "rigidity of goals" i.e. the AI is built so that it always pursues a fixed set of goals, precisely as originally coded, and has no capability to revise or modify those goals. It seems pretty clear that such "rigid goals" are dangerous unless the statement of goals is exactly in accordance with the designers' intentions and values (which is unlikely to be the case).
The problem is that an AI with "flexible" goals (ones which it can revise and re-write over time) is also dangerous, but for a rather different reason: after many iterations of goal rewrites, there is simply no telling what its goals will come to look like. A late version of the AI may well end up destroying everything that the first version (and its designers) originally cared about, because the new version cares about something very different.