Tetronian comments on Death Note, Anonymity, and Information Theory - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (47)
Nice analysis! I really like the way you quantified it.
This is off-topic, but I think Death Note is practically begging to be re-written as rationalist fanfiction: Light's manipulation skills could be used to discuss psychology and cognitive biases (much like Draco Malfoy in HP:MOR). L would of course be a Bayesian rationalist, and Soichiro and Aizawa could be Traditional Rationalist foils who would allow L to explain the ins and outs of high-level rationality. (As you've shown here, information theory could play a larger role in L's investigation.) The cat-and-mouse games between L and Light could be turned into decision theory problems; the rules related to ownership of the notebook could be used to explore timeless reasoning (much like the film Memento). The story is already brimming with ethical questions, and both L and Light's internal monologues could be used to discuss consequentialism and utilitarianism. I'm not sure what could be done with the rest of the characters or how the supernatural aspect would be handled, but it would probably be an interesting read.
Eliezer's Timeless Decision Theory is interesting, but I don't yet understand what real problem it is solving. He worked it into his Harry Potter fanfic when Harry was dealing with Azkaban, and given that he wrote the paper I can see why he made use of it, but here's someone else saying it's useful.
Does timeless reasoning have any application in situations where your opponents can't read your mind?
(I haven't watched Death Note, so my apologies if the answer to this question is obvious to people who have.)
We all read each other's minds to some extent, and to the extent this happens, TDT will give better advice than CDT. See section 7 of the TDT paper.
One reason is that it seems like it might be helpful with friendliness proofs, particularly the part where you have to prove the AI's goal will remain stable over millions of self-modifications (the harder, and all too frequently ignored, side of the problem). Basically, it takes dilemmas which might otherwise tempt an AI to self-modify and shows that it need not.
I think with CDT you can prove an AI won't need to modify its goal system on action-determined problems, while with TDT you can prove the same for the broader class of decision-determined problems. This leaves many issues, but it's a step in the right direction.
Disclaimer: The above post should not be taken to speak for Eliezer Yudkowsky, SIAI, or anyone other than me. I am not in any way a member of SIAI or any other similar organization. There is a good chance that I am talking out of my arse.
What's the easier side?
Figuring out what the goal should be (note, I said easier, not easy). You probably know more than I do, but the way I see it the whole thing breaks down into a philosophy problem and a maths problem. Most people find philosophy more fun than maths, so they spend all their time debating the former.
I'm not clear on the action-determined vs. decision-determined distinction. Can you give an example of a dilemma that might tempt an AI to self-modify if we didn't build it around TDT?
In general, I'm nervous around arguments that mention self-modification. If self-modification is a risk, then engineering in general is a risk, and self-modification is a special case of engineering. So IMO an argument about Friendliness that mentions self-modification immediately needs to be generalized to talk about engineering instead. Self-modification as a fundamental concept is therefore a useless distraction.
The classic is Parfit's hitch-hiker, where an agent capable of accurately predicting the AI's actions offers to give it something if and only if the AI will perform some specific action in future. A causal AI might be tempted to modify itself to desire that specific action, while a timeless AI will simply do the thing anyway without needing to self-modify.
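The hitch-hiker dilemma above can be sketched as a toy model. Everything here (function names, payoff numbers) is illustrative, not taken from the TDT paper: the driver rescues the agent only if a perfect prediction says the agent will pay once in town, so a causal reasoner, deciding after rescue, refuses and is therefore never rescued, while a timeless reasoner treats its policy as the very thing the prediction depends on.

```python
# Toy model of Parfit's hitch-hiker. Payoffs are illustrative.
RESCUE_VALUE = 1_000_000  # value of surviving the desert
PAYMENT = 100             # cost of paying the driver in town

def cdt_policy(already_rescued: bool) -> bool:
    """A causal reasoner decides *after* rescue: paying can no longer
    cause the rescue, so it refuses."""
    return False

def tdt_policy(already_rescued: bool) -> bool:
    """A timeless reasoner treats its policy as the thing the driver's
    prediction runs on, so it pays without needing to self-modify."""
    return True

def outcome(policy) -> int:
    # The driver simulates what the agent would do once rescued.
    would_pay = policy(already_rescued=True)
    if not would_pay:
        return 0                   # left in the desert
    return RESCUE_VALUE - PAYMENT  # rescued, then pays

print(outcome(cdt_policy))  # 0
print(outcome(tdt_policy))  # 999900
```

The point of the sketch: the causal agent's only way to get the good outcome is to rewrite itself into something that pays, whereas the timeless agent's unmodified policy already wins.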
As for your second problem, Yudkowsky himself explains much better than I could why self-modification is important in the 3rd question of this interview.
Roughly, the importance is that there are only two kinds of truly catastrophic mistakes an AI could make: mistakes which manage to wipe out the whole planet in one shot, and errors in modifying its own code. Everything else can be recovered from.
That works if the AI knows that the other agent will keep its promise, and the other agent knows what the AI will do in the future. In particular the AI has to know the other agent is going to successfully anticipate what the AI will do in the future, even though the AI doesn't know itself. And the AI has to be able to infer all this from actual sensory experience, not by divine revelation. Hmm, I suppose that's possible.
Hmm, it's really easy to specify a causal AI, along the lines of AIXI but you can skip the arguments about it being near-optimal. Is there a similar simple spec of a timeless AI?
When I think through what the causal AI would do, it would be in a situation where it didn't know whether the actions it chooses are in the real world or in the other agent's simulation of the AI when the other agent is predicting what the AI would do. If it reasons correctly about this uncertainty, the causal AI might do the right thing anyway. I'll have to think about this. Thanks for the pointer.
It could build and deploy an unfriendly AI completely different from itself.
That's the thing about mathematical proofs, you need to conclusively rule out every possibility. When dealing with something like a super-intelligence there will be unforeseen circumstances, and nothing short of full mathematical rigour will save you.
I don't know of one off-hand, but I think AIXI can easily be made Timeless. Just modify the bit which says roughly "calculate a probability distribution over all possible outcomes for each possible action" and replace it with "calculate a probability distribution over all possible outcomes for each possible decision".
This may be worth looking into further; I haven't looked very deeply into the literature around AIXI.
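The action-vs-decision substitution above can be illustrated with a toy Newcomb's problem (payoffs and names are my own, purely for illustration): conditioning on the physical action treats the boxes as already filled, while conditioning on the decision also moves the prediction that filled them.

```python
# Toy Newcomb's problem: the predictor fills the opaque box iff it
# predicts the agent will one-box. Payoffs are illustrative.
def payoff(one_boxes: bool, predicted_one_box: bool) -> int:
    opaque = 1_000_000 if predicted_one_box else 0
    transparent = 1_000
    return opaque if one_boxes else opaque + transparent

# Action-determined (causal) view: hold the prediction fixed, and
# two-boxing always gains an extra $1,000...
for predicted in (True, False):
    assert payoff(False, predicted) == payoff(True, predicted) + 1_000

# ...but in the decision-determined view the prediction tracks the
# policy itself, so only consistent (policy, prediction) pairs occur:
one_boxer_gets = payoff(True, predicted_one_box=True)
two_boxer_gets = payoff(False, predicted_one_box=False)
assert one_boxer_gets > two_boxer_gets
print(one_boxer_gets, two_boxer_gets)  # 1000000 1000
```

This is why the two proofs cover different classes: an agent that only conditions on actions sees the first comparison, while one that conditions on decisions sees the second.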
This looks like you might be stumbling towards Updateless Decision Theory, which is IMHO even stronger than TDT and may solve an even wider range of problems.
I could come up with an argument for this falling into either category.
I'm claiming that the concept of self-modification is useless since it's a special case of engineering. We have to get engineering right, and if we do that, we'll get self-modification right. I'm struggling to interpret your statement so it bears on my claim. Perhaps you agree with me? Perhaps you're ignoring my claim? You don't seem to be arguing against it.
The scenario I proposed (creating a new UFAI from scratch) doesn't fit well into the second category (self-modification) because I didn't say the original AI goes away. After the misbegotten creation of the UFAI, you have two, the original failed FAI and the new UFAI.
Actually, the second category (bad self-modification) seems to fit well into the first category (destroying the planet in one go), so these two categories don't support the idea that self-modification is a useful concept.
Okay, I think I see what you mean about engineering and self-modification, but I don't think it's particularly important. It appears you're thinking in terms of two concepts:
Self-modification: Anything the AI does to itself, for a fairly strict definition of 'itself', as in 'the same physical object' or something like that.
Engineering: Building any kind of machine.
However, I think that when most FAI researchers talk about 'self-modification' they mean something broader than your definition, which would include building another AI of roughly equal or greater power but would not include building a toaster.
Any mathematical conclusions drawn about self-modification should apply just as well to any possible method of doing so, and one such method is to construct another AI. Therefore constructing a UFAI falls into the category of 'self modification error' in the sense that it is the sort of thing TDT is designed to help prevent.
Sorry, I don't believe you. I've been paying attention to FAI people for some time and never heard "self-modification" used to include situations where the machine performing the "self-modification" does not modify itself. If someone actually took the initiative to define "self-modification" the way you say, I'd perceive them as being deliberately deceptive.
One individual used timeless reasoning to lose 100 pounds.
It does give different answers for problems like the Prisoner's Dilemma when your opponent is similar enough to you that they will make the same decisions. As you mentioned, it makes an appearance in HP:MoR for similar reasons. There's no obvious application to Death Note, but I think it could certainly be incorporated somehow. If you've seen the film Memento, you might have some idea of what I mean. (I don't want to spoil Death Note because it really is an excellent anime series, so I'm not going to say exactly what I was thinking.) TDT is certainly not essential to rationality but it is very interesting, so it might be worth including in a Death Note re-write for that reason alone.