
Death Note, Anonymity, and Information Theory

Post author: gwern 08 May 2011 03:44PM 28 points

I don't know if this is a little too far afield for even a Discussion post, but people seemed to enjoy my previous articles (Girl Scouts financial filings, video game console insurance, philosophy of identity/abortion, & prediction market fees), so...

I recently wrote up an idea that has been bouncing around my head ever since I watched Death Note years ago - can we quantify Light Yagami's mistakes? Which mistake was the greatest? How could one do better? We can shed some light on the matter by examining DN with... basic information theory.

Presented for LessWrong's consideration: Death Note & Anonymity.
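For readers who'd like the flavour of the calculation before clicking through, here is a minimal sketch of the bookkeeping the essay does: anonymity measured as log2 of the candidate pool, with each clue "leaking" the log of the factor by which it shrinks that pool. The population figures below are rough placeholders of mine, not the essay's exact numbers.

```python
import math

def anonymity_bits(pool_size):
    """Anonymity of one person hiding in a pool of `pool_size` candidates, in bits."""
    return math.log2(pool_size)

def bits_leaked(before, after):
    """Bits of anonymity lost when a clue shrinks the candidate pool from `before` to `after`."""
    return math.log2(before / after)

# Rough illustrative figures, not the essay's exact numbers:
world = 7_000_000_000   # everyone on Earth
japan = 128_000_000     # the killings track Japanese media reports
kanto = 43_000_000      # the killing times point to the Kanto region

print(f"starting anonymity:         {anonymity_bits(world):.1f} bits")      # ~32.7
print(f"leaked by 'lives in Japan': {bits_leaked(world, japan):.1f} bits")  # ~5.8
print(f"leaked by 'lives in Kanto': {bits_leaked(japan, kanto):.1f} bits")  # ~1.6
print(f"anonymity remaining:        {anonymity_bits(kanto):.1f} bits")      # ~25.4
```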

Comments (47)

Comment author: [deleted] 09 May 2011 01:16:05AM 13 points [-]

Nice analysis! I really like the way you quantified it.

This is off-topic, but I think Death Note is practically begging to be re-written as rationalist fanfiction: Light's manipulation skills could be used to discuss psychology and cognitive biases (much like Draco Malfoy in HP:MOR). L would of course be a Bayesian rationalist, and Soichiro and Izowa could be Traditional Rationalist foils who would allow L to explain the ins and outs of high-level rationality. (As you've shown here, information theory could play a larger role in L's investigation.) The cat-and-mouse games between L and Light could be turned into decision theory problems; the rules related to ownership of the notebook could be used to explore timeless reasoning (much like the film Memento). The story is already brimming with ethical questions, and both L and Light's internal monologues could be used to discuss consequentialism and utilitarianism. I'm not sure what could be done with the rest of the characters or how the supernatural aspect would be handled, but it would probably be an interesting read.

Comment author: TimFreeman 09 May 2011 03:06:42AM 3 points [-]

The cat-and-mouse games between L and Light could be turned into decision theory problems; the rules related to ownership of the notebook could be used to explore timeless reasoning (much like the film Memento).

Eliezer's Timeless Decision Theory is interesting, but I don't yet understand what real problem it is solving. He worked it into his Harry Potter fanfic when Harry was dealing with Azkaban (sp?), and given that he wrote the paper I can see why he made use of it, but here's someone else saying it's useful.

Does timeless reasoning have any application in situations where your opponents can't read your mind?

(I haven't watched Death Note, so my apologies if the answer to this question is obvious to people who have.)

Comment author: Vladimir_Nesov 09 May 2011 04:46:36PM *  8 points [-]

Does timeless reasoning have any application in situations where your opponents can't read your mind?

We all read each other's minds to some extent, and to the extent this happens, TDT will give better advice than CDT. See section 7 of the TDT paper:

"Modeling agents as influenced to some greater or lesser degree by "the sort of decision you make, being the person that you are", realistically describes present-day human existence."

Comment author: benelliott 09 May 2011 08:53:45AM *  2 points [-]

One reason is that it seems like it might be helpful with Friendliness proofs, particularly the part where you have to prove the AI's goal will remain stable over millions of self-modifications (the harder, and all too frequently ignored, side of the problem). Basically, it takes dilemmas which might otherwise tempt an AI to self-modify and shows that it need not.

I think with CDT you can prove an AI won't need to modify its goal system on action-determined problems, while with TDT you can prove the same for the broader class of decision-determined problems. This leaves many issues, but it's a step in the right direction.

Disclaimer: The above post should not be taken to speak for Eliezer Yudkowsky, SIAI, or anyone other than me. I am not in any way a member of SIAI or any other similar organization. There is a good chance that I am talking out of my arse.

Comment author: Vladimir_Nesov 09 May 2011 04:34:25PM *  0 points [-]

to prove the AI's goal will remain stable over millions of self-modifications (the harder, and all too frequently ignored, side of the problem)

What's the easier side?

Comment author: benelliott 09 May 2011 04:45:03PM *  1 point [-]

Figuring out what the goal should be (note, I said easier, not easy). You probably know more than I do, but the way I see it the whole thing breaks down into a philosophy problem and a maths problem. Most people find philosophy more fun than maths, so they spend all their time debating the former.

Comment author: TimFreeman 09 May 2011 01:38:13PM 0 points [-]

I'm not clear on the action-determined vs. decision-determined distinction. Can you give an example of a dilemma that might tempt an AI to self-modify if we didn't build it around TDT?

In general, I'm nervous around arguments that mention self-modification. If self-modification is a risk, then engineering in general is a risk, and self-modification is a special case of engineering. So IMO an argument about Friendliness that mentions self-modification immediately needs to be generalized to talk about engineering instead. Self-modification as a fundamental concept is therefore a useless distraction.

Comment author: benelliott 09 May 2011 04:10:16PM *  2 points [-]

The classic is Parfit's hitch-hiker, where an agent capable of accurately predicting the AI's actions offers to give it something if and only if the AI will perform some specific action in future. A causal AI might be tempted to modify itself to desire that specific action, while a timeless AI will simply do the thing anyway without needing to self-modify.
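A toy sketch of that dilemma (the payoffs and function names below are invented purely for illustration) shows why the timeless policy comes out ahead without any self-modification:

```python
# Toy Parfit's hitch-hiker: a predictor offers the AI a ride out of the
# desert iff it predicts the AI will later pay $100 once in town.
# Payoffs are made-up illustrative numbers.
RIDE_VALUE = 1_000_000   # value of not being left to die in the desert
PAYMENT = 100

def cdt_pays_in_town():
    # Once safely in town, paying has no further causal benefit,
    # so a purely causal agent refuses.
    return False

def tdt_pays_in_town():
    # A timeless agent pays, because the predictor's offer was
    # conditioned on (its model of) this very policy.
    return True

def play(pays_in_town):
    predicted_to_pay = pays_in_town()   # an accurate predictor just runs the policy
    if not predicted_to_pay:
        return 0                        # no ride: left in the desert
    return RIDE_VALUE - (PAYMENT if pays_in_town() else 0)

print("causal agent:  ", play(cdt_pays_in_town))    # 0
print("timeless agent:", play(tdt_pays_in_town))    # 999900
```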

As for your second problem, Yudkowsky himself explains much better than I could why self-modification is important in the 3rd question of this interview.

Roughly, the importance is that there are only two kinds of truly catastrophic mistakes an AI could make: mistakes which manage to wipe out the whole planet in one shot, and errors in modifying its own code. Everything else can be recovered from.

Comment author: TimFreeman 09 May 2011 04:50:29PM 1 point [-]

The classic is Parfit's hitch-hiker, where an agent capable of accurately predicting the AI's actions offers to give it something if and only if the AI will perform some specific action in future. A causal AI might be tempted to modify itself to desire that specific action, while a timeless AI will simply do the thing anyway without needing to self-modify.

That works if the AI knows that the other agent will keep its promise, and the other agent knows what the AI will do in the future. In particular the AI has to know the other agent is going to successfully anticipate what the AI will do in the future, even though the AI doesn't know itself. And the AI has to be able to infer all this from actual sensory experience, not by divine revelation. Hmm, I suppose that's possible.

Hmm, it's really easy to specify a causal AI, along the lines of AIXI but you can skip the arguments about it being near-optimal. Is there a similar simple spec of a timeless AI?

When I think through what the causal AI would do, it would be in a situation where it didn't know whether the actions it chooses are in the real world or in the other agent's simulation of the AI when the other agent is predicting what the AI would do. If it reasons correctly about this uncertainty, the causal AI might do the right thing anyway. I'll have to think about this. Thanks for the pointer.

Roughly, the importance is that there are only two kinds of truly catastrophic mistakes an AI could make: mistakes which manage to wipe out the whole planet in one shot, and errors in modifying its own code. Everything else can be recovered from.

It could build and deploy an unfriendly AI completely different from itself.

Comment author: benelliott 09 May 2011 05:14:03PM *  2 points [-]

That works if the AI knows that the other agent will keep its promise, and the other agent knows what the AI will do in the future. In particular the AI has to know the other agent is going to successfully anticipate what the AI will do in the future, even though the AI doesn't know itself. And the AI has to be able to infer all this from actual sensory experience, not by divine revelation. Hmm, I suppose that's possible.

That's the thing about mathematical proofs: you need to conclusively rule out every possibility. When dealing with something like a super-intelligence there will be unforeseen circumstances, and nothing short of full mathematical rigour will save you.

Hmm, it's really easy to specify a causal AI, along the lines of AIXI but you can skip the arguments about it being near-optimal. Is there a similar simple spec of a timeless AI?

I don't know of one off-hand, but I think AIXI can easily be made Timeless. Just modify the bit which says roughly "calculate a probability distribution over all possible outcomes for each possible action" and replace it with "calculate a probability distribution over all possible outcomes for each possible decision".

This may be worth looking into further; I haven't looked very deeply into the literature around AIXI.

When I think through what the causal AI would do, it would be in a situation where it didn't know whether the actions it chooses are in the real world or in the other agent's simulation of the AI when the other agent is predicting what the AI would do. If it reasons correctly about this uncertainty, the causal AI might do the right thing anyway. I'll have to think about this. Thanks for the pointer.

This looks like you might be stumbling towards Updateless Decision Theory, which is IMHO even stronger than TDT and may solve an even wider range of problems.

It could build and deploy an unfriendly AI completely different from itself.

I could come up with an argument for this falling into either category.

Comment author: TimFreeman 10 May 2011 03:44:00PM *  0 points [-]

Roughly, the importance is that there are only two kinds of truly catastrophic mistakes an AI could make: mistakes which manage to wipe out the whole planet in one shot, and errors in modifying its own code. Everything else can be recovered from.

It could build and deploy an unfriendly AI completely different from itself.

I could come up with an argument for this falling into either category.

I'm claiming that the concept of self-modification is useless since it's a special case of engineering. We have to get engineering right, and if we do that, we'll get self-modification right. I'm struggling to interpret your statement so it bears on my claim. Perhaps you agree with me? Perhaps you're ignoring my claim? You don't seem to be arguing against it.

The scenario I proposed (creating a new UFAI from scratch) doesn't fit well into the second category (self-modification) because I didn't say the original AI goes away. After the misbegotten creation of the UFAI, you have two, the original failed FAI and the new UFAI.

Actually, the second category (bad self-modification) seems to fit well into the first category (destroying the planet in one go), so these two categories don't support the idea that self-modification is a useful concept.

Comment author: benelliott 10 May 2011 03:52:00PM 1 point [-]

Okay, I think I see what you mean about engineering and self-modification, but I don't think it's particularly important. It appears you're thinking in terms of two concepts:

Self-modification: Anything the AI does to itself, for a fairly strict definition of 'itself', as in 'the same physical object' or something like that.

Engineering: Building any kind of machine.

However, I think that when most FAI researchers talk about 'self-modification' they mean something broader than your definition, which would include building another AI of roughly equal or greater power but would not include building a toaster.

Any mathematical conclusions drawn about self-modification should apply just as well to any possible method of doing so, and one such method is to construct another AI. Therefore constructing a UFAI falls into the category of 'self modification error' in the sense that it is the sort of thing TDT is designed to help prevent.

Comment author: TimFreeman 12 May 2011 06:21:36PM -1 points [-]

I think that when most FAI researchers talk about 'self-modification' they mean something broader than your definition, which would include building another AI of roughly equal or greater power but would not include building a toaster.

Sorry, I don't believe you. I've been paying attention to FAI people for some time and never heard "self-modification" used to include situations where the machine performing the "self-modification" does not modify itself. If someone actually took the initiative to define "self-modification" the way you say, I'd perceive them as being deliberately deceptive.

Comment author: Cyan 09 May 2011 04:36:40AM 3 points [-]

Does timeless reasoning have any application in situations where your opponents can't read your mind?

One individual used timeless reasoning to lose 100 pounds.

Comment author: [deleted] 09 May 2011 03:19:35AM 1 point [-]

Does timeless reasoning have any application in situations where your opponents can't read your mind?

It does give different answers for problems like the Prisoner's Dilemma when your opponent is similar enough to you that they will make the same decisions. As you mentioned, it makes an appearance in HP:MoR for similar reasons. There's no obvious application to Death Note, but I think it could certainly be incorporated somehow. If you've seen the film Memento, you might have some idea of what I mean. (I don't want to spoil Death Note because it really is an excellent anime series, so I'm not going to say exactly what I was thinking.) TDT is certainly not essential to rationality but it is very interesting, so it might be worth including in a Death Note re-write for that reason alone.

Comment author: Daniel_Burfoot 08 May 2011 05:40:17PM 9 points [-]

Arguably, the biggest mistake Light made was one of abstract strategy: he started using the Death Note almost immediately after obtaining it. He should have spent many years testing the thing, pondering its implications, studying police work, etc, before putting his plan into action.

Comment author: gwern 08 May 2011 05:50:40PM 16 points [-]

I can't help but think that that represents a serious privileging of the hypothesis - given a little black notebook claiming such absurd powers, you shouldn't carefully devise 20 different studies which try to falsify your various theories and inferences about its powers & limitations.

Unless you mean that after he verified that the Death Note did in fact kill supernaturally as claimed (after the biker and hostage-taker, I suppose), he should have gone into scientist mode?

In that case, my first thought is that from Light's perspective, delay is a massive waste (e.g. all those people murdered by criminals who should already be dead), and he thought he could handle any challenges that came his way. Which he was almost right about, after all.

Comment author: nazgulnarsil 12 May 2011 12:06:43AM 0 points [-]

Not as big a waste as getting caught. Given the power to change the world, one should carefully think about how that power could be taken away before starting on low-utility things like eliminating criminals.

Comment author: benelliott 08 May 2011 05:22:10PM *  8 points [-]

Big DN fan, my thoughts:

1) Only a mistake if you consider his goal to be "kill as many people as possible" rather than "reduce crime as much as possible", and for the latter the small loss of anonymity may well be a justified sacrifice for the deterrent effect he could achieve by exposing his own existence. Especially since, as you point out, he might well have been discovered anyway.

2) Yep, pretty big mistake there.

3) I think you slightly under-rate this one by not considering that L can't always eliminate people with certainty. Prior to this, it would have been possible that Kira was not Japanese but was timing his kills to make it look like he was, to lead the police awry. This test made that hypothesis a lot less likely.

4) Agreed, this is the big screw-up, also probably the one that most of the viewers could have been expected to spot.

5) Bear in mind he was actually quite careful to prevent Penbar from being singled out, although he could have done better by delaying all the killings for a week or so. Misora would have narrowed down his anonymity even more had she not been killed.

For his optimal strategy, might he not have been even better off by deliberately sending misleading information, by timing the killings to indicate he lived somewhere else for example? After all, applying your strategy might well narrow it down to 'people who know information theory' which probably costs quite a few bits.

Comment author: hairyfigment 08 May 2011 08:21:32PM 0 points [-]

I more or less agree with you on point 1. A rational person could have reasoned in that way. But I think we have to say that Light did not. He wanted people to recognize his work when it came to killing apparent criminals because he wanted admiration as a goal in itself. This led to the most obviously avoidable mistake, #3.

Comment author: benelliott 08 May 2011 09:20:36PM 6 points [-]

I disagree; even in the very first episode he specifically outlines that part of his plan is that when people notice criminals are dying, they will be less inclined to become criminals.

I wouldn't say #3 was that easily avoidable; I didn't see it coming myself, whereas with #4 it was all I could do to restrain myself from yelling 'idiot!' at the screen.

Comment author: hairyfigment 09 May 2011 02:57:55AM 0 points [-]

Yes, I believe that on the level of explicit reasoning he wants to kill criminals with heart attacks to deter crime (and use deaths of other kinds to secretly dispose of people who he thinks don't contribute). Then he gets agitated and kills Lind with a heart attack before verifying that he needs to kill Lind at all. This supports the theory that (like any other fascist dictator) he wants admiration and obedience more than a better world.

Comment author: [deleted] 22 December 2011 06:40:39PM 0 points [-]

I like Death Note, but I found "Liar Game" to be more realistic - at least I personally learned more psychology from it. What do you guys think?

Comment author: humpolec 12 May 2011 02:47:12PM *  0 points [-]

But... but... Light actually won, didn't he? At least in the short run - he managed to defeat L. I was always under the impression that some of these "mistakes" were committed by Light deliberately in order to lure L.

Comment author: gwern 22 May 2011 08:37:47PM 0 points [-]

You think Light won? Gosh, you need to read my other essay then, Death Note Ending and especially the final section, http://www.gwern.net/Death%20Note%20Ending#who-won

Comment author: Vaniver 09 May 2011 03:39:26PM *  0 points [-]

When you talk about the number of bits of anonymity he has once it's been narrowed down to Kanto, shouldn't that be the male population of Kanto?

Edit: The section about comparing mistakes also seems somewhat contradictory; first you talk about the number of people excluded (and so the first bit is, by definition, the most valuable) and then about the number of bits (and so the 11-bit mistake is more important than the 1.6-bit mistake). It may help to resolve the tension between the two approaches more explicitly.
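To make the tension concrete, a quick illustration with made-up numbers: counting people excluded rewards the earliest, cheapest clues, while counting bits rewards the clues that do the most proportional narrowing.

```python
import math

def people_excluded(before, after):
    return before - after

def bits_gained(before, after):
    return math.log2(before / after)

# Halving the whole world excludes billions of people but is worth only 1 bit...
print(people_excluded(7_000_000_000, 3_500_000_000))   # 3500000000
print(bits_gained(7_000_000_000, 3_500_000_000))        # 1.0

# ...while going from 128 remaining suspects to 1 excludes only 127 people,
# but is worth 7 bits -- the entire rest of the search.
print(people_excluded(128, 1))   # 127
print(bits_gained(128, 1))       # 7.0
```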

Comment author: gwern 22 May 2011 08:41:19PM *  1 point [-]

When you talk about the number of bits of anonymity he has once it's been narrowed down to Kanto, shouldn't that be the male population of Kanto?

Yes, you're right - I used the total population of Kanto, not the total male population. I should probably rejigger those numbers.

EDIT: OK, I think I fixed that specific error. Fortunately, the mistake had only contaminated a few numbers... I think. Please tell me if I've accidentally introduced additional inconsistencies!
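For a rough sense of how much that correction matters, assuming Kanto has about 43 million residents and roughly half are male (both figures are approximations, not the essay's exact numbers), the male-only restriction shifts the total by about one bit:

```python
import math

kanto_total = 43_000_000          # rough Kanto population (assumption)
kanto_males = kanto_total // 2    # crude 50/50 sex ratio (assumption)

print(f"bits over all of Kanto: {math.log2(kanto_total):.1f}")                     # ~25.4
print(f"bits over Kanto males:  {math.log2(kanto_males):.1f}")                     # ~24.4
print(f"difference:             {math.log2(kanto_total / kanto_males):.1f} bit")   # 1.0
```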

It may help to resolve the tension between the two approaches more explicitly.

I believe I did do this before your comment, in mistake 3 where I discuss what the logarithmic scale buys us.

Comment author: benelliott 09 May 2011 04:18:48PM 1 point [-]

In general, it should take L about the same amount of work, in a Bayesian sense, to gather one more bit of information regardless of how many he currently has. Thus, quantifying Light's mistakes in terms of bits conceded is probably the best way to do it.
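A toy illustration of that point, with made-up numbers: if every clue carries the same likelihood ratio, each one buys the same number of bits no matter how far along the search already is, and singling out one suspect from N takes about log2(N) such clues.

```python
import math

# Each clue with a 2:1 likelihood ratio in favour of the right suspect group
# contributes exactly one bit of evidence, whether it is the first clue or
# the twentieth -- so the work per bit stays roughly constant.
def bits_from_clue(likelihood_ratio):
    return math.log2(likelihood_ratio)

pool = 1_000_000
clues = 0
while pool > 1:
    pool = math.ceil(pool / 2)   # one more pool-halving clue
    clues += 1

print("clues needed to single out 1 of 1,000,000 suspects:", clues)   # 20
print("bits per 2:1 clue:", bits_from_clue(2.0))                      # 1.0
```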

Comment author: CronoDAS 08 May 2011 11:51:41PM *  0 points [-]

Have you seen the live-action movie version of Death Note? The pair of two-hour movies cover roughly the first season of the anime, but they have a different ending, one inspired by a popular fan theory...

Comment author: gwern 22 May 2011 08:42:16PM 1 point [-]

I watched one of the Death Note movies, but I really can't remember anything about them except L killing himself with a delayed Death Note sentence, or something like that, and how horrible the CGI Ryuk looked.

Comment author: nazgulnarsil 12 May 2011 12:07:21AM 0 points [-]

Are they worth watching, quality-wise?

Comment author: CronoDAS 12 May 2011 01:28:31AM 0 points [-]

I liked them; I've never read the manga or watched the anime, so I can't say which version is best.