Death Note, Anonymity, and Information Theory

gwern

I don't know if this is a little too far afield for even a Discussion post, but people seemed to enjoy my previous articles (Girl Scouts financial filings, video game console insurance, philosophy of identity/abortion, & prediction market fees), so...

I recently wrote up an idea that has been bouncing around my head ever since I watched Death Note years ago - can we quantify Light Yagami's mistakes? Which mistake was the greatest? How could one do better? We can shed some light on the matter by examining DN with... basic information theory.
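To give a flavor of what "quantifying a mistake" could mean here, this is a minimal sketch of the usual bits-of-anonymity arithmetic: identifying one person out of N takes log2(N) bits, and a clue that narrows the pool from N to M leaks log2(N/M) bits. The function names and the pool sizes are made up for illustration; they are not figures from the essay.

```python
import math


def bits_to_identify(population: int) -> float:
    """Bits needed to single out one individual from `population` candidates."""
    if population < 1:
        raise ValueError("population must be at least 1")
    return math.log2(population)


def bits_leaked(before: int, after: int) -> float:
    """Bits of anonymity lost when a clue shrinks the candidate pool."""
    if not 0 < after <= before:
        raise ValueError("need 0 < after <= before")
    return math.log2(before / after)


# Hypothetical pool sizes, chosen as powers of two so the arithmetic is easy to check.
total_needed = bits_to_identify(1024)  # anonymity available at the start
leak = bits_leaked(1024, 256)          # one clue rules out three quarters of the pool
remaining = total_needed - leak        # anonymity left after that clue
```

On this accounting, the "greatest mistake" would simply be whichever clue leaks the most bits.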

Presented for LessWrong's consideration: Death Note & Anonymity.

I should probably have specified that building another agent doesn't really count as self modification if the other agent is identical to the original (or maybe it does count as self modification, but in a very vacuous sense, the same way 'do nothing' is technically an algorithm). So if the other agent is CDT this is not a counter-example.

If the other agent is a more primitive approximation to a CDT then I would view constructing it not as self-modification, but simply as making a choice in an action-determined problem.

If the other agent is TDT or UDT or something then this may count as self-modification, but there is no need to make it this way.

Suppose we use the rigorous definition where an action-determined problem is just a list of choices, each of which leads to a probability distribution across possible outcomes, each of which has a utility assigned to it. In this case I think it is clear that "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to self modify" is true while "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to do anything" is false.
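To make that definition concrete, here is a rough sketch of my own (an illustration, not a proof): an action-determined problem is a mapping from each choice to a list of (probability, utility) pairs, and the agent picks the choice with the highest expected utility. The names `Lottery`, `expected_utility`, `best_action` and the toy problem are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one (probability, utility) pair per possible outcome.
Lottery = List[Tuple[float, float]]


def expected_utility(lottery: Lottery) -> float:
    """Probability-weighted sum of utilities for a single choice."""
    total_p = sum(p for p, _ in lottery)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in lottery)


def best_action(problem: Dict[str, Lottery]) -> str:
    """Return the choice with the highest expected utility."""
    return max(problem, key=lambda action: expected_utility(problem[action]))


# Toy problem with made-up numbers: outcomes depend only on the action taken,
# never on the algorithm that produced the action.
problem = {
    "do_nothing": [(1.0, 0.0)],
    "safe_bet": [(1.0, 1.0)],
    "gamble": [(0.5, 3.0), (0.5, -2.0)],
}

choice = best_action(problem)
```

In this toy problem the agent does not pick "do_nothing", which is the sense in which the second quoted claim is false. The first claim rests on the fact that nothing in the problem refers to how the choice was computed, so swapping in a different decision procedure cannot beat simply taking the best-scoring action.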

Suppose we use the rigorous definition where an action-determined problem is just a list of choices, each of which leads to a probability distribution across possible outcomes, each of which has a utility assigned to it. In this case I think it is clear that "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to self modify" is true

That's plausible, but my counterexample still holds, apparently. I'm sure the desired theorem is true under the right hypotheses, but I can't quite guess what they are...
