TimFreeman comments on Death Note, Anonymity, and Information Theory - Less Wrong

32 Post author: gwern 08 May 2011 03:44PM


Comment author: TimFreeman 09 May 2011 04:50:29PM 1 point [-]

The classic is Parfit's hitch-hiker, where an agent capable of accurately predicting the AI's actions offers to give it something if and only if the AI will perform some specific action in the future. A causal AI might be tempted to modify itself to desire that specific action, while a timeless AI will simply do the thing anyway without needing to self-modify.

That works if the AI knows that the other agent will keep its promise, and the other agent knows what the AI will do in the future. In particular the AI has to know the other agent is going to successfully anticipate what the AI will do in the future, even though the AI doesn't know itself. And the AI has to be able to infer all this from actual sensory experience, not by divine revelation. Hmm, I suppose that's possible.

Hmm, it's really easy to specify a causal AI, along the lines of AIXI but you can skip the arguments about it being near-optimal. Is there a similar simple spec of a timeless AI?

When I think through what the causal AI would do, it would be in a situation where it didn't know whether the actions it chooses are happening in the real world or inside the other agent's simulation of the AI, run when the other agent predicts what the AI would do. If it reasons correctly about this uncertainty, the causal AI might do the right thing anyway. I'll have to think about this. Thanks for the pointer.
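That line of reasoning can be made concrete with a toy calculation. Everything here is an illustrative assumption rather than part of the original discussion: the payoff numbers, the 50/50 prior over being the real agent versus the predictor's simulation, and the perfect-predictor setup.

```python
# Parfit's hitchhiker with illustrative payoffs (all values hypothetical):
# being rescued is worth 100, paying afterwards costs 10.
# A causal agent deciding whether to pay doesn't know if this decision
# is happening in reality or inside the predictor's simulation of it.

P_SIM = 0.5          # assumed prior: this decision is the simulated one
RESCUE = 100.0       # utility of being rescued from the desert
PAY_COST = -10.0     # utility cost of handing over the money

def expected_utility(pay: bool) -> float:
    """Expected utility of the decision, averaging over the two cases:
    the decision is real, or it is the simulation the predictor runs
    before deciding whether to rescue."""
    if pay:
        # Real case: already rescued, now pays.  Sim case: the predictor
        # sees "pay", so the real agent gets rescued and pays too.
        real = RESCUE + PAY_COST
        sim = RESCUE + PAY_COST
    else:
        # Real case: already rescued, refuses to pay.  Sim case: the
        # predictor sees "refuse" and leaves the agent in the desert.
        real = RESCUE
        sim = 0.0
    return (1 - P_SIM) * real + P_SIM * sim

print(expected_utility(True))   # 90.0
print(expected_utility(False))  # 50.0 -- paying wins under this uncertainty
```

Under this (assumed) uncertainty, even purely causal expected-utility maximization favours paying, which is the "might do the right thing anyway" intuition above.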

Roughly, the importance is that there are only two kinds of truly catastrophic mistakes an AI could make: mistakes which manage to wipe out the whole planet in one shot, and errors in modifying its own code. Everything else can be recovered from.

It could build and deploy an unfriendly AI completely different from itself.

Comment author: benelliott 09 May 2011 05:14:03PM *  2 points [-]

That works if the AI knows that the other agent will keep its promise, and the other agent knows what the AI will do in the future. In particular the AI has to know the other agent is going to successfully anticipate what the AI will do in the future, even though the AI doesn't know itself. And the AI has to be able to infer all this from actual sensory experience, not by divine revelation. Hmm, I suppose that's possible.

That's the thing about mathematical proofs: you need to conclusively rule out every possibility. When dealing with something like a super-intelligence there will be unforeseen circumstances, and nothing short of full mathematical rigour will save you.

Hmm, it's really easy to specify a causal AI, along the lines of AIXI but you can skip the arguments about it being near-optimal. Is there a similar simple spec of a timeless AI?

I don't know of one off-hand, but I think AIXI can easily be made timeless. Just modify the bit which says roughly "calculate a probability distribution over all possible outcomes for each possible action" and replace it with "calculate a probability distribution over all possible outcomes for each possible decision".

This may be worth looking into further; I haven't looked very deeply into the literature around AIXI.
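As a rough illustration of that one-line change, here is a toy Newcomb-style problem. This is nothing like a real AIXI implementation: the payoffs, the perfect-predictor assumption, and both evaluator functions are made up for the sketch.

```python
# Toy Newcomb's problem illustrating "outcomes per action" versus
# "outcomes per decision" (all payoffs and the predictor model are
# illustrative assumptions, not a real AIXI spec).

BOTH, ONE = "two-box", "one-box"

def payoff(choice: str, box_filled: bool) -> int:
    """Opaque box holds $1,000,000 if filled; transparent box holds $1,000."""
    opaque = 1_000_000 if box_filled else 0
    transparent = 1_000 if choice == BOTH else 0
    return opaque + transparent

def causal_choice(box_filled: bool) -> str:
    # "Outcomes for each possible action": the box state is already fixed
    # before acting, so whatever it is, two-boxing gains the extra $1,000.
    return max((BOTH, ONE), key=lambda c: payoff(c, box_filled))

def timeless_choice() -> str:
    # "Outcomes for each possible decision": a perfect predictor fills the
    # box iff the decision procedure outputs one-boxing, so the box state
    # covaries with the decision itself.
    return max((BOTH, ONE), key=lambda c: payoff(c, box_filled=(c == ONE)))

print(causal_choice(box_filled=True))   # two-box (either way)
print(timeless_choice())                # one-box
```

The only difference between the two evaluators is what the outcome distribution is conditioned on, which is exactly the substitution proposed above.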

When I think through what the causal AI would do, it would be in a situation where it didn't know whether the actions it chooses are in the real world or in the other agent's simulation of the AI when the other agent is predicting what the AI would do. If it reasons correctly about this uncertainty, the causal AI might do the right thing anyway. I'll have to think about this. Thanks for the pointer.

This looks like you might be stumbling towards Updateless Decision Theory, which is IMHO even stronger than TDT and may solve an even wider range of problems.

It could build and deploy an unfriendly AI completely different from itself.

I could come up with an argument for this falling into either category.

Comment author: TimFreeman 10 May 2011 03:44:00PM *  0 points [-]

Roughly, the importance is that there are only two kinds of truly catastrophic mistakes an AI could make: mistakes which manage to wipe out the whole planet in one shot, and errors in modifying its own code. Everything else can be recovered from.

It could build and deploy an unfriendly AI completely different from itself.

I could come up with an argument for this falling into either category.

I'm claiming that the concept of self-modification is useless since it's a special case of engineering. We have to get engineering right, and if we do that, we'll get self-modification right. I'm struggling to see how your statement bears on my claim. Perhaps you agree with me? Perhaps you're ignoring my claim? You don't seem to be arguing against it.

The scenario I proposed (creating a new UFAI from scratch) doesn't fit well into the second category (self-modification) because I didn't say the original AI goes away. After the misbegotten creation of the UFAI, you have two AIs: the original failed FAI and the new UFAI.

Actually, the second category (bad self-modification) seems to fit well into the first category (destroying the planet in one go), so these two categories don't support the idea that self-modification is a useful concept.

Comment author: benelliott 10 May 2011 03:52:00PM 1 point [-]

Okay, I think I see what you mean about engineering and self-modification, but I don't think it's particularly important. It appears you're thinking in terms of two concepts:

Self-modification: Anything the AI does to itself, for a fairly strict definition of 'itself', as in 'the same physical object' or something like that.

Engineering: Building any kind of machine.

However, I think that when most FAI researchers talk about 'self-modification' they mean something broader than your definition, which would include building another AI of roughly equal or greater power but would not include building a toaster.

Any mathematical conclusions drawn about self-modification should apply just as well to any possible method of doing so, and one such method is to construct another AI. Therefore constructing a UFAI falls into the category of 'self modification error' in the sense that it is the sort of thing TDT is designed to help prevent.

Comment author: TimFreeman 12 May 2011 06:21:36PM -1 points [-]

I think that when most FAI researchers talk about 'self-modification' they mean something broader than your definition, which would include building another AI of roughly equal or greater power but would not include building a toaster.

Sorry, I don't believe you. I've been paying attention to FAI people for some time and never heard "self-modification" used to include situations where the machine performing the "self-modification" does not modify itself. If someone actually took the initiative to define "self-modification" the way you say, I'd perceive them as being deliberately deceptive.

Comment author: benelliott 12 May 2011 06:55:59PM 1 point [-]

You're being overly literal.

I have seen SIAI-affiliated people on Less Wrong arguing that self-modification is impossible to prevent, by pointing out that even an injunction against rewriting its own source code would not stop an AI from building something else.

Self-modification as you describe it is a useless mathematical concept for Friendliness, as is engineering. Worse, it is not even well-defined: if an AI copies itself onto another computer and alters the copy, is that self-modification? If it modifies itself, but keeps a copy of its old code around, is that self-modification? Where do you draw the line between the two?

You are violating the principle of charity by assuming the interpretation that makes them look worse.

Mostly, when SIAI people talk about self-modification they imagine a machine that goes in and edits its own source code, because that is presumably the most efficient way to self-modify and the one that most AIs would use. This does not mean that 'build another AI' is not included; it just seems like a very stupid and inefficient way to go about things, so you are wasting your time by worrying too much about it.

I'll bet you £100 that whatever conclusions the SIAI eventually draws about self-modification will apply just as well to all kinds; I really cannot see how a silly distinction like the one you are making would find its way into a mathematical proof.

Comment author: TimFreeman 12 May 2011 08:47:08PM 0 points [-]

I'll bet you £100 that whatever conclusions the SIAI eventually draws about self-modification will apply just as well to all kinds; I really cannot see how a silly distinction like the one you are making would find its way into a mathematical proof.

We're certainly agreed on that. I'm willing to go further -- I believe any mathematical conclusions that apply to self-modification (your definition) will apply to all possible actions. I don't think your definition carves out a part of the world that has any usefully special properties.

Worse, [self-modification interpreted as requiring a modification to the entity taking action] is not even well-defined: if an AI copies itself onto another computer and alters the copy, is that self-modification? If it modifies itself, but keeps a copy of its old code around, is that self-modification? Where do you draw the line between the two?

Agreed.

I don't think your definition is well-defined either. Where's the important line between self-modification and making a toaster?

We appear to have no useful definition for the word. Time to stop using it, IMO.

Comment author: benelliott 12 May 2011 10:01:56PM 1 point [-]

We're certainly agreed on that. I'm willing to go further -- I believe any mathematical conclusions that apply to self-modification (your definition) will apply to all possible actions. I don't think your definition carves out a part of the world that has any usefully special properties.

I disagree. "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to self modify" is true, while "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to do anything" is false.

I don't think your definition is well-defined either. Where's the important line between self-modification and making a toaster?

I'm not a hundred percent clear on this, and I'll be the first to admit that this is a problem that needs to be fixed before the larger problem can be solved. From a very brief period of thought, it seems to me a good line to draw is the point at which the new agent becomes more powerful, in the sense of optimization power, than the old one.

We appear to have no useful definition for the word. Time to stop using it, IMO.

I think the word points to something, and I have a feeling that something is the heart of the problem. Interestingly, in terms of mathematical decision theory self-modification seems quite well defined.

Comment author: TimFreeman 12 May 2011 10:37:15PM 2 points [-]

After some heat, we're starting to get light. This is good.

"An ideal CDT agent that anticipates facing only action-determined problems will always choose not to self modify" is true "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to do anything" is false.

I'm not sure that's true. Imagine I'm an ideal CDT agent. I am in North America. If I wish to react to something that happens in China, there will be some lag. If I could deal with the situation better when there is no lag, I would benefit from cloning myself and sending a copy to China. Would that be self-modification?

(This presupposes that I have access to materials sufficient to copy myself. That might not be true, depending on whether an ideal CDT agent is physically realizable.)

Comment author: benelliott 12 May 2011 11:12:13PM *  1 point [-]

I should probably have specified that building another agent doesn't really count as self-modification if the other agent is identical to the original (or maybe it does count, but in a very vacuous sense, the same way 'do nothing' is technically an algorithm). So if the other agent is CDT, this is not a counter-example.

If the other agent is a more primitive approximation to a CDT then I would view constructing it not as self-modification, but simply as making a choice in an action-determined problem.

If the other agent is TDT or UDT or something then this may count as self-modification, but there is no need to make it this way.

Suppose we use the rigorous definition where an action-determined problem is just a list of choices, each of which leads to a probability distribution across possible outcomes, each of which has a utility assigned to it. In this case I think it is clear that "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to self modify" is true while "An ideal CDT agent that anticipates facing only action-determined problems will always choose not to do anything" is false.
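That definition translates almost directly into code. A minimal sketch, with made-up choices, probabilities, and utilities:

```python
# A direct encoding of the definition above: an action-determined problem
# maps each choice to a probability distribution over outcomes, and each
# outcome has a utility.  All example numbers are hypothetical.

from typing import Dict

# choice -> {outcome: probability}
Problem = Dict[str, Dict[str, float]]

problem: Problem = {
    "self-modify": {"works": 0.5, "breaks": 0.5},
    "do-nothing":  {"status-quo": 1.0},
}
utility = {"works": 10.0, "breaks": -100.0, "status-quo": 0.0}

def ideal_choice(problem: Problem, utility: Dict[str, float]) -> str:
    """Pick the choice with maximal expected utility.  Since the problem
    is fully described by this table, self-modification can't help: any
    modified agent's behaviour is just another row of choices."""
    def eu(choice: str) -> float:
        return sum(p * utility[o] for o, p in problem[choice].items())
    return max(problem, key=eu)

print(ideal_choice(problem, utility))  # do-nothing
```

With these (invented) numbers the expected utility of self-modifying is -45 versus 0 for doing nothing, so the ideal agent declines to self-modify, matching the first quoted claim, while it would happily act whenever some row had positive expected utility, contradicting the second.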