itaibn0 comments on The Problem with AIXI - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
So it discovers that destroying a particular building in NY made NY look plain black and made its effectors in NY not do anything. It infers from available evidence that NY still exists and is behaving as normal in other regards. It discovers similar buildings in other cities that have the same effect. At this point it can infer that destroying the magic building in a given city will make that city look black and its effectors in that city not move.
But how does it care? How does it make the leap from "I will receive blank sensory input from this location" to "my goals are less likely to be fulfilled"? It might observe that its goals seem easier to achieve in cities where the magic building is still present, but it can't accurately model agents as complex as itself, and it's got no way to treat itself differently from another "ally" that seems to be helping the same cause. Which... I can't prove is irrational, but certainly seems a bit odd.
I think you just answered your own question. Indeed, if the agent found that destroying its instances does not lead to less of its goals being achieved, then even a "naturalized" reasoner should not particularly care about destroying itself entirely.
Now, you say the agent would treat instances of itself the same way it would treat an ally. There's a difference: an ally is someone who behaves in ways that benefit it, while an instance is something whose actions correlate with its output signal. The fact that it has fine-grained control over instances of itself should lead it to treat itself differently from allies. But if the agent has an ally that completely reliably transmits true information to it and performs its requests, then yes, the agent should treat that ally the same way it treats parts of itself.
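To make the ally/instance distinction concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption rather than part of AIXI: the name `agreement`, the bit sequences, and the idea of measuring "correlation with the output signal" as a simple match rate are all hypothetical simplifications.

```python
from typing import List


def agreement(outputs: List[int], actions: List[int]) -> float:
    """Fraction of time steps where an effector's action matches the agent's output signal."""
    return sum(o == a for o, a in zip(outputs, actions)) / len(outputs)


# Hypothetical output signal emitted by the agent over eight time steps.
outputs = [1, 0, 1, 1, 0, 0, 1, 0]

# An instance: its actions simply are the output signal.
instance_actions = list(outputs)

# An ordinary ally: helpful, but it follows its own fixed policy (always 1 here).
ally_actions = [1] * len(outputs)

# A perfectly obedient ally: it performs every request it is given.
obedient_ally_actions = [o for o in outputs]

instance_score = agreement(outputs, instance_actions)
ally_score = agreement(outputs, ally_actions)
obedient_score = agreement(outputs, obedient_ally_actions)

print(instance_score, ally_score, obedient_score)
```

In this toy setup the instance and the perfectly obedient ally get the same score, so from the agent's point of view there is nothing to tell them apart, while the ordinary ally only matches the output signal part of the time.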
You can't win, Vader. If you strike me down, I shall become more powerful than you can possibly imagine.