paulfchristiano comments on Would AIXI protect itself? - Less Wrong

8 Post author: Stuart_Armstrong 09 December 2011 12:29PM


Comment author: paulfchristiano 10 December 2011 07:29:41AM *  1 point

I don't quite understand why AIXI would protect its memory from modification, or at least not why it would reason as you describe (though I concede that I'm quite likely to be missing something).

In what sense can AIXI perceive a memory corruption as corresponding to the universe "changing"? For example, if I change the agent's memory so that one grue never existed, it seems like it perceives no change: starting in the next step it believes that the universe has always been this way, and will restrict its attention to world models in which the universe has always been this way (not in which the memory alteration is causally connected to the destruction of a grue, or indeed in which the memory alteration has any effect at all on the rest of the world).

It seems like your discussion presumes that a memory modification at time T somehow affects your memories of times near T, so that you can somehow causally associate the inconsistencies in your world model with the memory alteration itself.

I suppose you might hope for AIXI to learn a mapping between memory locations and sense perceptions, so that it can learn that the modification of memory location X leads causally to flipping the value of its input at time T (say). But this is not a model which AIXI can even support without learning to predict its own behavior (which I have just argued leads to nonsensical behavior), because the modification of the memory location occurs after time T, so generally depends on the AI's actions after time T, so is not allowed to have an effect on perceptions at time T.

Comment author: Stuart_Armstrong 10 December 2011 03:33:28PM 0 points

I'm assuming a situation where we're not able to make a completely credible alteration. Maybe the AIXI's memories about the number of grues go: 1 grue, 2 grues, 3 grues, 3 grues, 5 grues, 6 grues... and it knows of no mechanism (in its "most likely" models) for producing two grues at once, while other evidence in its memory is consistent with there being four grues, not three. So it can figure out that there are particular odd moments where the universe seems to behave in odd ways, unlike most moments. And then it may figure out that these odd moments are correlated with human action.
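The kind of inconsistency check described here can be sketched in a few lines of Python. This is a toy illustration only, not part of the AIXI formalism; the transition rule "the grue count changes by at most one per step" stands in for whatever the agent's most likely world models permit:

```python
def find_odd_moments(counts):
    """Return indices where the remembered grue count jumps by more
    than 1 in a single step -- i.e. moments no "most likely" model
    (which only allows one new grue at a time) can account for."""
    return [i for i in range(1, len(counts))
            if abs(counts[i] - counts[i - 1]) > 1]

# The tampered memory sequence from the comment above:
memories = [1, 2, 3, 3, 5, 6]
print(find_odd_moments(memories))  # [4] -- the jump from 3 to 5 grues
```

The agent needn't localise the edit itself; it only needs to notice that some remembered transitions are far less probable under its best models than the rest.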

Comment author: paulfchristiano 12 December 2011 07:26:21PM 0 points

Why are these odd moments correlated with human action? I modify the memory at time 100, changing a memory of what happened at time 10. AIXI observes something happen at time 10, and then a memory modification at time 100. Perhaps AIXI can learn a mapping between memory locations and instants in time, but it can't model a change which reaches backwards in time (unless it learns a model in which the entire history of the universe is determined in advance, and just revealed sequentially, in which case it has learned a good enough self-model to stop caring about its own decisions).

Comment author: Stuart_Armstrong 13 December 2011 11:13:32AM 0 points

I was suggesting that if the time difference wasn't too large, the AIXI could deduce "humans plan at time 10 to press button" -> "weirdness at time 10 and button pressed at time 100". If it's good at modelling us, it may be able to deduce our plans long before we do, and as long as the plan predates the weirdness, it can model the plan as causal.

Or if it experiences more varied situations, it might deduce "no interactions with humans for long periods" -> "no weirdness", and act in consequence.
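The proposed inference "no interactions with humans for long periods -> no weirdness" amounts to checking whether every anomalous step falls within, or shortly after, a period of human contact. A toy sketch (the function name, the boolean-per-timestep encoding, and the `window` parameter are illustrative assumptions, not anything from the thread):

```python
def weirdness_only_near_humans(contact, weird, window=2):
    """contact, weird: equal-length lists of booleans, one per time step.
    Return True if every weird step overlaps human contact within the
    preceding `window` steps -- the correlation the comment describes."""
    for t, is_weird in enumerate(weird):
        if is_weird and not any(contact[max(0, t - window):t + 1]):
            return False
    return True

contact = [False, True, True, False, False, False]
weird   = [False, False, True, False, False, False]
print(weirdness_only_near_humans(contact, weird))  # True
```

An agent that learned this regularity could then treat human interaction as a causal precursor of the anomalies, without ever needing a model in which effects reach backwards in time.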

Comment author: hairyfigment 01 December 2013 07:42:16PM *  -1 points

ETA: misunderstood the parent. So it might think our actions made a grue, and would enjoy being told horrible lies which it could disprove. Except I don't know how this interacts with Eliezer's point.