V_V comments on AI caught by a module that counterfactually doesn't exist - Less Wrong Discussion

Post author: Stuart_Armstrong, 17 November 2014 05:49PM (9 points)

Comment author: V_V 17 November 2014 07:09:42PM -1 points

Isn't that essentially a false belief about one's own preferences?

I mean, the AI's "true" VNM utility function, to the extent that it has one, is going to differ from the utility function the AI reflectively believes it has. In principle the AI could discover the difference, and this could cause it to alter its behavior.

Or maybe not; I don't have a strong intuition about this at the moment. But if I recall correctly, in the previous work on corrigibility (I haven't yet read the latest version you linked), Soares was considering using causal decision nodes to implement utility indifference for the shutdown problem. That effectively introduces false beliefs into the agent, since the agent is mistaken about what causes the button to be pressed.
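
As a toy illustration of that "false belief" reading, here is a minimal sketch (my own construction with hypothetical names and numbers, not the formalism from the corrigibility paper): the agent's decision network severs the causal link between its behavior and the button press.

```python
import random

def actual_press(misbehaved: bool) -> bool:
    # Reality: the humans press the button because they observe
    # the agent misbehaving.
    return misbehaved

def modeled_press() -> bool:
    # The agent's causal model: the press is an exogenous chance
    # node that fires with a fixed probability, independent of
    # anything the agent does. This is the mistaken belief about
    # what causes the button to be pressed.
    return random.random() < 0.1

# Under the modeled graph, misbehaving does not raise the probability
# of shutdown, so the agent gains nothing by manipulating the button;
# the price is a factually wrong model of the world.
```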

Comment author: So8res 17 November 2014 07:49:12PM 4 points

My preferred interpretation of that particular method is not "the agent has false beliefs," but instead "the agent cares about both the factual and the counterfactual worlds, and is trying to maximize utility in both at once." That is, if you were to cry:

"But if the humans press the button, the press signal will occur! So why are you acting such that you still get utility in the counterfactual world where humans press the button and the signal fails to occur?"

It will look at you funny, and say "Because I care about that counterfactual world. See? It says so right here in my utility function." It knows the world is counterfactual, it just cares about "what would have happened" anyway. (Causal decision nodes are used to formalize "what would have happened" in the agent's preferences; this says nothing about whether the agent uses causal reasoning when making decisions.)
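
A minimal sketch of this "cares about both worlds" reading, with made-up actions and utility numbers (nothing here is from the paper): the agent sums its utility in the factual world with its utility in the counterfactual world where the press signal fails, and maximizes that sum.

```python
# Utility of each action in the factual world, where the press
# signal works as designed...
FACTUAL = {"comply": 1.0, "resist": 0.4}
# ...and in the counterfactual world where the button is pressed
# but the signal fails to occur.
COUNTERFACTUAL = {"comply": 0.5, "resist": 0.1}

def combined_utility(action: str) -> float:
    """The agent cares about both worlds at once: it knows one of
    them is counterfactual, it just cares about it anyway."""
    return FACTUAL[action] + COUNTERFACTUAL[action]

best = max(FACTUAL, key=combined_utility)
print(best, combined_utility(best))  # comply 1.5
```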

Comment author: Toggle 18 November 2014 12:12:51AM 0 points

This greatly clarified the distinction for me. Well done.

Comment author: V_V 17 November 2014 09:08:17PM 0 points

(Causal decision nodes are used to formalize "what would have happened" in the agent's preferences; this says nothing about whether the agent uses causal reasoning when making decisions.)

Makes sense.

Comment author: Stuart_Armstrong 17 November 2014 08:51:33PM 2 points

"Isn't that essentially a false belief about one's own preferences?"

No. It's an adjusted preference that functions in practice just like a false belief.