abramdemski comments on Toy model for wire-heading [EDIT: removed for improvement] - Less Wrong

2 Post author: Stuart_Armstrong 09 October 2015 03:45PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (1)

You are viewing a single comment's thread.

Comment author: abramdemski 09 October 2015 07:43:48PM 3 points [-]

I agree with your point as stated, but I think a sharper distinction between utility-maximizing and reward-maximizing reveals more alternatives.

A reward-maximizing agent attempts to predict A; D maximizes this predicted future A.

A utility-maximizing agent has direct access to A; D applies A to evaluate possible futures, and maximizes A.

In the first case, a superintelligent D would want to wrestle control of A and modify it.

In the second case, when D thinks about the planned modification of A, it evaluates this possible future using the current A. It sees that the current A does not value this future particularly highly. Therefore, it does not wirehead.