Stuart_Armstrong comments on Predicted corrigibility: pareto improvements - Less Wrong

5 Post author: Stuart_Armstrong 18 August 2015 11:02AM


Comment author: Stuart_Armstrong 24 August 2015 10:29:04AM 0 points

I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than what A would have chosen.

I see I've miscommunicated the central idea. Let U be the proposition "the agent will remain a u maximiser forever". Agent A acts as if P(U)=1 (see the entry on value learning). In reality, P(U) is probably very low. So A is a u-maximiser, but a u-maximiser that acts on false beliefs.

Agent B is allowed to have a better estimate of P(U). Therefore it can find actions that increase u beyond what A would achieve.

Example: u values rubies deposited in the bank. A will just collect rubies until it can't carry them any more, then go deposit them in the bank. B, knowing that u will change to something else before A has finished collecting rubies, rushes to the bank ahead of that deadline. So E(u|B) > E(u|A).
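The ruby example can be sketched as a toy simulation. All the specific numbers (carrying capacity, collection rate, the step at which u changes) are hypothetical choices for illustration, not anything from the original post:

```python
def banked_u(deposit_time, rate, deadline):
    """Rubies credited under u, which only counts deposits made
    before u changes to something else at `deadline`."""
    if deposit_time > deadline:
        return 0  # deposit lands after u has changed; u credits nothing
    return deposit_time * rate

CAPACITY = 20   # rubies an agent can carry before it must deposit
RATE = 1        # rubies collected per time step
DEADLINE = 12   # step at which u changes (so P(U) = 1 is false)

# Agent A acts as if P(U) = 1: it fills up (20 steps), then deposits.
u_A = banked_u(deposit_time=CAPACITY // RATE, rate=RATE, deadline=DEADLINE)

# Agent B, with a better estimate of P(U), deposits just before the deadline.
u_B = banked_u(deposit_time=DEADLINE, rate=RATE, deadline=DEADLINE)

print(u_A, u_B)  # A banks 0 rubies under u; B banks 12, so E(u|B) > E(u|A)
```

Under these numbers A's deposit arrives after u has already changed, so its whole haul counts for nothing, while B's early trip banks 12 rubies.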

And, of course, if B can strictly increase E(u), that gives it some slack to select other actions that can increase Σ p_i v_i.