mavant comments on Predicted corrigibility: pareto improvements - Less Wrong

5 Post author: Stuart_Armstrong 18 August 2015 11:02AM

Comment author: mavant 23 August 2015 07:46:14PM 0 points

Third obvious possibility: B maximises $u \sim \sum_i p_i v_i$, subject to the constraints $E(\sum_i p_i v_i \mid B) \geq E(\sum_i p_i v_i \mid A)$ and $E(u \mid B) \geq E(u \mid A)$, where $\sim$ is some simple combining operation like addition or multiplication, or "the product of the two divided by their sum".
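
To make the third rule concrete, here is a minimal sketch over a finite menu of actions. It assumes each action can be summarised by two numbers, its expected u and its expected $\sum_i p_i v_i$; the action names, the values and the function names are hypothetical, and the "product over sum" operation assumes the two values are positive.

```python
from typing import Callable, Dict, Tuple

# Hypothetical summary of each action: (E(u | action), E(sum_i p_i v_i | action)).
Actions = Dict[str, Tuple[float, float]]


def add(x: float, y: float) -> float:
    return x + y


def mul(x: float, y: float) -> float:
    return x * y


def prod_over_sum(x: float, y: float) -> float:
    # "The product of the two divided by their sum"; assumes x + y != 0.
    return x * y / (x + y)


def choose_b(actions: Actions, a_choice: str,
             combine: Callable[[float, float], float]) -> str:
    """Maximise combine(E(u), E(sum p_i v_i)) subject to both expectations
    being at least as large as those of A's chosen action."""
    u_a, v_a = actions[a_choice]
    feasible = {name: (u, v) for name, (u, v) in actions.items()
                if u >= u_a and v >= v_a}
    # A's own action always satisfies the constraints, so feasible is non-empty.
    return max(feasible, key=lambda name: combine(*feasible[name]))


example = {"a": (5.0, 2.0), "b": (6.0, 3.0), "c": (8.0, 1.0)}
for op in (add, mul, prod_over_sum):
    print(op.__name__, choose_b(example, "a", op))
```

With A's choice set to "a", only "a" and "b" satisfy both constraints, and all three combining operations pick "b" here. Setting A's choice to "c", the action with the highest expected u, leaves "c" as the only feasible option.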

I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than the one A would have chosen. If A chose the action that maximised u, then B cannot choose any other action while satisfying the constraint $E(u \mid B) \geq E(u \mid A)$, unless there were multiple actions with exactly the same payoff (which seems unlikely if payoff values are distributed over the reals rather than over a finite set). And the first possibility (maximise u while respecting $E(\sum_i p_i v_i \mid B) \geq E(\sum_i p_i v_i \mid A)$) just results in choosing exactly the same action as A would have chosen, even if there is another action with an identical E(u) AND a higher $E(\sum_i p_i v_i)$.
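
The tie argument can be checked numerically. The sketch below assumes, hypothetically, that A picks the action with the highest expected u among n randomly drawn payoffs, and counts how many actions then satisfy $E(u \mid B) \geq E(u \mid A)$. Floats stand in for real-valued payoffs, and `randint(1, 3)` for payoffs from a small finite set; the function names are illustrative.

```python
import random


def feasible_count(payoffs):
    """How many actions satisfy E(u|B) >= E(u|A) when A picks the
    action with the highest expected u."""
    best = max(payoffs)
    return sum(1 for p in payoffs if p >= best)


def average_feasible(n_actions, trials, draw, seed=0):
    """Average feasible-set size over randomly drawn payoff vectors."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        payoffs = [draw(rng) for _ in range(n_actions)]
        total += feasible_count(payoffs)
    return total / trials


# Floats stand in for payoffs distributed over the reals.
real = average_feasible(10, 10_000, lambda r: r.random())
# A small finite set of payoffs, where ties are common.
finite = average_feasible(10, 10_000, lambda r: r.randint(1, 3))
print(real, finite)
```

With float payoffs the average should be 1.0, since ties between floats are vanishingly unlikely, so the constraint leaves B only A's action; with payoffs from {1, 2, 3} several actions usually tie for the maximum and the average is well above 1.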

Comment author: Stuart_Armstrong 24 August 2015 10:29:04AM 0 points

> I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than what A would have chosen.

I see I've miscommunicated the central idea. Let U be the proposition "the agent will remain a u-maximiser forever". Agent A acts as if P(U)=1 (see the entry on value learning). In reality, P(U) is probably very low. So A is a u-maximiser, but a u-maximiser that acts on false beliefs.

Agent B is allowed to have a better estimate of P(U). Therefore it can find actions that increase E(u) beyond what A's actions achieve.

Example: u values rubies deposited in the bank. A will just collect rubies until it can't carry any more, then go and deposit them in the bank. B, knowing that u will change to something else before A would have finished collecting rubies, rushes to the bank ahead of that deadline. So E(u|B) > E(u|A).
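
The ruby example can be turned into a small expected-value calculation. Every number here is a hypothetical assumption: the agent carries at most 10 rubies, collects one per step, needs 3 steps to reach the bank, and the step T at which u is replaced is equally likely to be any of 8 to 12. A deposit is assumed to count towards u only if it happens while u is still the agent's utility.

```python
from fractions import Fraction

CAPACITY = 10   # rubies the agent can carry (hypothetical)
WALK_TIME = 3   # steps from the ruby field to the bank (hypothetical)
# Hypothetical belief about the step T at which u is replaced by another
# utility; each value is equally likely.
SWITCH_TIMES = [8, 9, 10, 11, 12]


def prob_still_u(t, switch_times):
    """P(the agent is still a u-maximiser at step t) = P(T >= t)."""
    return Fraction(sum(1 for s in switch_times if s >= t), len(switch_times))


def expected_u(k, switch_times):
    """Collect k rubies (one per step), then walk to the bank. The deposit
    happens at step k + WALK_TIME and only counts if u still holds then."""
    return k * prob_still_u(k + WALK_TIME, switch_times)


# Agent A acts as if P(U) = 1: it ignores the switch and fills up.
k_a = CAPACITY
# Agent B uses the better estimate and picks the best stopping point.
k_b = max(range(CAPACITY + 1), key=lambda k: expected_u(k, SWITCH_TIMES))

print(k_a, expected_u(k_a, SWITCH_TIMES))  # 10 0
print(k_b, expected_u(k_b, SWITCH_TIMES))  # 5 5
```

Under these assumptions A reaches the bank at step 13, after every possible switch, so its expected u is 0; B stops at 5 rubies and reaches the bank at step 8, before any possible switch, for an expected u of 5.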

And, of course, if B can strictly increase E(u), that gives it some slack to select other actions that can increase $E(\sum_i p_i v_i)$.
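
Putting the two halves of the reply together, here is a minimal sketch assuming a hypothetical menu of three actions, each described by its expected u if U holds (what A uses), its expected u under B's better estimate of P(U), and its expected $\sum_i p_i v_i$. A ranks actions by the first number; B maximises $E(\sum_i p_i v_i)$ subject to $E(u \mid B) \geq E(u \mid A)$, evaluated with its own estimates. This is one illustrative rule under which the slack gets used, not necessarily the post's exact formulation, and all names and values are invented.

```python
from typing import Dict, NamedTuple


class Action(NamedTuple):
    u_if_forever: float  # E(u) assuming U holds, i.e. P(U) = 1 (A's view)
    u_realistic: float   # E(u) under B's better estimate of P(U)
    other: float         # E(sum_i p_i v_i)


def pick_a(actions: Dict[str, Action]) -> str:
    """A acts as if it will remain a u-maximiser forever."""
    return max(actions, key=lambda name: actions[name].u_if_forever)


def pick_b(actions: Dict[str, Action]) -> str:
    """Maximise E(sum_i p_i v_i) subject to E(u|B) >= E(u|A), both sides
    evaluated with B's realistic estimates. A's own action is feasible, so
    the result also has E(sum_i p_i v_i) at least as high as A's choice."""
    a = actions[pick_a(actions)]
    feasible = [name for name, act in actions.items()
                if act.u_realistic >= a.u_realistic]
    return max(feasible, key=lambda name: actions[name].other)


# Hypothetical actions loosely modelled on the ruby example.
ACTIONS = {
    "fill_up": Action(u_if_forever=10, u_realistic=0, other=0),
    "deposit_early": Action(u_if_forever=5, u_realistic=5, other=1),
    "deposit_early_and_help": Action(u_if_forever=4, u_realistic=4, other=3),
}

print(pick_a(ACTIONS), pick_b(ACTIONS))
```

A picks "fill_up" because it is best if u never changes, but under B's estimate it yields an expected u of 0. That leaves room: B picks "deposit_early_and_help", which has both a higher realistic expected u (4) and a higher expected $\sum_i p_i v_i$ (3) than A's choice. If A's beliefs were accurate (u_if_forever equal to u_realistic for every action), A would pick "deposit_early" and, with these distinct values, B would be left with that same action, which is the situation described in the first comment.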