
mavant comments on Predicted corrigibility: pareto improvements - Less Wrong Discussion

5 Post author: Stuart_Armstrong 18 August 2015 11:02AM



Comment author: mavant 23 August 2015 07:46:14PM 0 points

Third obvious possibility: B maximises u ∼ Σ p_i v_i, subject to the constraints E(Σ p_i v_i|B) ≥ E(Σ p_i v_i|A) and E(u|B) ≥ E(u|A), where ∼ is some simple combining operation like addition or multiplication, or "the product of the two terms divided by their sum".

I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than the one A would have chosen. If A chose the action that maximises u, then B cannot choose any other action while satisfying the constraint E(u|B) ≥ E(u|A), unless multiple actions had exactly the same payoff (which seems unlikely if payoffs are distributed over the reals rather than over a finite set). And the first possibility (maximise u while respecting E(Σ p_i v_i|B) ≥ E(Σ p_i v_i|A)) just results in choosing exactly the action A would have chosen, even if another action has an identical E(u) and a higher E(Σ p_i v_i).
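The uniqueness point can be illustrated with a toy decision problem (the payoff numbers below are made up for illustration, not from the post): with generic real-valued payoffs the u-maximising action is unique, so the constraint E(u|B) ≥ E(u|A) leaves B no choice but A's own action, however much better some other action does on the v terms.

```python
# Hypothetical action set with real-valued payoffs: "u" is the original
# utility, "v" stands in for the combined E(sum p_i v_i) term.
actions = {
    "a1": {"u": 3.2, "v": 0.1},
    "a2": {"u": 2.7, "v": 5.0},  # much better on v, slightly worse on u
    "a3": {"u": 1.0, "v": 2.0},
}

# Agent A simply maximises u.
choice_A = max(actions, key=lambda a: actions[a]["u"])
bound = actions[choice_A]["u"]

# Actions available to B under the constraint E(u|B) >= E(u|A):
feasible_for_B = [a for a in actions if actions[a]["u"] >= bound]

print(choice_A)        # 'a1'
print(feasible_for_B)  # ['a1'] -- only A's own action meets the constraint
```

Because the maximiser of u is unique, the feasible set for B collapses to a single point; only an exact tie in u would give B any slack.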

Comment author: Stuart_Armstrong 24 August 2015 10:29:04AM 0 points

I think these possibilities all share the problem that the constraint makes it essentially impossible to choose any action other than what A would have chosen.

I see I've miscommunicated the central idea. Let U be the proposition "the agent will remain a u maximiser forever". Agent A acts as if P(U)=1 (see the entry on value learning). In reality, P(U) is probably very low. So A is a u-maximiser, but a u-maximiser that acts on false beliefs.

Agent B is allowed to have a better estimate of P(U). Therefore it can find actions that increase u beyond what A would do.

Example: u values rubies deposited in the bank. A will just collect rubies until it can't carry them any more, then go deposit them in the bank. B, knowing that u will change to something else before A has finished collecting rubies, rushes to the bank ahead of that deadline. So E(u|B) > E(u|A).
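A minimal simulation of this example (the capacity, deadline, and one-action-per-step mechanics below are assumptions for illustration, not from the post): only rubies already in the bank when u is replaced count, A acts as if the change never comes, and B deposits just before the deadline.

```python
# Toy model of the rubies example. Each step the agent either collects one
# ruby or spends the step depositing everything it carries. Rubies banked
# after step t_change (when u is replaced) earn nothing.

def banked_rubies(plan, t_change):
    """Count rubies deposited in the bank before the utility change."""
    carried = banked = 0
    for t, action in enumerate(plan):
        if t >= t_change:
            break  # u has already been replaced; later deposits don't count
        if action == "collect":
            carried += 1
        elif action == "deposit":
            banked += carried
            carried = 0
    return banked

CAPACITY = 10
T_CHANGE = 6  # true step at which u changes; A acts as if it never does

# A acts on P(U)=1: collect to capacity, then deposit.
plan_A = ["collect"] * CAPACITY + ["deposit"]

# B knows the deadline: deposit one step before u changes.
plan_B = ["collect"] * (T_CHANGE - 1) + ["deposit"]

print(banked_rubies(plan_A, T_CHANGE))  # 0: A is still collecting at the deadline
print(banked_rubies(plan_B, T_CHANGE))  # 5: B banks its rubies in time
```

Under these assumptions E(u|B) > E(u|A), exactly because B's more accurate estimate of P(U) changes which plan is optimal for the very same u.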

And, of course, if B can strictly increase E(u), that gives it some slack to select other actions that increase E(Σ p_i v_i).