Stuart_Armstrong comments on Introducing Corrigibility (an FAI research subfield) - LessWrong

29 Post author: So8res 20 October 2014 09:09PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (28)

You are viewing a single comment's thread. Show more comments above.

Comment author: Stuart_Armstrong 23 January 2015 11:31:51AM 1 point [-]

The intuitive fix is to try to prevent P from being the causal ancestor of anything in the graph; e.g., have the agent act as if it doesn't believe that the blackmailer can really observe / base their action on P. That sounds really difficult to set up and horribly hacky, though.

It is relevant that the decision to blackmail (probably need a better word) is determined by the fact that P=not Press, and because of the particular structure of the algorithm. This flags up the blackmail as something unusual, but I'm not sure how to safely exploit that fact... The rule "don't take deals that only exist because of property Q of your algorithm" is too rigid, but maybe a probabilistic version of that?