eli_sennesh comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong
No. Dependently-typed theorem proving is the only thing safe enough ;-). That, or the kind of probabilistic defense-in-depth you get from specifying uncertainty over the goal system and other aspects of the agent's functioning, so that updating on data makes the agent converge to the right thing.
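To make the second idea concrete, here's a minimal sketch (all names and numbers are hypothetical, not anyone's actual proposal): an agent that is uncertain which of several candidate goal specifications is correct maintains a posterior over them, and Bayesian updating on feedback concentrates that posterior on the true goal.

```python
import random

# Hypothetical sketch of goal uncertainty: the agent doesn't know which
# candidate goal specification is correct, so it keeps a posterior over
# them and updates on observed feedback.

# Each hypothesis assigns a probability to observing "good" feedback.
hypotheses = {"goal_A": 0.9, "goal_B": 0.5, "goal_C": 0.1}
posterior = {h: 1 / 3 for h in hypotheses}  # uniform prior

def update(posterior, observation):
    """One Bayesian update on a boolean feedback observation."""
    unnormalized = {}
    for h, p_good in hypotheses.items():
        likelihood = p_good if observation else 1 - p_good
        unnormalized[h] = posterior[h] * likelihood
    z = sum(unnormalized.values())
    return {h: v / z for h, v in unnormalized.items()}

random.seed(0)
true_goal = "goal_A"  # the goal actually generating the feedback
for _ in range(50):
    obs = random.random() < hypotheses[true_goal]
    posterior = update(posterior, obs)

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

The point of the sketch is the convergence property: as long as the true goal is in the hypothesis space and the feedback is informative, the posterior mass piles onto it, which is what "updating on data will make the agent converge to the right thing" cashes out to.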