William_S comments on Superintelligence 13: Capability control methods - Less Wrong

7 Post author: KatjaGrace 09 December 2014 02:00AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (44)

You are viewing a single comment's thread. Show more comments above.

Comment author: William_S 10 December 2014 03:01:22AM 2 points [-]

Suppose that you have a simple, benign solution that works only up to Y% optimization (just make the paperclips), and a hard, non-benign solution that is optimal above that point (take over the world, then make paperclips). The AI naively follows the benign strategy, and does not look too hard for alternatives up to Y%. All manual checks below Y% of optimization pass. But Y ends up as a number that falls between two of your numerical checkpoints. So, you observe all checkpoints passing below Y% optimization, until suddenly the AI switches to the non-benign solution between checkpoints, executes it to reach the next checkpoint, but has already caused damage.