William_S comments on Superintelligence 13: Capability control methods - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
This has some problems associated with stunting. Adding humans to the loop at this frequency of oversight will slow things down, whatever happens. The AI would also have fewer problem-solving strategies open to it: if it doesn't think ahead to <do evil things>, it also won't think ahead to <do things that make future optimizations easier>.
The programmers also have to make sure that they inspect not only the output of the AI at this stage, but also the strategies it is considering implementing. Otherwise, it's possible that there is a sudden transition: one strategy works only up to a certain point, and then another, more general strategy takes over.
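A toy sketch of that failure mode (all names here are hypothetical, invented for illustration): an optimizer tries a narrow strategy first and falls back to a more general one when the narrow one stops working. An overseer who checks only outputs sees nothing change at the transition; an overseer who checks which strategy was used sees the switch immediately.

```python
# Hypothetical toy model: output-only vs. strategy-aware oversight.

def simple_strategy(n):
    # Narrow strategy: only handles small inputs.
    return n * 2 if n <= 5 else None

def general_strategy(n):
    # More general strategy: handles everything.
    return n * 2

def optimize(n):
    """Try strategies in order; return (strategy_name, output)."""
    for name, strat in [("simple", simple_strategy),
                        ("general", general_strategy)]:
        result = strat(n)
        if result is not None:
            return name, result
    return None, None

# Output-only oversight: every answer looks the same, so the
# transition at n = 6 is invisible from the outputs alone.
outputs = [optimize(n)[1] for n in range(1, 10)]
print(outputs)      # all equal to 2*n -- no visible change

# Strategy-aware oversight: the sudden switch is plainly visible.
strategies = [optimize(n)[0] for n in range(1, 10)]
print(strategies)   # "simple" up to n = 5, then "general"
```

The point is only that identical outputs can mask a discontinuous change in the underlying strategy, which is why inspecting considered strategies matters and not just results.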