timtyler comments on Open Thread, August 2010 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (676)
A much stronger argument than all-powerful AIs suddenly escaping (which is still not without merit) is that AI will have an incentive to behave as we expect it to behave, until at some point we no longer control it. It'll try its best to pass all tests.
So: while it believes it is under evaluation it does its very best to behave itself?
Can we wire that belief in as a prior with p=1.0?