timtyler comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

24 Post author: paulfchristiano 18 December 2010 08:28AM

Comment author: paulfchristiano 21 December 2010 03:37:41AM 2 points

The point is that it may be possible to design a heuristically friendly AI which, if friendly, will remain just as friendly after changing itself, without having any infallible way to recognize a friendly AI. (In particular, it's bad if your screening has any false positives at all, since you have a transhuman looking for pathologically bad cases.)

Comment author: timtyler 21 December 2010 06:46:17PM 2 points

To recap, (b) was: 'Verification of "proof of friendliness" is easier than its production'.

For that to work as a plan in context, the verification doesn't have to be infallible. It just needs to have no false positives, i.e. it must never accept an unfriendly machine. False negatives are fine: if a good machine is rejected, that isn't the end of the world.
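
As a toy analogy only (nothing here is a friendliness test, and the function name and the choice of "n is composite" as the claim are made up for illustration), here is a sketch of a checker with exactly that error profile. It accepts a claim only when handed a certificate it can confirm, so it can wrongly reject but can never wrongly accept, and checking the certificate is far cheaper than finding one.

```python
from typing import Optional


def verify_composite(n: int, certificate: Optional[int]) -> bool:
    """Accept the claim 'n is composite' only if the certificate proves it.

    Hypothetical illustration: the certificate is a claimed nontrivial
    factor of n. Checking it costs one modulo operation, whereas producing
    it may require a long search.
    """
    if certificate is None:
        # No proof offered: reject, even though n may well be composite.
        return False
    # A nontrivial divisor exists only for composite n, so a prime can
    # never pass this check, whatever certificate is supplied.
    return 1 < certificate < n and n % certificate == 0


# Valid certificate: accepted.
accepted = verify_composite(91, 7)

# Composite number, but no certificate: rejected (a false negative).
rejected_without_proof = verify_composite(91, None)

# Prime number: rejected for every possible certificate (no false positives).
prime_ever_accepted = any(verify_composite(97, c) for c in range(0, 200))
```

The asymmetry is the point: a rejected composite costs only a missed opportunity, while the check is built so that an accepted prime is impossible by construction rather than merely unlikely.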