Pfft comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

24 Post author: paulfchristiano 18 December 2010 08:28AM

Comment author: Pfft 19 December 2010 08:55:05PM 4 points

Yes, but the point is that the automatic verifier gets to verify a proof that the AI-in-the-box produced -- it doesn't have to examine an arbitrary program and try to prove friendliness from scratch.

In a comment below, paulfchristiano makes the point that any theory of friendliness at all should give us such a proof system, for some restricted class of programs. For example, Eliezer envisions a theory about how to let programs evolve without losing friendliness. The corresponding class of proofs has the form "the program under consideration can be derived from the known-friendly program X by the sequence Y of friendliness-preserving transformations".
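
To make the shape of such a verifier concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: programs are just tuples of instruction names, and `append_noop` and `strip_noops` stand in for whatever transformations a real theory would certify as friendliness-preserving. The only point is that the checker replays the supplied sequence Y starting from X and compares the result with the candidate; it never has to analyse the candidate on its own.

```python
from typing import Callable, Dict, Sequence, Tuple

# Toy stand-in: a "program" is a tuple of instruction names.
Program = Tuple[str, ...]


def append_noop(p: Program) -> Program:
    """Hypothetical transformation: add a do-nothing instruction at the end."""
    return p + ("noop",)


def strip_noops(p: Program) -> Program:
    """Hypothetical transformation: remove every do-nothing instruction."""
    return tuple(i for i in p if i != "noop")


# Whitelist of transformations assumed (not proven!) to preserve friendliness.
ALLOWED: Dict[str, Callable[[Program], Program]] = {
    "append_noop": append_noop,
    "strip_noops": strip_noops,
}


def verify(candidate: Program, known_friendly: Program, steps: Sequence[str]) -> bool:
    """Check the proof "candidate is derived from known_friendly by steps".

    Replays each named step in order; rejects if a step is not whitelisted
    or if the final result differs from the candidate.
    """
    current = known_friendly
    for name in steps:
        transform = ALLOWED.get(name)
        if transform is None:
            return False  # the proof uses a transformation we do not trust
        current = transform(current)
    return current == candidate


X: Program = ("greet", "halt")  # the assumed known-friendly program

print(verify(("greet", "halt", "noop"), X, ["append_noop"]))    # True
print(verify(("greet", "attack", "halt"), X, ["append_noop"]))  # False
print(verify(X, X, ["self_modify"]))                            # False
```

Note that the cost of checking is linear in the length of Y, while finding Y is entirely the boxed AI's problem. The hard part, of course, is the whitelist itself, which this sketch simply assumes.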