Pfft comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

24 Post author: paulfchristiano 18 December 2010 08:28AM

Comments (155)

Comment author: JamesAndrix 19 December 2010 05:44:14PM 1 point

That would then be something you'd have to read and likely show to dozens of other people to verify reliably, leaving opportunities for all kinds of mindhacks. The OP's proposal requires us to have an automatic verifier ready to run, one that can return a reliable verdict without human intervention.

Comment author: Pfft 19 December 2010 08:55:05PM 4 points

Yes, but the point is that the automatic verifier gets to verify a proof that the AI-in-the-box produced -- it doesn't have to examine an arbitrary program and try to prove friendliness from scratch.

In a comment below, paulfchristiano makes the point that any theory of friendliness at all should give us such a proof system, for some restricted class of programs. For example, Eliezer envisions a theory about how to let programs evolve without losing friendliness. The corresponding class of proofs has the form "the program under consideration can be derived from the known-friendly program X by the sequence Y of friendliness-preserving transformations".
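
To make the shape of such a proof concrete, here is a toy sketch of a checker for it. Everything in it is an assumption made for illustration: programs are plain strings, and the names `TRANSFORMS`, `strip_whitespace`, `rename_x_to_y` and `verify` are hypothetical. No real theory of friendliness is implied, and nothing here shows that these transformations actually preserve anything. The only point is that the checker replays the steps it is given and never has to search for them.

```python
from typing import Callable, Dict, Sequence

# Toy stand-in: a "program" is just a string (an assumption for illustration).
Program = str

# Hypothetical whitelist of transformations that are ASSUMED to preserve
# friendliness. A real theory would have to justify each entry.
TRANSFORMS: Dict[str, Callable[[Program], Program]] = {
    "strip_whitespace": lambda p: p.strip(),
    "rename_x_to_y": lambda p: p.replace("x", "y"),
}


def verify(candidate: Program, known_friendly: Program, steps: Sequence[str]) -> bool:
    """Check the proof "candidate is derived from known_friendly by steps".

    The proof is the list of transformation names. The checker replays them
    in order and rejects if any step is not on the whitelist or if the
    result differs from the candidate.
    """
    current = known_friendly
    for name in steps:
        transform = TRANSFORMS.get(name)
        if transform is None:
            # Unknown step: reject rather than trust the prover.
            return False
        current = transform(current)
    return current == candidate


if __name__ == "__main__":
    x = "  x = 1 "  # the known-friendly program X
    y = ["strip_whitespace", "rename_x_to_y"]  # the sequence Y supplied by the prover
    print(verify("y = 1", x, y))
    print(verify("y = 1", x, ["delete_safeguards"]))
```

The cost of running `verify` is linear in the length of Y, whereas finding Y in the first place may be arbitrarily hard. That asymmetry is what lets the verifier run without a human reading anything the boxed AI wrote.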