scav comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

Post author: paulfchristiano 18 December 2010 08:28AM




Comment author: Vladimir_Nesov 19 December 2010 03:02:04AM, 6 points

The most general form is that of a textbook. Have the AI generate a textbook that teaches you Friendliness theory, with exercises and all, so that you'll be able to construct a FAI on your own (with a team, etc.), fully understanding why and how it's supposed to work. We all know how secure human psychology is and how it's totally impossible to be fooled, especially for multiple people simultaneously, even by superintelligences that have a motive to deceive you.

Comment author: scav 20 December 2010 01:55:55PM, 2 points

It's an attractive idea for science fiction, but I think that, no matter how super-intelligent and unfriendly an AI is, it would be unable to produce some kind of mind-destroying grimoire. I just don't think text and diagrams on a printed page, read slowly in the usual way, would have the bandwidth to rapidly and reliably "hack" any human. Needless to say, you would proceed with caution just in case.

If you don't mind hurting a few volunteers to defend humanity from a much bigger threat, it should be fairly easy to detect, quarantine, and possibly treat the damaged ones. They, after all, would only be ordinarily intelligent. Super-cunning existential-risk malevolence isn't transitive.

I think a textbook full of sensory exploit hacks would be pretty valuable data in itself, but maybe I'm not a completely friendly natural intelligence ;-)

edit: Oh, I may have missed the point that of course you couldn't trust the methods in the textbook for constructing FAI even if it itself posed no direct danger. Agreed.

Comment author: shokwave 20 December 2010 02:36:26PM, 6 points

no matter how super-intelligent and unfriendly, an AI would be unable to produce some kind of mind-destroying grimoire.

Consider that humans can and have made such grimoires; they call them bibles. All it takes is a nonrational but sufficiently appealing idea, and an imperfect rationalist falls to it. If there's a true hole in the textbook's information, such that it produces unfriendly AI instead of friendly AI, and the AI who wrote the textbook handwaved that hole away, how confident are you that you would spot the best hand-waving ever written?

Comment author: scav 20 December 2010 03:35:37PM, 2 points

Not confident at all. In fact I have seen no evidence for the possibility, even in principle, of provably friendly AI. And if there were such evidence, I wouldn't be able to understand it well enough to evaluate it.

In fact I wouldn't trust such a textbook even written by human experts whose motives I trusted. The problem isn't proving the theorems, it's choosing the axioms.