Sewing-Machine comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

Post author: paulfchristiano 18 December 2010 08:28AM

Comment author: Eliezer_Yudkowsky 18 December 2010 02:44:08PM *  19 points [-]

a certifiably friendly AI: a class of optimization processes whose behavior we can automatically verify will be friendly

The probability I assign to achieving a capability state where it is (1) possible to prove a mind Friendly even if it has been constructed by a hostile superintelligence, (2) possible to build a hostile superintelligence, and (3) not possible to build a Friendly AI directly, is very low.

In particular, the sort of proof techniques I currently have in mind - what they prove and what it means - for ensuring Friendliness through a billion self-modifications of something that started out relatively stupid and was built by relatively trusted humans, would not work for verifying the Friendliness of a finished AI handed to you by a hostile superintelligence; it seems to me that the proof techniques required for that would have to be considerably stronger.

To paraphrase Mayor Daley, the proof techniques are there to preserve the Friendly intent of the programmers through the process of constructing the AI and through the AI's self-modification, not to create Friendliness. People hear the word "prove" and assume that this is because you don't trust the programmers, or because you have a psychological need for unobtainable absolute certainty. No, it's because if you don't prove certain things (and have the AI prove certain things before each self-modification) then you can't build a Friendly AI no matter how good your intentions are. The good intentions of the programmers are still necessary, and assumed, beyond the parts that are proved; and doing the proof work doesn't make the whole process absolutely certain, but if you don't strengthen certain parts of the process using logical proof then you are guaranteed to fail. (This failure is knowable to a competent AGI scientist - not with absolute certainty, but with high probability - and therefore it is something of which a number of would-be dabblers in AGI maintain a careful ignorance regardless of how you try to explain it to them, because the techniques that make them enthusiastic don't support that sort of proof. "It is hard to explain something to someone whose job depends on not understanding it.")

Comment author: [deleted] 18 December 2010 03:16:38PM 6 points [-]

The probability I assign to achieving a capability state where it is (1) possible to prove a mind Friendly even if it has been constructed by a hostile superintelligence, (2) possible to build a hostile superintelligence, and (3) not possible to build a Friendly AI directly, is very low.

A general theory of quarantines would nevertheless be useful.

Comment author: Vladimir_Nesov 18 December 2010 03:25:06PM *  4 points [-]

Moral value can manipulate your concerns, even as you prevent causal influence. Maybe the AI will create extraordinary people in its mind and use them as leverage against the weak points of your defense. It's just too difficult; you are bound to miss something. The winning move is not to play.

Comment author: paulfchristiano 18 December 2010 06:46:08PM 11 points [-]

I disagree. The weak point of the scheme is the friendliness test, not the quarantine. If I prove the quarantine scheme will work, then it will work unless my computational assumptions are incorrect. If I prove it will work without assumptions, it will work without assumptions.

If you think that an AI can manipulate our moral values without ever getting to say anything to us, then that is a different story. This danger occurs even before putting an AI in a box though, and in fact even before the design of AI becomes possible. This scheme does nothing to exacerbate that danger.
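Christiano's division of labor, a quarantine whose soundness can be argued on its own, with everything else gated by an output test, can be sketched in miniature. All names below are hypothetical; the original post's actual boundary is cryptographic, not a function call:

```python
# Toy model of the division of labor (all names hypothetical): the quarantine
# guarantees that nothing crosses the boundary except one declared output,
# and an independent verifier gates that output. If the quarantine is proven
# sound, the only remaining weak point is the verifier, i.e. the test itself.

def quarantined_run(untrusted_solver, problem, verify):
    """Run an untrusted solver; release its answer only if it verifies."""
    answer = untrusted_solver(problem)   # the sole channel out of the box
    return answer if verify(problem, answer) else None

# Example problem: factoring, where verification is trivial.
def honest_solver(n):
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return (d, n // d)
    return None

def hostile_solver(n):
    return (1, n)   # a useless "factorization" from a malicious solver

def verify_factors(n, ans):
    return ans is not None and 1 < ans[0] < n and ans[0] * ans[1] == n
```

Here `quarantined_run` stands in for the cryptographic box of the original post; the point is only that its soundness can be assessed independently of `verify`, which carries all the remaining risk.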

Comment author: wedrifid 20 December 2010 06:07:31AM *  0 points [-]

If you think that an AI can manipulate our moral values without ever getting to say anything to us, then that is a different story.

With a few seconds of thought it is easy to see how this is possible even without caring about imaginary people. This is a question of cooperation among humans.

This danger occurs even before putting an AI in a box though, and in fact even before the design of AI becomes possible. This scheme does nothing to exacerbate that danger.

This is a good point too, although I would not go so far as to say it does nothing to exacerbate the danger. The increased tangibility matters.

Comment author: paulfchristiano 20 December 2010 06:02:40PM 0 points [-]

I think that running an AI in this way is no worse than simply having the code of an AGI exist. I agree that just having the code sitting around is probably dangerous.

Comment author: wedrifid 21 December 2010 03:51:21AM 0 points [-]

Nod, in terms of direct danger the two cases aren't much different. The difference in risk is only due to the psychological impact on our fellow humans. The Pascal's Commons becomes that much more salient to them. (Yes, I did just make that term up. The implications of the combination are clear I hope.)

Comment author: [deleted] 18 December 2010 06:21:21PM 4 points [-]

Separate "let's develop a theory of quarantines" from "let's implement some quarantines."

It's just too difficult, you are bound to miss something.

Christiano should take it as a compliment that his idea is formal enough that one could imagine proving that it doesn't work. Other than that, I don't see why your remark should go for "quarantining an AI using cryptography" and not "creating a friendly AI."

The winning move is not to play.

Prove it. Prove it by developing a theory of quarantines.

Comment author: Vladimir_Nesov 18 December 2010 07:22:39PM 1 point [-]

Separate "let's develop a theory of quarantines" from "let's implement some quarantines."

I agree.

Comment author: NihilCredo 18 December 2010 03:36:59PM *  1 point [-]

Sociopathic guardians would solve that one particular problem (and bring others, of course, though perhaps ones more easily countered).

Comment author: Vladimir_Nesov 18 December 2010 04:16:35PM 4 points [-]

You are parrying my example, but not the pattern it exemplifies (to say nothing of the larger pattern of the point I'm arguing for). If certain people are insensitive to this particular kind of moral argument, they are still bound to be sensitive to some moral arguments. Maybe the AI will generate recipes for extraordinarily tasty foods for your sociopaths, or get-rich-quick schemes that actually work, or magically beautiful music.

Comment author: NihilCredo 18 December 2010 04:32:35PM *  1 point [-]

Indeed. The more thorough solution would seem to be: find a guardian with such a utility function that the AI has nothing to offer them that you can't trump with a counter-offer. The existence of such guardians would depend on the upper estimates of the AI's capabilities and on their employer's means, and would be subject to your ability to correctly assess a candidate's utility function.

Comment author: timtyler 19 December 2010 07:29:30PM 0 points [-]

The winning move is not to play.

Very rarely is the winning move not to play.

It seems especially unlikely to be the case if you are trying to build a prison.

Comment author: Eliezer_Yudkowsky 18 December 2010 04:11:09PM 2 points [-]

For what?

Comment author: PeterS 18 December 2010 08:48:34PM 4 points [-]

The OP framed the scenario in terms of directing the AI to design an FAI, but the technique is more general: it is possibly safe for any problem with an efficiently verifiable solution.
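The find-versus-verify asymmetry being appealed to here is the familiar NP pattern: checking a candidate solution is cheap even when producing one is believed to be hard. A minimal sketch, with a clause encoding that is purely illustrative:

```python
# Minimal illustration of "problems with a verifiable solution": checking a
# proposed SAT assignment is cheap, even though finding one is believed hard.
# The encoding is illustrative only: a formula is a list of clauses, and each
# clause is a list of signed variable indices (negative means negated).

def check_sat(clauses, assignment):
    """True iff `assignment` (dict: var index -> bool) satisfies every clause."""
    def literal_true(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return all(any(literal_true(lit) for lit in clause) for clause in clauses)

# (x1 OR NOT x2) AND (x2 OR x3)
formula = [[1, -2], [2, 3]]
```

An untrusted boxed solver could hand back an assignment, and only assignments passing `check_sat` would ever be acted upon.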

Comment author: wedrifid 20 December 2010 05:59:32AM *  2 points [-]

For what?

People I don't trust but don't want to kill (or cripple by modification). A non-compliant transhuman with self-modification ability may not be able to out-compete an FAI, but if it is not quarantined it could force the FAI to burn resources to maintain dominance.

But it is something we can let the FAI build for us.

Comment deleted 20 December 2010 10:22:57AM [-]
Comment author: wedrifid 20 December 2010 11:14:24AM 0 points [-]

Shrug. For the purposes here they could be called froogles for all I care. The quarantine could occur in either stage depending on the preferences being implemented.