wedrifid comments on Cryptographic Boxes for Unfriendly AI - Less Wrong

Post author: paulfchristiano 18 December 2010 08:28AM

Comment author: D_Alex 20 December 2010 05:35:07AM 2 points

The probability I assign to achieving a capability state where it is (1) possible to prove a mind Friendly even if it has been constructed by a hostile superintelligence, (2) possible to build a hostile superintelligence, and (3) not possible to build a Friendly AI directly, is very low.

Does this still hold if you remove the word "hostile", i.e. if the "friendliness" of the superintelligence you construct first is simply not known?

"It is hard to explain something to someone whose job depends on not understanding it."

This quote applies to you and your approach to AI boxing ;)

AI Boxing is a potentially useful approach, if one accepts that:

a) AGI is easier than FAI
b) Verification of a "proof of friendliness" is easier than its production
c) AI Boxing is possible

As far as I can tell, you agree with a) and b). Please take care that your views on c) are not clouded by the status you have invested in the AI Box Experiment ... of 8 years ago.

"Human understanding progresses through small problems solved conclusively, once and forever" - cousin_it, on LessWrong.

Comment author: wedrifid 20 December 2010 05:53:08AM 0 points

a) AGI is easier than FAI
b) Verification of a "proof of friendliness" is easier than its production
c) AI Boxing is possible

As far as I can tell, you agree with a) and b).

Eliezer's comment does not seem to mention the obvious difficulties with c) at all. In fact, in the very part you chose to quote...

The probability I assign to achieving a capability state where it is (1) possible to prove a mind Friendly even if it has been constructed by a hostile superintelligence, (2) possible to build a hostile superintelligence, and (3) not possible to build a Friendly AI directly, is very low.

... it is b) that is implicitly the weakest link, with some potential deprecation of a) as well. c) is outright excluded from hypothetical consideration.