
wedrifid comments on Schneier talks about The Dishonest Minority [Link] - Less Wrong Discussion

Post author: Nic_Smith 10 May 2011 05:27AM (6 points)




Comment author: wedrifid 11 May 2011 05:43:32PM -1 points

Given 30 seconds' thought, I cannot think of a way to do this.

Although it turns out that in 35 seconds I can. It requires the humans to have already solved Friendliness and provable stability under self-modification. The solution would need to be implemented in an automated system that can output a result and then self-destruct. Unfortunately for you, at that point the hard part of creating an FAI is already done.

Comment author: TimFreeman 12 May 2011 02:51:36AM 0 points

I gather your point is that you get an FAI to check out Clippy, give a go/no-go decision, and then destroy itself. There's not much point in doing that: you could just run the FAI and ignore Clippy, and someone still has to check that the FAI is in fact Friendly.

Comment author: wedrifid 12 May 2011 04:46:59AM (edited) 1 point

I gather your point is that you get an FAI to check out Clippy, give a go/no-go decision, and then destroy itself. There's not much point in doing that: you could just run the FAI and ignore Clippy, and someone still has to check that the FAI is in fact Friendly.

No; what is required to verify Friendliness is less than a full FAI. As I said earlier, what is probably the hard part is already done by then, so the circumstance in which it is worth using Clippy rather than finishing off a goal-stable, self-improving AGI with Friendliness is unlikely. Nevertheless that circumstance exists, particularly if implementing the AGI turns out to be harder than I expect.

Comment author: TimFreeman 13 May 2011 02:14:53AM (edited) 1 point

No; what is required to verify Friendliness is less than a full FAI.

Do you have a pointer to a proposed procedure for that?

I'd expect implementing Friendliness to be easier than verifying Friendliness, since by Rice's theorem just about every interesting semantic property of Turing machines is undecidable (deciding it is as hard as the halting problem), and 'is Friendly' is such a property. If you put heavy constraints on how Clippy's code is structured, you might be able to verify Friendliness, but you didn't mention that, and Clippy didn't offer to do it.
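
To make the undecidability point concrete, here is a minimal sketch of the standard reduction. Everything in it is hypothetical: `is_friendly` is the assumed total decider for arbitrary source code, and `friendly_action` stands in for some behaviour agreed to count as Friendly; both names are invented for this illustration. The only substantive assumption is that Friendliness is a non-trivial behavioural property, so that looping forever doing nothing is not Friendly while performing `friendly_action()` is.

```python
# Sketch of the standard Rice's-theorem-style reduction. Hypothetical
# throughout: "is_friendly" is the assumed total decider for arbitrary
# source code; "friendly_action" is some behaviour that counts as Friendly.

def is_friendly(source: str) -> bool:
    """Assumed decider: True iff the program in `source` behaves Friendly."""
    raise NotImplementedError("assumed to exist only for the reduction")

def halts(machine_source: str, machine_input: str) -> bool:
    """If is_friendly existed, this would decide the halting problem."""
    # Build a program that simulates the given machine on the given input
    # (possibly forever) and only then performs a Friendly action.
    wrapper = f"""
exec({machine_source!r})   # defines run(x) for the machine under test
run({machine_input!r})     # simulate: diverges iff the machine never halts
friendly_action()          # reached only if the simulation halted
"""
    # If looping forever doing nothing is not Friendly and friendly_action()
    # is, then the wrapper is Friendly exactly when the machine halts, so a
    # single call to is_friendly settles the halting question.
    return is_friendly(wrapper)
```

The upshot is that any sound verifier for arbitrary code must be incomplete: it has to reject or give up on some programs it cannot analyse, which is where the structure of the code under inspection starts to matter.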

Comment author: wedrifid 13 May 2011 05:08:43AM 0 points

I'd expect implementing Friendliness to be easier than verifying Friendliness,

I'd rather like to verify that my AGI would be Friendly before I run it. :) (Usually the label FAI seems to refer to AIs that will be 'provably Friendly'.)

Comment author: TimFreeman 13 May 2011 02:13:05PM (edited) 0 points

You might be able to verify interesting properties of code that you constructed for the purpose of making verification possible, but you aren't likely to be able to verify interesting properties of arbitrary hostile code of the kind Clippy would have an incentive to produce.
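
By way of contrast, here is a toy sketch (purely illustrative, and nothing like a Friendliness verifier) of why code constructed for verification is a different situation: if programs are restricted to a whitelisted, loop-free structure, then a property that is undecidable in general, termination, becomes a mechanical check. The whitelist below is invented for this sketch.

```python
# Toy illustration only: a checker that proves termination for programs
# restricted to straight-line arithmetic. The whitelist is invented for
# this sketch; real systems use far richer certified structures.

import ast

# Only loop-free, call-free constructs are permitted, so any accepted
# program runs each statement exactly once and therefore terminates.
ALLOWED = (ast.Module, ast.Assign, ast.Expr, ast.BinOp, ast.UnaryOp,
           ast.Name, ast.Constant, ast.Store, ast.Load,
           ast.Add, ast.Sub, ast.Mult, ast.USub)

def provably_terminates(source: str) -> bool:
    """True only when every node in the program's AST is whitelisted."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED) for node in ast.walk(tree))

print(provably_terminates("x = 1\ny = x * 2 + 3"))   # True: straight-line
print(provably_terminates("while True:\n    pass"))  # False: contains a loop
```

This is the principle behind proof-carrying code: the producer bears the burden of making the code checkable, which is exactly what arbitrary hostile code declines to do.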

You passed up an opportunity to point to your proposed verification procedure, so at this point I assume you don't have one. Please prove me wrong.

Usually the label FAI seems to refer to AIs that will be 'provably Friendly'.

I don't even know what the exact theorem to prove would be. Do you?