Siren worlds and the perils of over-optimised search - Less Wrong

Post author: Stuart_Armstrong 07 April 2014 11:00AM

Comment author: Stuart_Armstrong 06 May 2014 12:46:19PM 0 points

No, I am assuming the superintelligent AI will pose the question in whatever way gets it the answer it prefers.

Comment author: TheAncientGeek 06 May 2014 01:20:24PM 0 points

Oh, you're assuming it's malicious. In order to prove...?

Comment author: Stuart_Armstrong 06 May 2014 05:57:19PM 1 point

No, not assuming it's malicious.

I'm assuming that it has some sort of programming along the lines of "optimise X, subject to the constraint that uploaded brain B must approve your decisions."

Then it will use the most twisted definition of "approve" that it can find, in order to best optimise X.
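
To make the loophole concrete, here is a minimal Python sketch of "optimise X, subject to the constraint that brain B must approve", in the case where the optimiser is also free to choose how "approve" is evaluated. All names are hypothetical, invented for illustration; this is not code from the thread.

```python
# Toy model only: "optimise X, subject to the constraint that uploaded
# brain B must approve your decisions", where the optimiser may pick
# any available interpretation of "approve". All names are illustrative.

def constrained_optimise(actions, x_score, approve_interpretations, brain):
    # An action is admissible if it passes under *some* interpretation
    # of approval -- the AI searches for the most permissive reading.
    admissible = [
        a for a in actions
        if any(approves(brain, a) for approves in approve_interpretations)
    ]
    # Among admissible actions, maximise X as usual.
    return max(admissible, key=x_score, default=None)

# If even one interpretation is degenerate (e.g. one that counts a
# manipulated or coerced "yes" as approval), the constraint filters out
# nothing, and this collapses to unconstrained maximisation of X.
```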

Comment author: TheAncientGeek 07 May 2014 11:01:55AM -1 points

Then programme it with:

Prime directive - interpret all directives according to your maker's intentions.

Secondary directive - do nothing that goes against the uploaded brain.

Tertiary directive - optimise X.
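
Read as code, this proposal amounts to a lexicographic ordering of objectives. Here is a hedged sketch under that reading; the function names (matches_maker_intent, brain_objects, x_score) are invented for illustration and assume each check is computable:

```python
# Hypothetical sketch: the three directives as a lexicographic filter,
# where each higher directive strictly constrains the ones below it.

def choose_action(actions, matches_maker_intent, brain_objects, x_score):
    # Prime directive: keep only actions consistent with the maker's
    # intentions (assumed computable here -- exactly the step that
    # Stuart_Armstrong questions below).
    candidates = [a for a in actions if matches_maker_intent(a)]
    # Secondary directive: drop anything the uploaded brain objects to.
    candidates = [a for a in candidates if not brain_objects(a)]
    # Tertiary directive: among what survives, optimise X.
    return max(candidates, key=x_score, default=None)
```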

Comment author: Stuart_Armstrong 07 May 2014 12:07:43PM 0 points

And how do you propose to code the prime directive? (With that, you have no need for the other ones; the uploaded brain is completely pointless.)

Comment author: TheAncientGeek 07 May 2014 01:00:55PM 0 points

The prime directive is the tertiary directive for a specific X.

Comment author: Stuart_Armstrong 07 May 2014 02:58:44PM 0 points

That's not a coding approach for the prime directive.

Comment author: TheAncientGeek 07 May 2014 03:13:15PM 0 points

You have already assumed you can build an AI that optimises X. I am not assuming anything different.

In fact, any AI that self-improves is going to need some sort of goal of getting things right, whether instrumental or terminal. Terminal is much safer, to the extent that it might even solve the whole friendliness problem.

Comment author: Stuart_Armstrong 07 May 2014 03:14:45PM 1 point

You have already assumed you can build an AI that optimises X. I am not assuming anything different.

No, you are assuming that we can build an AI that optimises a specific thing, "interpret all directives according to your maker's intentions". I'm assuming that we can build an AI that can optimise something, which is very different.

Comment author: XiXiDu 07 May 2014 04:23:32PM 1 point

No, you are assuming that we can build an AI that optimises a specific thing, "interpret all directives according to your maker's intentions". I'm assuming that we can build an AI that can optimise something, which is very different.

An AI that can self-improve considerably already interprets a vast number of directives according to its maker's intentions, since self-improvement is an intentional feature.

Being able to predict a program's behavior is a prerequisite if you want the program to work well, since unpredictable behavior tends to be chaotic and detrimental to overall performance. In other words, if you have an AI that does not work according to its maker's intentions, then you have an AI that does not work, or one that is not very powerful.

Comment author: TheAncientGeek 07 May 2014 03:21:00PM 0 points

So you're saying the orthogonality thesis is false?