simon comments on Siren worlds and the perils of over-optimised search - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (411)
No, I am assuming the superintelligent AI will pose the question in the way it will get the answer it prefers to get.
Oh, you're assuming it's malicious. In order to prove...?
No, not assuming it's malicious.
I'm assuming that it has some sort of programming along the lines of "optimise X, subject to the constraint that uploaded brain B must approve your decisions."
Then it will use the most twisted definition of "approve" that it can find, in order to best optimise X.
Then programme it with:
Prime directive - interpret all directives according to your makers' intentions.
Secondary directive - do nothing that goes against the uploaded brain.
Tertiary directive - optimise X.
And how do you propose to code the prime directive? (With that, you have no need for the other ones; the uploaded brain is completely pointless.)
The prime directive is the tertiary directive for a specific X.
That's not a coding approach for the prime directive.
You have already assumed you can build an AI that optimises X. I am not assuming anything different.
In fact any AI that self-improves is going to have to have some sort of goal of getting things right, whether instrumental or terminal. Terminal is much safer, to the extent that it might even solve the whole friendliness problem.
No, you are assuming that we can build an AI that optimises a specific thing, "interpret all directives according to your makers' intentions". I'm assuming that we can build an AI that can optimise something, which is very different.
An AI that can self-improve considerably already interprets a vast number of directives according to its makers' intentions, since self-improvement is an intentional feature.
Being able to predict a program's behavior is a prerequisite if you want the program to work well, since unpredictable behavior tends to be chaotic and detrimental to overall performance. In other words, if you have an AI that does not work according to its makers' intentions, then you have an AI that does not work, or one that is not very powerful.
So you're saying the orthogonality thesis is false?