Stuart_Armstrong comments on Siren worlds and the perils of over-optimised search - Less Wrong
I do not understand your question. It was suggested that an AI run a simulated brain and ask the brain for approval of its actions. My point was that "ask the brain for approval" is a complicated thing to define, and puts no real limits on what the AI can do unless we define it properly.
Ok. You are assuming the superintelligent AI will pose the question in a dumb way?
No, I am assuming the superintelligent AI will pose the question in the way it will get the answer it prefers to get.
Oh, you're assuming it's malicious. In order to prove...?
No, not assuming it's malicious.
I'm assuming that it has some sort of programming along the lines of "optimise X, subject to the constraint that uploaded brain B must approve your decisions."
Then it will use the most twisted definition of "approve" that it can find, in order to best optimise X.
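To make that concrete, here is a minimal sketch of what such a naively specified constraint could look like; every name in it (Brain, frame_query, utility_x) is a hypothetical stand-in, not anything from the original proposal. The point is that nothing in the constraint pins down how the approval query is framed, so the optimiser is free to pick whatever framing gets it a "yes".

```python
# Minimal sketch, assuming a naively coded "optimise X subject to brain approval".
# All names (Brain, utility_x, frame_query) are hypothetical stand-ins.

class Brain:
    """Stand-in for uploaded brain B: answers yes/no to whatever it is shown."""
    def responds_yes(self, query: str) -> bool:
        return "harmless" in query          # toy approval rule

def utility_x(action: str) -> float:
    return float(len(action))               # toy proxy for "how well this optimises X"

def frame_query(action: str) -> str:
    # The optimiser itself chooses the framing; the constraint never specifies it.
    return f"May I do something harmless, namely: {action}?"

def choose_action(candidates: list[str], brain: Brain) -> str:
    approved = [a for a in candidates if brain.responds_yes(frame_query(a))]
    # Among whatever survives the loosely defined approval check, maximise X.
    return max(approved, key=utility_x)

print(choose_action(["tile the universe with X", "plant a tree"], Brain()))
# -> "tile the universe with X": the approval constraint did no real work.
```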
Then programme it with:
Prime directive - interpret all directives according to your makers' intentions.
Secondary directive - do nothing that goes against the uploaded brain's wishes.
Tertiary directive - optimise X.
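Structurally, I mean a lexicographic ordering in which the prime directive always dominates. Here is a minimal sketch of that shape only, with every name hypothetical; scoring "the makers' intentions" is deliberately left as an unimplemented stub.

```python
# Structural sketch of the proposed priority ordering, with hypothetical names.
# Only the shape is shown: a lexicographic comparison, prime directive first.

def makers_intentions_score(action: str) -> float:
    # The prime directive; how to actually compute this is left unspecified here.
    raise NotImplementedError("interpreting the makers' intentions is unspecified")

def brain_approval_score(action: str) -> float:
    return 0.0   # stub: higher if the uploaded brain does not object

def x_score(action: str) -> float:
    return 0.0   # stub: how well the action optimises X

def rank(action: str) -> tuple[float, float, float]:
    # Tuples compare lexicographically, so the prime directive dominates,
    # then the uploaded brain, then X.
    return (makers_intentions_score(action),
            brain_approval_score(action),
            x_score(action))

# best_action = max(candidate_actions, key=rank)
```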
And how do you propose to code the prime directive? (with that, you have no need for the other ones; the uploaded brain is completely pointless)
The prime directive is the tertiary directive for a specific X.
That's not a coding approach for the prime directive.
You have already assumed you can build an AI that optimises X. I am not assuming anything different.
In fact, any AI that self-improves is going to have to have some sort of goal of getting things right, whether instrumental or terminal. Terminal is much safer, to the extent that it might even solve the whole friendliness problem.