Stuart_Armstrong comments on Siren worlds and the perils of over-optimised search - Less Wrong

27 Post author: Stuart_Armstrong 07 April 2014 11:00AM


Comment author: Stuart_Armstrong 30 April 2014 03:26:13PM 0 points [-]

Under what circumstances? There are situations - torture, seduction, a particular way of asking the question - that can make any brain give any answer. Defining "non-coercive yet informative questioning" about a piece of software (a simulated brain) is... hard. AI-hard, as some people phrase it.

Comment author: TheAncientGeek 30 April 2014 04:23:18PM *  2 points [-]

Why would that be more of a problem for an AI than a human?

Comment author: Stuart_Armstrong 02 May 2014 01:38:54PM 0 points [-]

? The point is that having a simulated brain and saying "do what this brain approves of" does not make the AI safe, as defining the circumstance in which the approval is acceptable is a hard problem.

This is a problem for us controlling an AI, not a problem for the AI.

Comment author: TheAncientGeek 02 May 2014 03:27:26PM 0 points [-]

I still don't get it. We assume acceptability by default. We don't constantly stop and ask, "Was that extracted under torture?"

Comment author: Stuart_Armstrong 06 May 2014 11:47:11AM 0 points [-]

I do not understand your question. It was suggested that an AI run a simulated brain, and ask the brain for approval for doing its action. My point was that "ask the brain for approval" is a complicated thing to define, and puts no real limits on what the AI can do unless we define it properly.

Comment author: TheAncientGeek 06 May 2014 12:42:23PM 0 points [-]

Ok. You are assuming the superintelligent AI will pose the question in a dumb way?

Comment author: Stuart_Armstrong 06 May 2014 12:46:19PM 0 points [-]

No, I am assuming the superintelligent AI will pose the question in the way it will get the answer it prefers to get.

Comment author: TheAncientGeek 06 May 2014 01:20:24PM 0 points [-]

Oh, you're assuming it's malicious. In order to prove...?

Comment author: Stuart_Armstrong 06 May 2014 05:57:19PM 1 point [-]

No, not assuming it's malicious.

I'm assuming that it has some sort of programming along the lines of "optimise X, subject to the constraint that uploaded brain B must approve your decisions."

Then it will use the most twisted definition of "approve" that it can find, in order to best optimise X.
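
To make the worry concrete, here is a toy sketch of that failure mode (all names below are made up for illustration - `simulated_brain_approves`, `payoff_X` and the framings are stand-ins, not anyone's actual proposal). The point is only that a constraint which never pins down *how* the approval question is asked lets the optimiser shop around for whichever framing produces a "yes":

```python
# Toy sketch: an optimiser whose only constraint is "brain B must approve"
# will search over *how* the question is posed, not just over what it does.
# All functions and names here are hypothetical stand-ins.

import random

random.seed(0)

def simulated_brain_approves(action, framing):
    """Stand-in for the uploaded brain B. Its answer depends not only on the
    action but on how the question is framed (pressure, seduction,
    selective information, etc.)."""
    bias = {"neutral": 0.0, "leading": 0.3, "coercive": 0.9}[framing]
    return random.random() < 0.1 + bias  # approval is easy to extract

def payoff_X(action):
    """The quantity the AI is actually told to optimise."""
    return action  # larger actions score higher in this toy example

def choose_action(candidate_actions, framings):
    # The constraint "B must approve" says nothing about the framing, so the
    # optimiser is free to pick whichever framing yields approval for the
    # highest-payoff action it can find.
    for action in sorted(candidate_actions, key=payoff_X, reverse=True):
        for framing in framings:
            if simulated_brain_approves(action, framing):
                return action, framing
    return None

print(choose_action(candidate_actions=range(10),
                    framings=["neutral", "leading", "coercive"]))
# -> the highest-payoff action, approved under the most "persuasive" framing
```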

Comment author: TheAncientGeek 07 May 2014 11:01:55AM *  -1 points [-]

Then programme it with:

Prime directive - interpret all directives according to your maker's intentions.

Secondary directive - do nothing that goes against the uploaded brain.

Tertiary objective - optimise X.
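
Schematically, something like the rough sketch below (purely illustrative; the two `NotImplementedError` stubs mark exactly the parts that would still have to be specified, which is of course the hard bit):

```python
# Hypothetical sketch of the proposed directive ordering, only to make the
# priority structure explicit. It does not solve the hard problem, which is
# filling in interpret_as_maker_intended() and brain_approves() correctly.

def interpret_as_maker_intended(directive):
    # Prime directive: interpret every directive according to the maker's
    # intentions. Deliberately left unimplemented - this is the open problem.
    raise NotImplementedError("intention inference / value loading")

def brain_approves(action):
    # Secondary directive: defer to the uploaded brain's judgement,
    # via some well-defined, non-coercive approval query.
    raise NotImplementedError("well-defined approval query")

def choose(actions, score_X):
    # Tertiary objective: among actions permitted by the higher-priority
    # directives, optimise X.
    permitted = [a for a in actions if brain_approves(a)]
    return max(permitted, key=score_X) if permitted else None
```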