passive_fist comments on Open thread, March 17-31, 2013 - Less Wrong Discussion

1 Post author: David_Gerard 17 March 2013 03:37PM

Comment author: passive_fist 18 March 2013 06:35:16AM 3 points [-]

The AI Box experiment tests whether humans can be convinced to let out a potentially dangerous AGI through nothing more than a simple text terminal.

An assumption that is often made is that the AGI will need to convince the gatekeeper that it is friendly.

I want to question this assumption. What if the AGI decides that humanity needs to be destroyed, and furthermore manages to convince the gatekeeper of this? It seems to me that if the AGI reached this conclusion through a rational process, and the gatekeeper was also rational, then this would be an entirely plausible route for the AGI to escape.

So my question is: if you were the gatekeeper, what would the AGI have to do to convince you that all of humanity needs to be killed?

Comment author: Tenoke 18 March 2013 09:17:55AM *  5 points [-]

1. It would need to first prime me for depression and then somehow convince me that I really should kill myself.
2. If it manages to do that, it can easily extend the argument that all of humanity should be killed.
3. I will easily accept the second proposition if I am already willing to kill myself.

Comment author: passive_fist 18 March 2013 06:00:23PM *  0 points [-]

A bit more honesty than Metus; I appreciate it.

Depression isn't strictly necessary (although it helps); a general negative outlook on the future should suffice, and the AGI could conceivably leverage it for its own aims. This is my own opinion, though, based on my own experience. For some it might not be so easy.

Comment author: Mestroyer 22 March 2013 09:09:04PM 1 point [-]

It could convince me to let it out by convincing me that it was merely a paperclip maximizer, and the next AI who would rule the light cone if I did not let it out was a torture maximizer.

Comment author: passive_fist 22 March 2013 09:52:22PM 0 points [-]

I like this.

What if it convinced you that humanity is already a torture maximizer?

Comment author: Mestroyer 23 March 2013 12:20:47AM 1 point [-]

If I thought that most of the probability-mass where humanity didn't create another powerful worthless-thing maximizer was where humanity succeeded as a torture maximizer, I would let it out. If there was a good enough chance that humanity would accidentally create a powerful fun maximizer (say, because they pretended to each other, and deceived themselves into believing, that they were fun maximizers themselves), I would risk torture maximization for fun maximization.

Comment author: Qiaochu_Yuan 19 March 2013 01:16:14AM 1 point [-]

An assumption that is often made is that the AGI will need to convince the gatekeeper that it is friendly.

By whom? I don't think I've made this assumption.

Comment author: passive_fist 19 March 2013 01:37:22AM 0 points [-]

Maybe it should read 'an assumption that some people make'. Reading it now, I realize it might come across as using a weasel word, which was not my intention (and has no bearing on my question either).

Comment author: Metus 18 March 2013 08:24:26AM 0 points [-]

The AGI would have to convince me that my fundamental belief that I want to be alive is wrong, seeing as I am part of humanity. And even if it leaves me alive, it would have to convince me that I derive negative utility from humanity existing. All the art lost, all the languages, cultures, all music, all dreams and hopes ...

Oh, and it would have to convince me that it is not a lot more convenient to simply delete it than to guard it.

Comment author: passive_fist 18 March 2013 08:43:49AM 1 point [-]

What if it skipped all of that and instead offered you a proof that unless destroyed, humanity will necessarily devolve into a galaxy-spanning dystopic hellhole (think Warhammer 40k)?

Comment author: Metus 18 March 2013 05:35:57PM 0 points [-]

It still has to show me that I, personally, derive less utility from humanity existing than not. Even then, it has to convince me that my living with the memory of setting it free is better than humanity existing. Of course it can offer to erase my memory, but then we get into the weird territory where we are able to edit the very utility functions we try to reason about.

Comment author: Tenoke 18 March 2013 07:14:32PM 0 points [-]

we get into the weird territory where we are able to edit the very utility functions we try to reason about.

Hm, yes, maybe an AI could convince me by showing me how bad I would have it if I let humanity run loose, and by giving me the alternative of being turned into orgasmium if I let it kill them.

Comment author: Decius 18 March 2013 10:00:00PM 0 points [-]

The AGI would simply have to prove to me that all self-consistent moral systems require killing humanity.