anotheruser comments on On the fragility of values - Less Wrong

Post author: Stuart_Armstrong 04 November 2011 06:15PM




Comment author: anotheruser 05 November 2011 09:18:38AM 0 points

Of course it won't be easy. But if the AI doesn't understand that question, you already have confirmation that this thing should definitely not be released. An AI can only be safe for humans if it understands human psychology. Otherwise it is bound to treat us as black boxes, and that can only have horrible results, regardless of how sophisticated you think you made its utility function.

I agree that the question doesn't actually make a lot of sense to humans, but that shouldn't stop an intelligent entity from trying to make the best of it. When you are given an impossible task, you don't despair; you make a compromise and try to fulfill the task as best you can. When humans found out that entropy always increases and humanity will die out someday, no matter what, we didn't despair either, even though evolution has made it so that we desire to have offspring, and for that offspring to do the same, indefinitely.

Comment author: JoshuaZ 05 November 2011 04:37:47PM 1 point

But if the AI doesn't understand that question you already have confirmation that this thing should definitely not be released.

How likely is it that we'll be able to see that it doesn't understand as opposed to it reporting that it understands when it really doesn't?

Comment author: anotheruser 05 November 2011 05:12:28PM 0 points

You will obviously have to test its understanding of psychology with some simple examples first.

Comment author: lessdazed 06 November 2011 02:08:08PM 0 points

Comment author: anotheruser 06 November 2011 07:14:22PM 0 points

Are you really trying to tell me that you think researchers would be unable to take that into account when trying to figure out whether or not an AI understands psychology?

Of course you will have to try to find problems where the AI can't predict how humans would feel. That is the whole point of testing, after all. Suggesting that someone in a position to teach psychology to an AI would make such a basic mistake is frankly insulting.

I probably shouldn't have said "simple examples". What you should actually test are examples of gradually increasing difficulty to find the ceiling of human understanding the AI possesses. You will also have to look for contingencies or abnormal cases that the AI probably wouldn't learn about otherwise.

The main idea is simply that an understanding of human psychology is both teachable and testable. How exactly this could be done is a bridge we can cross when we come to it.

Comment author: lessdazed 06 November 2011 07:58:34PM 0 points

I think you really, really want a proof rather than a test. One can only test a few things, and agreement on all of those is not too informative. I should have included this link, which is several times as important as the previous one, and they combine to make my point.

Comment author: anotheruser 06 November 2011 10:22:44PM 0 points

I never claimed that a strict proof is possible, but I do believe that you can become reasonably certain that an AI understands human psychology.

Give the thing a college education in psychology, ethics, and philosophy. Ask its opinion on famous philosophical problems. Show it video clips or abstract scenarios about everyday life and ask why it thinks the people did what they did. Then ask what it would have done in the same situation; if it says it would act differently, ask it why, and what it thinks the difference in motivation is between it and the human.

Finally, give it every story ever written about malevolent AIs or paperclip maximizers to read, and ask it to comment on them.

Let it write a 1000 page thesis on the dangers of AI.

If you do all that, you are bound to find any significant misunderstanding.