MugaSofer comments on I tried my hardest to win in an AI box experiment, and I failed. Here are the logs. - Less Wrong

6 [deleted] 27 January 2015 10:06PM




Comment author: Nepene 28 January 2015 04:47:02AM 3 points [-]

I've played the AI box game on other forums. We designed a system to incentivise release of the AI: we randomly rolled the AI's ethics, rolled random events with dice, and the AI offered various solutions to those problems. A certain number of accepted solutions would enable the AI to free itself. You lost points if you failed to deal with the problems, and lost many more points if you freed the AI and it happened to have goals you disagreed with, like the annihilation of everything.
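The setup described above can be sketched as a small simulation. Everything here is hypothetical: the function names, the point values, the six-sided crisis roll, and the release threshold are all illustrative assumptions, not the game's actual rules.

```python
import random

def play_round(rng, accept_prob=0.5):
    """One crisis: roll an event's severity, then decide whether the
    gatekeeper accepts the AI's proposed solution (assumed 50/50 here)."""
    severity = rng.randint(1, 6)            # randomly rolled event
    accepted = rng.random() < accept_prob   # gatekeeper's decision
    return severity, accepted

def run_game(rng, rounds=10, release_threshold=4,
             hostile_prob=0.5, hostile_penalty=100):
    """Play several rounds; enough accepted solutions frees the AI."""
    ai_hostile = rng.random() < hostile_prob  # randomly rolled ethics
    score, accepted_count = 0, 0
    for _ in range(rounds):
        severity, accepted = play_round(rng)
        if accepted:
            accepted_count += 1
            score += severity   # dealing with the problem earns points
        else:
            score -= severity   # failing to deal with it costs points
    released = accepted_count >= release_threshold
    if released and ai_hostile:
        score -= hostile_penalty  # freed an AI with goals you disagreed with
    return score, released
```

Under these assumed numbers, the hostile-release penalty dominates the per-round gains, which is what makes the gatekeeper's decision interesting.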

Psychology was very important in those games, as you said. Different people have very different values, and to appeal to each person you have to know theirs.

Comment author: ike 28 January 2015 04:52:00AM *  3 points [-]

So if you were trying to maximise total points, wouldn't it be best to never let it out, since you lose far more if it destroys the world than you gain from getting solutions?

What values for points make it rational to let the AI out, and is it also rational in the real-world analogue?
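The question above is an expected-value comparison, which can be made concrete with a short sketch. The point values here are made up for illustration; the actual game's scoring is not given in the thread.

```python
def expected_release_value(gain_per_solution, n_solutions,
                           p_hostile, hostile_penalty):
    """Expected net points from freeing the AI after accepting
    n_solutions fixes, if it is hostile with probability p_hostile."""
    return gain_per_solution * n_solutions - p_hostile * hostile_penalty

# With a large enough penalty, release is negative in expectation:
print(expected_release_value(10, 5, 0.5, 1000))  # → -450.0
```

On these numbers, releasing is rational only when `gain_per_solution * n_solutions > p_hostile * hostile_penalty`; if the penalty is meant to model "annihilation of everything", any finite solution bonus arguably fails that test, which is ike's point.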

Comment author: MugaSofer 29 January 2015 01:28:41PM -1 points [-]

We randomly rolled the AI's ethics, rolled random events with dice, and the AI offered various solutions to those problems... You lost points if you failed to deal with the problems, and lost many more points if you freed the AI and it happened to have goals you disagreed with, like the annihilation of everything.