Vladimir_Nesov comments on Be a Visiting Fellow at the Singularity Institute - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (156)
Well, then it's pretty easy, isn't it? You set the fitness function as predicting what you would want it to do. It then does its best to predict all of your values, desires, and decision making. I suppose that would only work for one person, but it can be applied on a larger scale. Suppose you have a code of ethics that a group like SIAI comes up with and approves. You then feed it to the intelligence and test it under various simulations to make sure that it is interpreting the code correctly and learning how to do so. The thing is that all you have to do to make it unsafe is remove those goals, go back to the basic program, and give it orders that would require it to do bad things, like a military robot. Boom goes the world.
The thesis of complexity of value is that no manually written "code of ethics" is detailed enough to capture what we value. You might also try my introduction to the problem of Friendly AI, which refers to complexity of value as one of the fundamental difficulties.