Manfred comments on MIRI's Approach - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (59)
I don't know about this, but would be happy to hear more.
I don't think the point is "controlling" these properties; I think the point is drawing conclusions about what an AI will do in the real world. Reduced speed might allow us to run "fast AIs" in simulation and draw conclusions about what they'll do. Reduced speed might also let us run AI civilizations of large size (though it's not obvious to me why you'd want such a thing) and draw conclusions about what they'll do. Reducing the AI's knowledge seems like a way to make a simulation more computationally tractable and therefore get better predictions about what the AI will do - but it seems like a risky way that can introduce bias into the simulation.
My real problem is that I don't think just testing for altruism (which I assume means altruistic behavior) is remotely good enough. If we could simulate our world out past the point where an AI becomes more powerful than the human race, and select for altruism there, I'd be happy. But I am pretty confident that there will be big problems generalizing from a simulation to reality if that simulation both differs from reality and restricts the possible actions and possible values.
If we're just testing a self-driving car, we can make a simulation that captures the available actions (both literal outputs and "effective actions" permitted by the dynamics) and has basically the right value function built in from the start. Additionally, self-driving cars generalize well from the model to reality. Suppose you have something unrealistic in the model (say, you have other cars follow set training trajectories rather than reacting to the actions of the car). A realistic self-driving car that does well in the simulation might be bad at some skills like negotiating for space on the road, but it won't suddenly, say, try to use its tire tracks to spell out letters if you put it into reality with humans.
To put what I think concretely: when exposed to a difference between training and reality, a "dumb, parametric AI" projects reality onto the space it learned in training and just keeps on plugging, making it somewhat insensitive to reality being complicated, and giving us a better idea about how it will generalize. But a "smart AI" doesn't seem to have this property: it will learn the complications of reality that were omitted in testing, and can behave very differently as a result. This goes back to the problem of expanding sets of effective actions.
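To make the "projects reality onto the space it learned" point concrete, here's a toy sketch of my own (not from the original comment, and the functions and regimes are invented for illustration): a low-capacity parametric model fit by least squares is literally a projection onto its hypothesis space, so no matter how complicated reality gets outside the training regime, its behavior stays inside that space. A higher-capacity model fits training better but carries no such guarantee off-distribution.

```python
# Toy sketch: a "dumb, parametric AI" (a line) vs. a "smart AI"
# (a high-degree polynomial), both fit to the same training slice.
# The choice of functions, degrees, and ranges is purely illustrative.
import numpy as np

def reality(x):
    # Stand-in for a world more complicated than the training regime.
    return np.sin(3 * x) + 0.3 * x

# Training regime: a narrow slice of reality.
x_train = np.linspace(0.0, 1.0, 50)
y_train = reality(x_train)

# Least-squares fits = projections onto two hypothesis spaces.
dumb = np.polyfit(x_train, y_train, deg=1)    # low capacity
smart = np.polyfit(x_train, y_train, deg=9)   # high capacity

# Deployment regime: reality outside the training slice.
x_test = np.linspace(1.0, 3.0, 50)
dumb_out = np.polyval(dumb, x_test)

# The dumb model's out-of-distribution behavior stays inside its
# learned space: a line has zero curvature wherever you evaluate it,
# so second finite differences on an even grid are ~0.
dumb_curvature = np.max(np.abs(np.diff(dumb_out, 2)))

# The smart model fits the training slice strictly better (its
# hypothesis space contains the line), but that says nothing about
# how it behaves where training and reality diverge.
dumb_rmse = np.sqrt(np.mean((np.polyval(dumb, x_train) - y_train) ** 2))
smart_rmse = np.sqrt(np.mean((np.polyval(smart, x_train) - y_train) ** 2))
```

The point of the sketch is only the asymmetry: the dumb model's off-distribution behavior is predictable from its parametric form, while the smart model's is not constrained by anything we measured in training.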