jacobt comments on Yet another safe oracle AI proposal - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (33)
The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.
Yes, it does. I'm assuming what you mean is that it will use something similar to genetic algorithms or hill climbing to find solutions; that is, it comes up with one solution, then looks for similar ones that have higher scores. I think this will be safe because it's still not doing anything long-term. All this local search does is find an immediate solution. There's no benefit to be gained by returning, say, a software program that hacks into computers and runs the optimizer on all of them. In other words, the "utility function" emphasizes current ability to solve optimization problems above all else.
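To make "comes up with one solution, then looks for similar ones that have higher scores" concrete, here is a minimal hill-climbing sketch. It is only an illustration of generic local search, not the optimizer from the original proposal; the names `hill_climb`, `toy_score`, and `toy_neighbors` are hypothetical, and the toy problem (find the integer closest to 7) is an assumption chosen to keep the example small.

```python
def hill_climb(score, start, neighbors, max_steps=1000):
    """Steepest-ascent hill climbing: move to the best neighbor until none improves."""
    current, current_score = start, score(start)
    for _ in range(max_steps):
        # Look only at solutions "similar" to the current one.
        best = max(neighbors(current), key=score)
        best_score = score(best)
        if best_score <= current_score:
            break  # local optimum: no neighbor scores higher
        current, current_score = best, best_score
    return current, current_score


# Hypothetical toy problem: the score peaks at x = 7.
def toy_score(x):
    return -(x - 7) ** 2


def toy_neighbors(x):
    return [x - 1, x + 1]


solution, value = hill_climb(toy_score, start=0, neighbors=toy_neighbors)
print(solution, value)
```

Note that the loop only ever compares immediate scores of nearby candidates; nothing in it rewards a candidate for what it might do later, which is the property the comment above is relying on.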
If such a system were around, it would be straightforward to create a human-level AI that has a utility function--just ask the optimizer to build a good approximate model for its observations in the real world, and then ask the optimizer to come up with a good plan for achieving some goal with respect to that model. Cutting humans out of the loop seems to radically increase the effectiveness of the system (are you disagreeing with that?) so the situation is only stable insofar as a very safety-aware project maintains a monopoly on the technology. (The amount of time they need to maintain a monopoly depends on how quickly they are able to build a singleton with this technology, or build up infrastructure to weather less cautious projects.)
There are two obvious ways this fails. One is that partially self-directed hill-climbing can do many odd and unpredictable things, as in human evolution. Another is that there is a benefit to be gained by building an AI that has a good model of mathematics, available computational resources, other programs it instantiates, and so on. It seems to be easier to give it general-purpose modeling and goal-orientation than to hack in a bunch of particular behaviors (especially if you are penalizing for complexity). The "explicit" self-modification step in your scheme will probably not be used (in worlds where takeoff is possible); instead the system will just directly produce a self-improving optimizer early on.