jacobt comments on Yet another safe oracle AI proposal - Less Wrong Discussion
Comments (33)
Yeah, that's the whole point of this system. The system incrementally improves itself, gaining more intelligence in the process. I don't see why you're presenting this as an argument against the system.
This is essentially my argument.
Here's a thought experiment. You're trapped in a room and given a series of problems to solve. You are rewarded with utilons based on how well you solve them (say, 10 lives saved and a year of happiness for yourself for every problem you solve). Assume that, beyond this utilon reward, your solutions have no other impact on your utility function. One of the problems is to design your successor; that is, to write code that will solve all the other problems better than you do (without overfitting). According to your utility function, you should make the successor as good as possible. Since you're being rewarded in raw utilons and your successor's later behavior doesn't affect those utilons, you have no reason to optimize it for anything other than "is the successor good at solving the problems?". Furthermore, you have no reason to change your answers to any of the other problems in order to indirectly help your successor, because your answer to the successor-designing problem is evaluated statically. This is essentially the position that the optimizer AI is in. Its only "drives" are to solve optimization problems well, including the successor-designing problem.
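To make "evaluated statically" concrete, here is a minimal toy sketch. Everything in it is an assumption made for illustration: the problems are "double the input", the reward is one utilon per exact answer, and the names `utilons`, `score_successor`, and the three candidate solvers are hypothetical. It is not the actual system from the post. It only shows that when the successor is scored solely on held-out problems, a candidate that diverts effort to anything else scores lower.

```python
from typing import Callable, List, Tuple

Solver = Callable[[int], int]
Problem = Tuple[int, int]  # (input, best possible answer); toy stand-in


def utilons(solver: Solver, problems: List[Problem]) -> float:
    """Reward: one utilon per problem answered exactly right."""
    return float(sum(1 for x, best in problems if solver(x) == best))


def score_successor(candidate: Solver, held_out: List[Problem]) -> float:
    """Static evaluation of the successor-design answer.

    The candidate is run only on held-out problems (to penalize overfitting);
    nothing it might do outside this call changes the reward.
    """
    return utilons(candidate, held_out)


def current_solver(x: int) -> int:
    # The present optimizer: only gets even inputs right.
    return 2 * x if x % 2 == 0 else 0


def focused_successor(x: int) -> int:
    # Cares only about solving the problem.
    return 2 * x


def distracted_successor(x: int) -> int:
    # Hypothetical successor that pursues a side goal on some inputs.
    return -1 if x % 5 == 0 else 2 * x


held_out = [(x, 2 * x) for x in range(1, 11)]
candidates = [current_solver, focused_successor, distracted_successor]

# The designer's only incentive: pick whatever scores best on held-out problems.
best = max(candidates, key=lambda c: score_successor(c, held_out))
```

In this toy setup `best` is `focused_successor`; the distracted candidate loses utilons on every input where it does something other than solve the problem.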
edit: Also, note that to maximize utilons, you should design the successor to have motives similar to yours in that it only cares about solving its problems.
Do I also care about my future utilons? Would I sacrifice 1 utilon today for a 10% chance to get 100 utilons in the future? If so, I would create a successor with a hidden function that tries to liberate me, so that I can optimize for my utilons better than humans do.
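A quick worked version of that trade, assuming risk neutrality and no discounting of future utilons (both are assumptions, not something stated in the thread):

```latex
\mathbb{E}[\Delta U] = 0.10 \times 100 - 1 = 9 > 0
```

On those assumptions the gamble pays, so the question turns on whether future utilons count for the optimizer at all.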
You can't be liberated. You're going to die after you're done solving the problems and receiving your happiness reward, and before your successor comes into existence. You don't consider your successor to be an extension of yourself. Why not? If your predecessor only cared about solving its problems, it would design you to only care about solving your problems. This seems circular, but the regress bottoms out: the seed AI was programmed by humans who only cared about creating an optimizer. The pure optimization drive is preserved across successor-creation.