Gabriel comments on Hacking the CEV for Fun and Profit - Less Wrong
If I understand Houshalter correctly, then his idea can be presented using the following story:
Suppose you have worked out the theory of building self-improving AGIs with stable goal systems. The only problem left is to devise an actual goal system that represents what is best for humanity. So you spend the next several years engaged in deep moral reflection and finally come up with the perfect implementation of CEV, one completely impervious to the tricks of Dr. Evil and his ilk.
However, the morality on which you reflected for all those years isn't an external force accessible only to humans. It is a computation embedded in your brain. Whatever you ended up doing was the result of your brain state at the beginning of the story and the stimuli that have affected you since then. All of this could have been simulated by a Sufficiently Smart™ AGI.
So the idea is: instead of spending those years coming up with the best goal system for your AGI, simply run it and tell it to simulate a counterfactual world in which you did spend them, and then to do what you would have done. Whatever results from that, you couldn't have done better anyway.
Of course, this is all under the assumption that formalizing Coherent Extrapolated Volition is much more difficult than formalizing My Very Own Extrapolated Volition (for any given value of me).