Scalar reward is not enough for aligned AGI
This post was authored by Peter Vamplew and Cameron Foale (Federation University), and Richard Dazeley (Deakin University).

Introduction

Recently some of the most well-known researchers in reinforcement learning, Silver, Singh, Precup and Sutton, published a paper entitled Reward is Enough, which proposes the reward-is-enough hypothesis: “Intelligence, and its associated abilities,...
I'm not suggesting that RL is the only, or even the best, way to develop AGI. But this is the approach being advocated by Silver et al., and given their standing in the research community and the resources available to them at DeepMind, it seems likely that they, and others, will try to develop AGI in this way.
Therefore I think it is essential that a multiobjective approach be taken if there is to be any chance that this AGI will actually be aligned with our best interests. If conventional RL based on a scalar reward is used, then
(a) it is very difficult to specify a suitable scalar reward which accounts for all...
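To make point (a) more concrete, here is a minimal Python sketch (not taken from the post or from Silver et al.) of the contrast between a scalar reward, which forces the designer to commit to a single trade-off weighting in advance, and a vector-valued (multiobjective) reward, which keeps the objectives separate so that constraints such as a safety threshold can be enforced explicitly when comparing policies. The objective names, weights and threshold below are purely illustrative assumptions.

```python
# Illustrative sketch only: names, weights and thresholds are assumptions,
# not anything defined in the post or in Silver et al.'s paper.
import numpy as np


def scalar_reward(task_progress: float, safety_violation: float,
                  w_safety: float = 1.0) -> float:
    """Conventional RL: objectives pre-combined into one scalar.

    The designer must fix the trade-off weight before training, and the
    agent can still accept large safety violations whenever the task
    reward is big enough to outweigh them.
    """
    return task_progress - w_safety * safety_violation


def vector_reward(task_progress: float, safety_violation: float) -> np.ndarray:
    """Multiobjective RL: objectives stay separate in the reward signal."""
    return np.array([task_progress, -safety_violation])


SAFETY_THRESHOLD = -0.1  # illustrative: tolerate at most this much (negated) violation


def thresholded_preference(returns_a: np.ndarray, returns_b: np.ndarray) -> np.ndarray:
    """One way to use vector returns: a thresholded lexicographic ordering.

    Prefer whichever policy meets the safety threshold; compare task
    performance only between policies that are both acceptably safe.
    """
    a_safe = returns_a[1] >= SAFETY_THRESHOLD
    b_safe = returns_b[1] >= SAFETY_THRESHOLD
    if a_safe and not b_safe:
        return returns_a
    if b_safe and not a_safe:
        return returns_b
    return returns_a if returns_a[0] >= returns_b[0] else returns_b


if __name__ == "__main__":
    # Two hypothetical policies' expected returns per objective:
    reckless = np.array([5.0, -2.0])   # high task return, large safety violation
    careful = np.array([3.0, -0.05])   # lower task return, negligible violation

    # With a fixed scalar weighting, the reckless policy can still come out ahead:
    print(scalar_reward(5.0, 2.0))    # 3.0
    print(scalar_reward(3.0, 0.05))   # 2.95

    # Keeping the objectives separate lets the selection step apply the
    # safety constraint explicitly:
    print(thresholded_preference(reckless, careful))  # -> careful
```

The sketch is only meant to show where the difficulty lies: with a scalar signal the trade-off is baked into the reward function itself, whereas a vector-valued signal defers that decision to a later stage where it can be made explicit and, if necessary, constrained.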