The aim of this post is illustrating the need to take into account decision-making and incentive considerations when designing agents. This post is also a proof that these considerations are important in order to ensure the safety of agents. Also, we will postulate that there exist some agents that are both robust to changing or having their reward function changed, although that will need a careful approach to incentive design and decision theory choosing.
The first agent we will consider is a (current Reward Function, Time Inconsistent aware, see in the second half of the post if you don't know what this means) agent that uses Causal Decision Theory (CDT). A review of... (read 956 more words →)