This seems to me like a formalisation of Scott Alexander's The Tails Coming Apart As Metaphor For Life post.
Given a function and an approximation of it, following the approximate gradient is good enough in Mediocristan, but at the extremes the two come apart.
I wonder what impact complex reward functions have. If you add together a pair of approximate rewards, could their errors cancel each other out and pull the system closer to the real target?
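A minimal numerical sketch of that question, with made-up one-dimensional rewards (everything here is a hypothetical toy, not from the post): the true reward peaks at x = 0, and each proxy's peak is shifted by an error of opposite sign, so optimising either proxy alone Goodharts away from the target while optimising their sum lands near it.

```python
def true_reward(x):
    return -x ** 2               # toy "real" objective, maximised at x = 0

def proxy_a(x):
    return -(x - 1.0) ** 2       # approximation whose error pulls the optimum toward x = +1

def proxy_b(x):
    return -(x + 1.0) ** 2       # approximation with the opposite error, toward x = -1

def gradient_ascent(reward, x=0.0, lr=0.1, steps=500, eps=1e-6):
    """Follow a numerical gradient of `reward`, as a stand-in for an optimiser."""
    for _ in range(steps):
        grad = (reward(x + eps) - reward(x - eps)) / (2 * eps)
        x += lr * grad
    return x

print(gradient_ascent(proxy_a))                            # ~ +1.0: drifts to the proxy's spurious peak
print(gradient_ascent(proxy_b))                            # ~ -1.0
print(gradient_ascent(lambda x: proxy_a(x) + proxy_b(x)))  # ~  0.0: opposite errors cancel
```

Of course this only shows the favourable case where the two errors happen to point in opposite directions; if they were correlated, summing them would compound the drift rather than cancel it.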