This seems to me like a formalisation of Scott Alexander's The Tails Coming Apart As Metaphor For Life post.
Given a function and an approximation of it, following the approximate gradient is good enough in Mediocristan, but at the extremes the two come apart.
I wonder what impact complex reward functions have. If you add together a pair of approximate rewards, could their errors cancel each other out and pull the system closer to the real target?
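A minimal numerical sketch of that question, with made-up one-dimensional rewards (everything here is a hypothetical toy, not from the post): the true reward peaks at x = 0, and each proxy's peak is shifted by an error of opposite sign, so optimising either proxy alone Goodharts away from the target while optimising their sum lands near it.

```python
def true_reward(x):
    return -x ** 2               # toy "real" objective, maximised at x = 0

def proxy_a(x):
    return -(x - 1.0) ** 2       # approximation whose error pulls the optimum toward x = +1

def proxy_b(x):
    return -(x + 1.0) ** 2       # approximation with the opposite error, toward x = -1

def gradient_ascent(reward, x=0.0, lr=0.1, steps=500, eps=1e-6):
    """Follow a numerical gradient of `reward`, as a stand-in for an optimiser."""
    for _ in range(steps):
        grad = (reward(x + eps) - reward(x - eps)) / (2 * eps)
        x += lr * grad
    return x

print(gradient_ascent(proxy_a))                            # ~ +1.0: drifts to the proxy's spurious peak
print(gradient_ascent(proxy_b))                            # ~ -1.0
print(gradient_ascent(lambda x: proxy_a(x) + proxy_b(x)))  # ~  0.0: opposite errors cancel
```

Of course this only shows the favourable case where the two errors happen to point in opposite directions; if they were correlated, summing them would compound the drift rather than cancel it.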