Interpreting a Maze-Solving Network
Mechanistic interpretability on a pretrained policy network from the paper "Goal Misgeneralization in Deep Reinforcement Learning".