I'm still a bit confused by the difference between inner alignment and out-of-distribution generalization. What's the fundamental difference between the cat-classifying problem and the maze problem? Is it that in the latter case the model itself is an optimizer? But why is that special?
What if the neural network used to solve the maze problem just learns a mapping (but doesn't do any search)? Is that still an inner-alignment problem?