This seems like a time to bring up information temperature. After all, there is the deep parallel of entropy in information theory and physics. When comparing models, by what factor do you penalize a model for requiring more information to specify it? That would be analogous to the inverse temperature. I have yet to encounter a case where it makes sense in information theory, though.
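One way the analogy might be cashed out, as an illustrative sketch (the weighting scheme below is my own construction, not something from the thread): treat a model's description length as its "energy" and weight models by a Boltzmann factor, with the penalty factor playing the role of inverse temperature.

```python
import math

def boltzmann_model_weights(description_lengths_bits, beta=1.0):
    """Weight each model by exp(-beta * L), where L is the number of bits
    needed to specify it -- the analogue of energy, with beta as inverse
    temperature. Choosing beta = ln(2) recovers the familiar 2^-L
    minimum-description-length-style prior."""
    weights = [math.exp(-beta * L) for L in description_lengths_bits]
    total = sum(weights)
    return [w / total for w in weights]

# Two models, one needing 10 bits to specify, the other 12.
# At beta = ln(2) the simpler model is exactly 2^2 = 4x more probable a priori.
print(boltzmann_model_weights([10, 12], beta=math.log(2)))
```

Raising beta (lowering the "temperature") concentrates all the weight on the simplest model; beta = 0 treats every model equally, which is one reading of what a meaningful information temperature would have to interpolate between.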
Also, another explanation of the extra +1 is that the risk of having to use a -2 doesn't seem that scary - it is not a very strong penalty. If the penalty for a -2 were 10 while -1, 0, or +1 cost 1, then as long as the probability of needing to hit -2 to stay on the station is less than 1/9 (about 11%) and it saves a turn, going for the extra +1 seems like a good move. If the penalty is smaller - 4, say - then an even larger risk seems reasonable.
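The break-even point above can be checked with a short expected-value calculation (the function and its parameter names are my own framing of the argument, assuming a saved turn is worth one unit of penalty):

```python
def breakeven_risk(bad_penalty, normal_penalty=1.0, turn_saved=1.0):
    """Largest probability p of hitting the -2 (costing `bad_penalty`)
    at which gambling for the extra +1 still pays off. The gamble's
    expected extra cost over the safe play is p * (bad_penalty -
    normal_penalty), so it is worthwhile while that stays below the
    value of the turn saved."""
    return turn_saved / (bad_penalty - normal_penalty)

print(breakeven_risk(10))  # 1/9, i.e. about 11%, matching the figure above
print(breakeven_risk(4))   # 1/3: a milder penalty tolerates a larger risk
```

With a penalty of 10 the threshold is p < 1/9; dropping the penalty to 4 raises it to p < 1/3, which is why the smaller penalty makes the gamble reasonable at higher risk.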
How is inverse temperature a penalty on models? If you're referring to the inverse temperature in the Maxwell-Boltzmann distribution, the temperature is considered a constant, and it gives the likelihood of a particle having a particular configuration, not the likelihood of a distribution.
Also, I'm not sure it's clear what you mean by "information to specify [a model]". Does a high inverse temperature mean a model requires more information, because it's more sensitive to small changes and therefore derives more information from them, or does it m...
A putative new idea for AI control; index here.
Noise versus preference and complexity
Error versus bias versus preference
Preference versus prejudice (and bias)
Known prejudices
Revisiting complexity