You assign a probability to a microstate according to its energy and the temperature. The density of states at various temperatures creates very nontrivial behavior (especially in solid-state systems).
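A minimal sketch of that weighting, assuming a discrete set of energy levels whose hypothetical degeneracies stand in for the density of states. Boltzmann's constant is folded into `kT`, and the function name `level_probabilities` is made up for illustration. Even two levels show a nontrivial crossover: the single ground state dominates when cold, while the heavily degenerate excited level takes over as the temperature rises.

```python
import math

def level_probabilities(energies, degeneracies, kT):
    """Boltzmann probability of each energy level: g_i * exp(-E_i / kT) / Z.

    energies and kT share the same (arbitrary) units; Boltzmann's constant is folded into kT.
    degeneracies count microstates per level (a crude stand-in for a density of states).
    """
    e_min = min(energies)  # shifting all energies leaves probabilities unchanged but avoids overflow
    weights = [g * math.exp(-(e - e_min) / kT) for e, g in zip(energies, degeneracies)]
    z = sum(weights)  # partition function, up to the constant factor exp(-e_min / kT)
    return [w / z for w in weights]

# Hypothetical two-level system: one ground state, and an excited level with 100 microstates.
energies = [0.0, 1.0]
degeneracies = [1, 100]
for kT in (0.05, 0.2, 1.0, 10.0):
    print(kT, [round(p, 4) for p in level_probabilities(energies, degeneracies, kT)])
```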
You appear to know somewhat more about fitting than I do. As I understood it, you assign a probability to a specific model according to its information content and the 'temperature'. For example, if your model is a curve fit with four parameters, all held to a narrow range, it carries 1/3 more information than a fit with three parameters held to a similar range.
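The 1/3 figure follows if each parameter costs the same number of bits. Here is a sketch assuming the common convention that pinning a parameter from a prior width down to a posterior width costs log2(prior/posterior) bits. The widths and the helper names are hypothetical.

```python
import math

def parameter_bits(prior_width, posterior_width):
    """Bits needed to pin one parameter from prior_width down to posterior_width."""
    return math.log2(prior_width / posterior_width)

def model_bits(n_params, prior_width, posterior_width):
    # Assumes every parameter is pinned to the same relative range.
    return n_params * parameter_bits(prior_width, posterior_width)

four = model_bits(4, prior_width=1.0, posterior_width=1 / 1024)   # 4 * 10 = 40 bits
three = model_bits(3, prior_width=1.0, posterior_width=1 / 1024)  # 3 * 10 = 30 bits
print(four, three, four / three - 1)  # last value is the extra fraction, about 1/3
```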
In pure information theory, the information requirement tracks the density of states exactly: one bit per bit, no matter what. If you're just picking out the maximum-entropy option, you don't need to refer to a temperature.
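One way to write the "one bit per bit" rule is to assume a prior $P(M) = 2^{-K(M)}$, where $K(M)$ is the model's description length in bits. This is a standard minimum-description-length convention, not something stated above. Bayes' rule then gives the following, so each extra bit of model must be paid for by exactly one extra bit of likelihood, with no free temperature parameter:

$$\log_2 P(M \mid D) = \log_2 P(D \mid M) - K(M) - \log_2 P(D)$$

$$\log_2 \frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \log_2 \frac{P(D \mid M_1)}{P(D \mid M_2)} - \bigl(K(M_1) - K(M_2)\bigr)$$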
I was thinking of a penalty per bit that is higher than 1/2, i.e. a stronger preference for smaller models than break-even. Absolute zero would be the case where you ignore the evidence entirely and go with a 0-bit model.
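A sketch of that tempered selection rule under the following assumptions. The penalty is expressed as λ bits of likelihood per bit of model, so λ = 1 matches the break-even prior factor of 1/2 per bit and λ > 1 expresses a stronger preference for small models. This parametrization, the model names, and the numbers are all hypothetical. "Absolute zero" is modeled as λ → ∞.

```python
import math

# Hypothetical candidates: (name, model size in bits, log2-likelihood of the data).
MODELS = [
    ("0-bit", 0, -100.0),
    ("10-bit", 10, -80.0),
    ("30-bit", 30, -55.0),
]

def select(models, penalty_per_bit):
    """Pick the model maximizing log2 P(D|M) - penalty_per_bit * K(M).

    penalty_per_bit = 1 is the break-even rule (prior factor 1/2 per bit);
    larger values express a stronger preference for smaller models.
    """
    if math.isinf(penalty_per_bit):
        # "Absolute zero": ignore the evidence and take the smallest model.
        return min(models, key=lambda m: m[1])[0]
    return max(models, key=lambda m: m[2] - penalty_per_bit * m[1])[0]

for lam in (1.0, 1.5, 3.0, math.inf):
    print(lam, select(MODELS, lam))
```

With these numbers the choice moves from the largest model at break-even to the 0-bit model as the penalty grows.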
It's true that the probability of a microstate is determined by energy and temperature, but the Maxwell-Boltzmann equation assumes that temperature is the same for all particles. Temperature distinguishes one distribution from another, not one particle from another within a distribution, and lowest temperature is not a state that systems tend toward.
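To illustrate that temperature belongs to the distribution rather than to any one particle, here is a sketch assuming a constant density of states. Particle energies are then exponentially distributed with mean kT, as with heights times m*g in an isothermal column under uniform gravity. Individual energies vary over orders of magnitude, yet one kT describes them all and can be recovered only from the ensemble. The function name and parameters are hypothetical.

```python
import random
import statistics

def sample_energies(kT, n, seed=0):
    """Draw n particle energies from p(E) proportional to exp(-E / kT), E >= 0.

    Assumes a constant density of states, so E is exponential with mean kT.
    """
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / kT) for _ in range(n)]

energies = sample_energies(kT=2.0, n=100_000)
# Individual particles have widely varying energies...
print(min(energies), max(energies))
# ...but the single temperature is a property of the whole distribution: <E> = kT here.
print(statistics.fmean(energies))
```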
As an aside, the canonical ensemble that the Maxwell-Boltzmann distribution assumes is only applicable when a given state is exceedingly unlikely to be occupied by multiple particles. The strange behavior of conden...
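The low-occupancy condition can be checked numerically. With x = (E - mu)/kT, the Bose-Einstein and Fermi-Dirac mean occupancies approach the Maxwell-Boltzmann factor exp(-x) once occupancy is small. They depart from it when occupancy is not small: the Bose-Einstein value grows without bound as x approaches 0. This is a sketch, and the function name is hypothetical.

```python
import math

def occupancies(x):
    """Mean occupancy of one single-particle state, with x = (E - mu) / kT > 0."""
    maxwell_boltzmann = math.exp(-x)
    bose_einstein = 1.0 / math.expm1(x)  # expm1(x) = exp(x) - 1, accurate for small x
    fermi_dirac = 1.0 / (math.exp(x) + 1.0)
    return maxwell_boltzmann, bose_einstein, fermi_dirac

for x in (0.1, 1.0, 5.0, 10.0):
    mb, be, fd = occupancies(x)
    print(f"x={x:5.1f}  MB={mb:.4g}  BE={be:.4g}  FD={fd:.4g}")
```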
Noise versus preference and complexity
Error versus bias versus preference
Preference versus prejudice (and bias)
Known prejudices
Revisiting complexity