It's true that the probability of a microstate is determined by energy and temperature, but the Maxwell-Boltzmann equation assumes that temperature is constant for all particles. Temperature is a distinguishing feature of two distributions, not of two particles within a distribution, and least-temperature is not a state that systems tend towards.
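To make the point about temperature concrete, here is a minimal sketch of the Boltzmann weighting, p_i proportional to exp(-E_i / (k_B T)). The function name, the three energy levels, and the choice of k_B = 1 are assumptions made for illustration only. Note that a single temperature is passed in for the whole distribution; there is no per-state temperature anywhere in the calculation.

```python
import math

def boltzmann_probabilities(energies, temperature, k_b=1.0):
    """Return p_i = exp(-E_i / (k_b * T)) / Z for each microstate energy E_i.

    Hypothetical helper: k_b defaults to 1.0 (natural units) for illustration.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    beta = 1.0 / (k_b * temperature)
    # Shift by the minimum energy for numerical stability; the shift cancels in the ratio.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(weights)  # partition function (up to the constant shift)
    return [w / z for w in weights]

# Illustrative energy levels, not taken from any real system.
energies = [0.0, 1.0, 2.0]
cold = boltzmann_probabilities(energies, temperature=0.5)
hot = boltzmann_probabilities(energies, temperature=5.0)
```

Comparing `cold` and `hot` compares two distributions at two temperatures, which is the only sense in which temperature distinguishes anything here.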
As an aside, the canonical ensemble that the Maxwell-Boltzmann distribution assumes is only applicable when a given state is exceedingly unlikely to be occupied by multiple particles. The strange behavior of condensed matter that I think you're referring to (Bose-Einstein condensates) is a consequence of this assumption being incorrect for bosons, where a stars-and-bars model is more appropriate.
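A rough way to see when the two counting schemes agree is to compare the stars-and-bars count for indistinguishable bosons, C(n + g - 1, n), with the Boltzmann count divided by n! (the usual correction for indistinguishability). This is only a sketch; the function names and the particle and state numbers are made up for illustration.

```python
from math import comb, factorial

def bose_count(n_particles, n_states):
    """Stars-and-bars: ways to place n indistinguishable bosons into g states."""
    return comb(n_particles + n_states - 1, n_particles)

def corrected_boltzmann_count(n_particles, n_states):
    """Distinguishable-particle count g**n, divided by n! as an indistinguishability correction."""
    return n_states ** n_particles / factorial(n_particles)

# Dilute regime: far more states than particles, so multiple occupancy is rare.
dilute_ratio = bose_count(3, 1000) / corrected_boltzmann_count(3, 1000)

# Dense regime: few states, so multiple occupancy dominates.
dense_ratio = bose_count(3, 2) / corrected_boltzmann_count(3, 2)
```

In the dilute case the ratio sits close to 1, so the Boltzmann approximation is fine. In the dense case the two counts diverge noticeably, which is the regime where the stars-and-bars model is needed.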
It is not true that information theory requires the conservation of information. The Ising model, for example, allows for particle systems with cycles of non-unity gain. In effect, particles can act as amplifiers (or attenuators) of information, which is a clear violation of information conservation. This is the basis of critical phenomena, a widely accepted area of study within statistical mechanics.
I think you misunderstand how models are fit in practice. It is not standard practice to determine the absolute information content of the input and then relay that information to various explanators. The information content of the input is determined relative to the explanators. There are, however, training methods that attempt to reduce the relative information transferred to explanators; this practice is called regularization. The penalty-per-relative-bit approach is taken by a method called "dropout", where a random "cold" model is trained on each training sample, and the final model is a "heated" aggregate of the cold models. "Heating" here just means cutting the amount of information transferred from input to explanator by some fraction.
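For anyone unfamiliar with dropout, here is a toy sketch of the mechanism, assuming the standard formulation in which each unit is kept independently with probability `keep_prob` during training and every unit is scaled by `keep_prob` at evaluation time. The function names and numbers are hypothetical, and the mapping to "cold" and "heated" is my own loose reading: each randomly thinned pass plays the role of a cold model, and the scaled pass plays the role of the heated aggregate.

```python
import random

def dropout_train(activations, keep_prob, rng):
    """Training pass: zero each activation independently with probability 1 - keep_prob."""
    return [a if rng.random() < keep_prob else 0.0 for a in activations]

def dropout_eval(activations, keep_prob):
    """Evaluation pass: keep every unit, scaled by keep_prob to match the training-time mean."""
    return [a * keep_prob for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
activations = [1.0, 2.0, 3.0, 4.0]  # illustrative values only
keep_prob = 0.5
n_samples = 20000

# Average many randomly thinned ("cold") passes.
totals = [0.0] * len(activations)
for _ in range(n_samples):
    for i, value in enumerate(dropout_train(activations, keep_prob, rng)):
        totals[i] += value
average_thinned = [t / n_samples for t in totals]

# The single scaled ("heated") pass approximates that average.
scaled = dropout_eval(activations, keep_prob)
```

The scaled pass stands in for averaging over all the thinned models, so the fraction of signal passed from input to explanator is cut by `keep_prob`.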
It's true that the probability of a microstate is determined by energy and temperature, but the Maxwell-Boltzmann equation assumes that temperature is constant for all particles. Temperature is a distinguishing feature of two distributions, not of two particles within a distribution, and least-temperature is not a state that systems tend towards.
I know. Models are not particles. They are distributions over outcomes. They CAN be the trivial distributions over outcomes (X will happen).
I was not referring to either form of degenerate gas in any of my posts...
A putative new idea for AI control; index here.
Noise versus preference and complexity
Error versus bias versus preference
Preference versus prejudice (and bias)
Known prejudices
Revisiting complexity