1) Any non-trivial density of states, especially in semiconductors, because of the van Hove singularities.
2) I don't mean a model like 'consider an FCC lattice populated by one of 10 types of atoms. Here are the transition rates...', where the model is made of microstates and you need to do statistics to get probabilities out. I mean a model more like 'Each cigarette smoked increases the annual risk of lung cancer by 0.001%', where the output is naturally just a distribution over outcomes. (Models of this second kind include the first kind as special cases.)
In particular, I'm working under the toy meta-model that models are programs that output a probability distribution over bitstreams; these are their predictions. You measure reality (producing some actual bitstream) and adjust the probability of each of the models according to the probability they gave for that bitstream, using Bayes' theorem.
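To make the toy meta-model concrete, here's a minimal sketch in Python. The two models (`fair` and `mostly_ones`), their bit probabilities, and the equal priors are all made up for illustration; a real 'program that outputs a distribution over bitstreams' could be anything.

```python
from fractions import Fraction

def bernoulli_model(p_one):
    """Hypothetical model: each bit is independently 1 with probability p_one."""
    def prob(bits):
        result = Fraction(1)
        for b in bits:
            result *= p_one if b == "1" else 1 - p_one
        return result
    return prob

def posterior(priors, models, bits):
    """Bayes' theorem: posterior is proportional to prior times the
    probability the model assigned to the observed bitstream."""
    unnormalized = {name: priors[name] * models[name](bits) for name in models}
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}

# Two illustrative models with equal prior weight (an assumption).
models = {
    "fair": bernoulli_model(Fraction(1, 2)),
    "mostly_ones": bernoulli_model(Fraction(3, 4)),
}
priors = {"fair": Fraction(1, 2), "mostly_ones": Fraction(1, 2)}

observed = "1011011100101"  # the measured bitstream
post = posterior(priors, models, observed)
```

Each model is scored only by the probability it gave to the bitstream that actually happened, so a model that spreads its probability over many bitstreams that did not occur loses weight.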
3) I may have misused the term. I mean the cost in entropy to produce that precise bitstream. Starting from a random bitstream, how many measurements do you have to use to turn it into, say, 1011011100101 with xor operations? One for each bit. It doesn't matter how many bits there are - you need to measure them all.
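Roughly what I have in mind, as a sketch in Python. The function name `xor_mask_to_target` is just something I made up for illustration, and `secrets.randbits` stands in for the random starting bitstream.

```python
import secrets

def xor_mask_to_target(random_bits, target_bits):
    """Build the xor mask that turns random_bits into target_bits.
    Every mask bit depends on the corresponding random bit, so each
    random bit has to be measured once."""
    return [r ^ t for r, t in zip(random_bits, target_bits)]

target = [int(c) for c in "1011011100101"]
random_bits = [secrets.randbits(1) for _ in target]  # unknown until measured

mask = xor_mask_to_target(random_bits, target)
result = [r ^ m for r, m in zip(random_bits, mask)]  # apply the xor operations
measurements = len(mask)  # one measurement per bit
```

Whatever the random starting bits were, `result` comes out equal to `target`, and the number of measurements equals the number of bits.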
When you consider multiple models, you weight them as a function of their information, preferring shorter ones (a.k.a. Occam's razor). Normally, you reduce the probability by a factor of 1/2 for each bit required, so P_prior(model) ~ 2^(-N), and you sum only up to the number of bits of evidence you have. This last clause is a bit of a hack to keep it normalizable (see below).
I drew a comparison of this to temperature, where you have a probability penalty of e^(-E/kT) on each microstate. You can have any value of kT here because the number of microstates per energy range (the density of states) does not increase exponentially with energy, but usually quadratically, or sometimes less (over short energy ranges, it is sometimes more).
If you follow the analogy back, the number of bitstreams does increase exponentially as a function of length (it doubles with each bit), so the prior probability penalty for length must be at least as strong as 1/2 per bit to avoid infinitely long programs being preferred. But you can use a stronger exponential die-off - let's say, 2.01^(-N) - and suddenly the distribution is already normalizable with no need for a special hack. Whatever particular value you put in there will be your e^(1/kT) equivalent in the analogy.
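A quick numerical check of the normalization claim, as a sketch in Python. It assumes every bitstring of length N counts as a distinct program, so the total unnormalized weight at length N is 2^N * b^(-N) = (2/b)^N for a penalty base b. The function names are made up for illustration.

```python
def partial_mass(base, max_len):
    """Total unnormalized prior weight over all bitstrings of length
    0..max_len, given a per-bit penalty of 1/base."""
    return sum((2.0 / base) ** n for n in range(max_len + 1))

def limit_mass(base):
    """Closed form of the geometric series sum of (2/base)^N for N >= 0.
    Only valid when base > 2; otherwise the series diverges."""
    if base <= 2:
        raise ValueError("series diverges for base <= 2")
    return base / (base - 2.0)

# base = 2: every length contributes weight 1, so the total grows without
# bound and needs a cutoff (the 'hack').
mass_base_2 = partial_mass(2.0, 1000)

# base = 2.01: the sum converges to 2.01 / 0.01 = 201 with no cutoff.
mass_base_201 = partial_mass(2.01, 20000)
limit_base_201 = limit_mass(2.01)
```

With base 2 the partial sum is just the number of lengths included, while with base 2.01 it settles at a finite value. Writing b^(-N) as e^(-N ln b) shows the correspondence with the Boltzmann factor: N plays the role of E, and ln b plays the role of 1/kT.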
2) I think this is the distinction you are trying to make between the lattice model and the smoker model: in the lattice model, the equations and parameters are defined, whereas in the smoker model, the equations and parameters have to be deduced. Is that right? If so, my previous posts were referring to the smoker-type model.
Your toy meta-model is consistent with what I was thinking when I used the word "model" in my previous comments.
3) I see what you're saying. If you add complexity to the model, you want to make sure that its improvement in a...
A putative new idea for AI control; index here.
Noise versus preference and complexity
Error versus bias versus preference
Preference versus prejudice (and bias)
Known prejudices
Revisiting complexity