I think that's roughly correct, but it is useful...
'The best UTM is the one that figures out the right answer the fastest' is true, but not very useful.
Another way to frame it would be: after one has figured out the laws of physics, a good-for-these-laws-of-physics Turing machine is useful for various other things, including thermodynamics. 'The best UTM is the one that figures out the right answer the fastest' isn't very useful for figuring out physics in the first place, but most of the value of understanding physics comes after it's figured out (as we can see from regular practice today).
Also, we can make partial updates along the way. If e.g. we learn that physics is probably local but haven't understood all of it yet, then we know that we probably want a local machine for our theory. If we e.g. learn that physics is causally acyclic, then we probably don't want a machine with access to atomic unbounded fixed-point solvers. Etc.
I think you might have misread something? The graphical statement of theorem 2 does not say that if $\Lambda'$ is determined by $X$, then $\Lambda'$ is a mediator; that would indeed be false in general.
Rather, the theorem says that under some conditions $\Lambda'$ is determined by $\Lambda$. Determination is in the conclusion, not the premises. On the flip side, $\Lambda$ being a mediator is in the premises, not the conclusion.
What I have in mind re: boundedness...
If we need to use a Turing machine which is roughly equivalent to physics, then a natural next step is to drop the assumption that the machine in question is Turing complete. Just pick some class of machines which can efficiently simulate our physics, and which can be efficiently implemented in our physics. And then, one might hope, the sort of algorithmic thermodynamic theory the paper presents can carry over to that class of machines.
Probably there are some additional requirements for the machines, like some kind of composability, but I don't know exactly what they are.
This would also likely yield a direct mapping between limits on the machines (e.g. limited time or memory) and corresponding limits on the physical systems to which the theory applies.
The resulting theory would probably read more like classical thermo, where we're doing thought experiments involving fairly arbitrary machines subject to just a few constraints, and surprisingly general theorems pop out.
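To gesture at what that mapping might look like, here's a sketch using resource-bounded description complexity; this is my own framing, not notation from the paper. For a machine class $\mathcal{C}$ with time bound $t$ and space bound $s$, define

$$K^{t,s}_{\mathcal{C}}(x) = \min\{\,|p| : p \in \mathcal{C} \text{ outputs } x \text{ within time } t \text{ and space } s\,\}.$$

Thermodynamic statements built on $K^{t,s}_{\mathcal{C}}$ would then plausibly bind only to physical systems which machines in $\mathcal{C}$ can simulate within those same bounds, so each resource limit on the machines picks out a corresponding class of physical systems.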
Then you would have been wrong. No Free Lunch Theorems do not bind to reality.
Haven't been using that one, but I expect it would give very different results than the dataset we are using. It would test very different things than what we're currently trying to get feedback on; IIRC there's a lot more near-deterministic known structure in that one.
Good question; it's the right sort of question to ask here, and I don't know the answer. It does get straight into some interesting follow-up questions about e.g. the ability to physically isolate the machine from noise, which might be conceptually load-bearing for things like working with arbitrary-precision quantities.
One of the classic conceptual problems with a Solomonoff-style approach to probability, information, and stat mech is "Which Turing machine?". The choice of Turing machine is analogous to the choice of prior in Bayesian probability. While universality means that any two Turing machines give roughly the same answers in the limit of large data (unlike two priors in Bayesian probability, where there is no universality assumption/guarantee), they can be arbitrarily different before then.
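To spell out the 'roughly the same answers' part: the standard invariance theorem says that for any two universal machines $U$ and $V$ there is a constant $c_{V,U}$ (roughly, the length of an interpreter for $V$ written for $U$) such that

$$K_U(x) \le K_V(x) + c_{V,U} \quad \text{for all } x.$$

The constant is independent of $x$, so it washes out in the limit of large data, but nothing stops it from being astronomically large; that's exactly the 'arbitrarily different before then' problem.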
My usual answer to this problem is "well, ultimately this is all supposed to tell us things about real computational systems, so pick something which isn't too unreasonable or complex for a real system".
But lately I've been looking at Aram Ebtekar and Marcus Hutter's Foundations of Algorithmic Thermodynamics. Based on both the paper and some discussion with Aram (along with Steve Petersen), I think there's maybe a more satisfying answer to the choice-of-Turing-machine issue in there.
Two key pieces:
- The first piece is a part of the theory which can only bind to reality insofar as our chosen Turing machine is tractable to physically implement.
- The second piece is a part of the theory which can only bind to reality insofar as our physics can be tractably implemented on our chosen Turing machine.
In other words: in order for this thermodynamic theory to work well, we need to choose a Turing machine which is "computationally equivalent to" physics, in the sense that our physics can run the machine without insane implementation size, and the machine can run our physics without insane implementation size.
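One way to cash that out, as my own gloss rather than the paper's notation: write $K_A(B)$ for the length of the shortest implementation of system $B$ on substrate $A$. Then the requirement is roughly

$$K_{\text{physics}}(M) \le c \quad \text{and} \quad K_M(\text{physics}) \le c$$

for some reasonably small $c$, ideally with only polynomial overhead in time and space in both directions.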
I'm still wrapping my head around all the pieces here, so hopefully I (or, better yet, someone else) will write up a clearer explainer in the future. But this smells really promising to me. Not just for purposes of Solomonoff thermodynamics, but also as a more principled way to tackle bounded rationality of embedded systems.
That would be pretty reasonable, but it would make the model comparison part even harder. I do need P[X] (and therefore Z) for model comparison; this is the challenge which always comes up for Bayesian model comparison.
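For concreteness, assuming $Z$ here denotes the usual evidence/normalizer: comparing models $M_1$ and $M_2$ requires the marginal likelihood

$$P[X \mid M_i] = \int P[X \mid \theta, M_i]\, P[\theta \mid M_i]\, d\theta = Z_{M_i},$$

and the posterior odds are $\frac{P[M_1 \mid X]}{P[M_2 \mid X]} = \frac{Z_{M_1}}{Z_{M_2}} \cdot \frac{P[M_1]}{P[M_2]}$. That integral over parameters is the part which is typically intractable.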
It sounds like you are claiming that superintelligence will have human-like scope insensitivity baked into its preferences? That seems like an absolutely bonkers thing to claim. "1 billionth of resources" does not at all seem like a natural way for "slight caring" to manifest in an actually-advanced mind; it seems like a thing which very arguably occurs in human minds, but which is particularly unlikely to generalize to superintelligence, precisely because the generalized version would kneecap many general capabilities quite badly.
Notably, that post has a section arguing against roughly the sort of thing I'm arguing for.
My response would be: yes, what-constitutes-a-low-level-language is obviously contingent on our physics and even on our engineering, not just on the language itself. I wouldn't even expect aliens in our own universe to have low-level programming languages very similar to our own. Our low-level languages today are extremely dependent on specific engineering choices made in the mid-20th century, which are now very locked in by practice but do not seem particularly fundamental or overdetermined, and which would not be at all natural in universes with different physics or cultures with different hardware architectures. Aliens would look at our low-level languages and recognize them as low-level for our hardware, but not at all low-level for their hardware.
Analogously: choice of a good computing machine depends on the physics of one's universe.
I do like the guy's style of argumentation a lot, though.