Followup to: The prior of a hypothesis does not depend on its complexity
Eliezer wrote:
In physics, you can get absolutely clear-cut issues. Not in the sense that the issues are trivial to explain. [...] But when I say "macroscopic decoherence is simpler than collapse" it is actually strict simplicity; you could write the two hypotheses out as computer programs and count the lines of code.
Every once in a while I come across some belief in my mind that clearly originated from someone smart, like Eliezer, and stayed unexamined because after you hear and check 100 correct statements from someone, you're not about to check the 101st quite as thoroughly. The above quote is one of those beliefs. In this post I'll try to look at it more closely and see what it really means.
Imagine you have a physical theory, expressed as a computer program that generates predictions. A natural way to define the Kolmogorov complexity of that theory is to take the length of the shortest computer program that outputs your program, viewed as a string of bits. Under this very natural definition, the many-worlds interpretation of quantum mechanics is almost certainly simpler than the Copenhagen interpretation.
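For concreteness, here is one standard way to write that down (the notation is mine, not the post's): fix a universal machine U, treat the theory's prediction-generating program p as a string of bits, and define

\[ K(p) = \min \{\, |q| : U(q) = p \,\} \]

where |q| is the length of the candidate program q in bits; the complexity of the theory is then taken to be K(p).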
But imagine you refactor your prediction-generating program and make it shorter; does this mean the physical theory has become simpler? Note that after some innocuous refactorings of a program expressing some physical theory in a recognizable form, you may end up with a program that expresses a different set of physical concepts. For example, if you take a program that calculates classical mechanics in the Lagrangian formalism, and apply multiple behavior-preserving changes, you may end up with a program whose internal structures look distinctly Hamiltonian.
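As a deliberately toy sketch of that point (mine, not from the post), here are two Python programs for the same physics, a unit-mass harmonic oscillator, that are behavior-preserving refactorings of each other: one is phrased in the Lagrangian vocabulary of positions and velocities, the other in the Hamiltonian vocabulary of positions and canonical momenta, and their predictions are bit-for-bit identical.

```python
# Toy example: the same integrator written in two physical vocabularies.
# With unit mass, the canonical momentum equals the velocity, so the two
# functions perform identical arithmetic and return identical predictions.

def predict_lagrangian_style(q0, v0, dt, steps):
    """Semi-implicit Euler in terms of generalized coordinate q and velocity v."""
    q, v = q0, v0
    predictions = []
    for _ in range(steps):
        v = v - q * dt          # equation of motion: q'' = -q
        q = q + v * dt
        predictions.append(q)
    return predictions

def predict_hamiltonian_style(q0, p0, dt, steps):
    """The same program after refactoring into coordinate q and momentum p."""
    q, p = q0, p0
    predictions = []
    for _ in range(steps):
        p = p - q * dt          # dp/dt = -dH/dq
        q = q + p * dt          # dq/dt =  dH/dp
        predictions.append(q)
    return predictions

# Identical predictions, different internal concepts.
assert predict_lagrangian_style(1.0, 0.0, 0.01, 1000) == \
       predict_hamiltonian_style(1.0, 0.0, 0.01, 1000)
```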
Therein lies the rub. Do we really want a definition of "complexity of physical theories" that tells apart theories making the same predictions? If our formalism says Hamiltonian mechanics has a higher prior probability than Lagrangian mechanics, which is demonstrably mathematically equivalent to it, something's gone horribly wrong somewhere. And do we even want to define "complexity" for physical theories that don't make any predictions at all, like "glarble flargle" or "there's a cake just outside the universe"?
At this point, the required fix to our original definition should be obvious: cut out the middleman! Instead of finding the shortest algorithm that writes your algorithm for you, find the shortest algorithm that outputs the same predictions. This new definition has many desirable properties: it's invariant under refactorings, doesn't discriminate between equivalent formulations of classical mechanics, and refuses to specify a prior for something you can never ever test by observation. Clearly we're on the right track here, and the original definition was just an easily fixable mistake.
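Written out the same way (again, my notation, not the post's): instead of the shortest program that prints the text of your program, take the shortest program that reproduces its predictions,

\[ K_{\mathrm{pred}}(T) = \min \{\, |q| : U(q) \text{ outputs the same predictions as } T \,\} \]

so any behavior-preserving refactoring, and any pair of mathematically equivalent formulations, get the same complexity by construction.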
But this easily fixable mistake... was the entire reason for Eliezer "choosing Bayes over Science" and urging us to do the same. The many-worlds interpretation makes the same testable predictions as the Copenhagen interpretation right now. Therefore by the amended definition of "complexity", by the right and proper definition, they are equally complex. The truth of the matter is not that they express different hypotheses with equal prior probability - it's that they express the same hypothesis. I'll be the first to agree that there are very good reasons to prefer the MWI formulation, like its pedagogical simplicity and beauty, but K-complexity is not one of them. And there may even be good reasons to pledge your allegiance to Bayes over the scientific method, but this is not one of them either.
ETA: now I see that, while the post is kinda technically correct, it's horribly confused on some levels. See the comments by Daniel_Burfoot and JGWeissman. I'll write an explanation in the discussion area.
ETA 2: done, look here.
Two arguments - or maybe two formulations of one argument - for complexity reducing probability; I think the juxtaposition explains why it doesn't feel like complexity should be a straight-up penalty for a theory.
The human-level argument for complexity reducing probability goes something like this: A∩B is more probable than A∩B∩C because the second has three fault-lines, so to speak, while the first has only two, so the second is more likely to crack. (Edit: equally or more likely, not strictly more likely.) (For the engineers out there: I have found this metaphor invaluable both for spotting this in conversation and for explaining it in conversation.) As byrnema noted below, that doesn't seem applicable here, at least not in the direct simpler = better way, especially when having the same predictions seems to indicate that A, B, and C are all right.
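The probability-theory fact behind the fault-line metaphor is just the chain rule, spelled out here for completeness:

\[ P(A \cap B \cap C) = P(A \cap B)\, P(C \mid A \cap B) \le P(A \cap B), \]

with equality exactly when C is certain given A∩B, which is why the edit to "equally or more likely" is needed.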
The formal argument for a complexity penalty (and this is philosophy, so bear with me) is that a priori, having absolutely no experiences of the universe, all premises are equally likely (with nothing to privilege any of them, they default to the universal prior, if you like), and the theory with the fewest conjunctions of premises is the most likely, by virtue of probability theory. Now, we are restricted in our observations, because they don't tell us what actually is; they merely tell us that anything that predicts the outcome remains possible, and everything that doesn't predict the outcome is ruled out. This includes ad hoc theories and overcomplicated theories like "Odin made Horus made God made the universe as we know it." However, we can extend the previous argument: given that our observations have narrowed the universe as we know it down to this section of hypotheses, we have no experiences that say anything about any of the hypotheses within that section. So, a priori, all possible premises within that section are equally likely, and we should choose the theory with the fewest conjunctions of premises, according to probability theory.
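One way to make that concrete, under the strong added assumption (mine, for illustration) that the premises are independent and each gets the same prior probability p < 1:

\[ P(\text{premise}_1 \cap \dots \cap \text{premise}_n) = p^n, \]

which only shrinks as the number of conjoined premises n grows. And since every hypothesis in the surviving section predicts the observations equally well, conditioning on those observations just rescales the priors within the section and leaves this ordering intact.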
This doesn't really get to the heart of the matter addressed in the post, but it does justify a form of complexity-as-penalty that has some bearing: namely, that if the Hamiltonian formulation requires fewer premises than the Lagrangian one, and predictions bear both of these systems out equally well, the Hamiltonian formulation is more probable, because it is less likely to be wrong due to a false premise somewhere in the area we haven't yet accessed. (In formal logic terms, the Lagrangian formulation is probably using some premise it doesn't need.)
Uuuhhhh, wait, there's something wrong with your post. A simple logical statement can imply a complex-looking logical statement, right? Imagine that C is a very simple statement that implies B, which is very complex. Then A∩B∩C is logically equivalent to A∩C, which is simpler than A∩B because C is simpler than B by assumption. Whoops.
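To spell that out in the same notation: if C implies B, then A∩B∩C and A∩C are logically equivalent, so

\[ P(A \cap B \cap C) = P(A \cap C), \]

even though one side has three conjuncts and the other only two; how many "fault-lines" a hypothesis has depends on how it happens to be written down.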