I wonder if this can't be considered more pragmatically? There was a passage in the MIT Encyclopedia of Cognitive Sciences in the Logic entry that seems relevant:
Johnson-Laird and Byrne (1991) have argued that postulating more imagelike MENTAL MODELS make better predictions about the way people actually reason. Their proposal, applied to our sample argument, might well help to explain the difference in difficulty in the various inferences mentioned earlier, because it is easier to visualize “some people” and “at least three people” than it is to visualize “most people.” Cognitive scientists have recently been exploring computational models of reasoning with diagrams. Logicians, with the notable exceptions of Euler, Venn, and Peirce, have until the past decade paid scant attention to spatial forms of representation, but this is beginning to change (Hammer 1995).
This made me think a bit differently about how we might choose between two abstract models with the same explanatory power. It seems that the rational thing to do is to choose the one that allows you to reason the most fluently so as to minimize the likelihood of fallacious reasoning.
In fact, it seems that we should expect the cognitive sciences to provide clues about how we could adjust formal systems with a view to ease of understanding and technical fluency when reasoning about or with them.
Taking this view, and assuming we had finished physics, all future work would be about tweaking the formalisms toward the most intuitive ones possible, given what we know about human reasoning. What would matter is that they be as easy to understand as possible. That way we could hope for more efficient technological development as well as better general understanding among the public.
I was thinking on a similar line:
Given that computation has costs and memory is limited, making the best possible predictions with a fixed amount of resources requires using the computationally cheapest method.
Assuming that generating a mathematical model is (at least on average) harder for more complex theories, spending time on models that incorporate epiphenomenal concepts, and that turn out equivalent in the end, leads in practice to worse predictions.
So not using the strong Occam's razor would lead to worse results.
And because we have taking moral issue...
This post is a summary of the different positions expressed in the comments to my previous post and elsewhere on LW. The central issue turned out to be assigning "probabilities" to individual theories within an equivalence class of theories that yield identical predictions. Presumably we must prefer shorter theories to their longer versions even when they are equivalent. For example, is "physics as we know it" more probable than "Odin created physics as we know it"? Is the Hamiltonian formulation of classical mechanics a priori more probable than the Lagrangian formulation? Is the definition of reals via Dedekind cuts "truer" than the definition via binary expansions? And are these all really the same question in disguise?
One attractive answer, given by shokwave, says that our intuitive concept of "complexity penalty" for theories is really an incomplete formalization of "conjunction penalty". Theories that require additional premises are less likely to be true, according to the eternal laws of probability. Adding premises like "Odin created everything" makes a theory less probable and also happens to make it longer; this is the entire reason why we intuitively agree with Occam's Razor in penalizing longer theories. Unfortunately, this answer seems to be based on a concept of "truth" granted from above - but what do differing degrees of truth actually mean, when two theories make exactly the same predictions?
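To make the "conjunction penalty" concrete, here is a minimal sketch of the arithmetic behind it. The credences are invented purely for illustration, and the sketch takes for granted exactly what the paragraph above questions: that the extra premise has a well-defined probability at all.

```python
from fractions import Fraction

def conjunction_probability(p_base, p_extra_given_base):
    """P(base and extra) = P(base) * P(extra | base)."""
    return p_base * p_extra_given_base

# Hypothetical credences, chosen only for illustration.
p_physics = Fraction(9, 10)               # "physics as we know it"
p_odin_given_physics = Fraction(1, 1000)  # "Odin created it", given physics

p_odin_and_physics = conjunction_probability(p_physics, p_odin_given_physics)

# A conditional probability is at most 1, so the conjunction
# can never be more probable than the base theory alone.
print(p_physics, p_odin_and_physics)
```

Whatever numbers are plugged in, the longer theory cannot come out ahead, which is the whole content of the penalty. It says nothing about what the two numbers mean when both theories predict the same observations.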
Another intriguing answer came from JGWeissman. Apparently, as we learn new physics, we tend to discard inconvenient versions of old formalisms. So electromagnetic potentials turn out to be "more true" than electromagnetic fields because they carry over to quantum mechanics much better. I like this answer because it seems to be very well-informed! But what shall we do after we discover all of physics, and still have multiple equivalent formalisms - do we have any reason to believe simplicity will still work as a deciding factor? And the question remains, which definition of real numbers is "correct" after all?
Eliezer, bless him, decided to take a more naive view. He merely pointed out that our intuitive concept of "truth" does seem to distinguish between "physics" and "God created physics", so if our current formalization of "truth" fails to tell them apart, the flaw lies with the formalism rather than with us. I have a lot of sympathy for this answer as well, but it looks rather like a mystery to be solved. I never expected to become entangled in a controversy over the notion of truth on LW, of all places!
A final and most intriguing answer of all came from saturn, who alluded to a position held by Eliezer and sharpened by Nesov. After thinking it over for a while, I generated a good contender for the most confused argument ever expressed on LW. Namely, I'm going to completely ignore the is-ought distinction and use morality to prove the "strong" version of Occam's Razor - that shorter theories are more "likely" than equivalent longer versions. You ready? Here goes:
Imagine you have the option to put a human being in a sealed box where they will be tortured for 50 years and then incinerated. No observational evidence will ever leave the box. (For added certainty, fling the box away at near lightspeed and let the expansion of the universe ensure that you can never reach it.) Now consider the following physical theory: as soon as you seal the box, our laws of physics will make a localized exception and the victim will spontaneously vanish from the box. This theory makes exactly the same observational predictions as your current best theory of physics, so it lies in the same equivalence class and you should give it the same credence. If you're still reluctant to push the button, it looks like you already are a believer in the "strong Occam's Razor" saying simpler theories without local exceptions are "more true". QED.
It's not clear what, if anything, the above argument proves. It probably has no consequences in reality, because no matter how seductive it sounds, skipping over the is-ought distinction is not permitted. But it makes for a nice koan to meditate on weird matters like "probability as preference" (due to Nesov and Wei Dai) and other mysteries we haven't solved yet.
ETA: Hal Finney pointed out that the UDT approach - assuming that you live in many branches of the "Solomonoff multiverse" at once, weighted by simplicity, and reducing everything to decision problems in the obvious way - dissolves our mystery nicely and logically, at the cost of abandoning approximate concepts like "truth" and "degree of belief". It agrees with our intuition in advising you to avoid torturing people in closed boxes, and more generally in all questions about moral consequences of the "implied invisible". And it nicely skips over all the tangled issues of "actual" vs "potential" predictions, etc. I'm a little embarrassed at not having noticed the connection earlier. Now can we find any other good solutions, or is Wei's idea the only game in town?
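As a rough illustration of how simplicity weighting turns the sealed-box question into a decision problem, here is a toy sketch. It is not UDT and not a real Solomonoff prior (true description lengths are uncomputable); the theory names, bit lengths, and harm values are all made-up assumptions, and each theory simply gets weight 2^-length.

```python
from fractions import Fraction

def simplicity_weights(lengths_in_bits):
    """Normalized weights proportional to 2 ** -length for each theory."""
    raw = {name: Fraction(1, 2 ** bits) for name, bits in lengths_in_bits.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}

# Hypothetical description lengths: the "local exception" theory must
# spend extra bits specifying where and when physics is suspended.
lengths = {"plain_physics": 100, "physics_with_box_exception": 120}

# Hypothetical harm from pushing the button under each theory:
# 1 if the victim is tortured, 0 if the victim vanishes.
harm_if_button_pushed = {"plain_physics": 1, "physics_with_box_exception": 0}

weights = simplicity_weights(lengths)
expected_harm = sum(weights[name] * harm_if_button_pushed[name] for name in lengths)

print(float(expected_harm))
```

With these invented lengths almost all of the weight sits on the theory without the exception, so the expected harm of pushing the button stays close to 1 even though the two theories are observationally identical. The weights here act as a measure of how much each branch matters to the decision, not as degrees of belief.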