No, let me try nailing this jelly to the wall once again. The definition-only-up-to-a-constant is a weakness of MDL, but this weakness isn't relevant to my question at all! Even if we had some globally unique variant of MDL derived from some nice mathematical idea, learning theory still doesn't use description lengths, and would be perfectly happy with rules that have long descriptions as long as we delineate a small set of those rules. To my mind this casts doubt on the importance of MDL.
learning theory still doesn't use description lengths, and would be perfectly happy with rules that have long descriptions as long as we delineate a small set of those rules
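Any delineation of a small set of rules leads immediately to a short description length for each rule in it. You just need to encode the index of the rule in the set, which costs about log2(N) bits for a set of size N (rounded up to a whole number of bits), given that the set itself is fixed in advance.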
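To make the index argument concrete, here is a minimal Python sketch. It assumes that sender and receiver have agreed on the rule set and its ordering beforehand; the function names (`index_code_length`, `encode_index`, `decode_index`) and the toy rule set are hypothetical and purely illustrative. However long each rule's own text is, the code for "which rule" depends only on the size of the set.

```python
def index_code_length(n_rules: int) -> int:
    """Bits for a fixed-width index into a set of n_rules rules: ceil(log2(n_rules))."""
    if n_rules < 1:
        raise ValueError("the rule set must be non-empty")
    # (n - 1).bit_length() equals ceil(log2(n)) exactly, with no floating point.
    return (n_rules - 1).bit_length()


def encode_index(i: int, n_rules: int) -> str:
    """Encode rule number i (0-based) as a fixed-width bit string."""
    if not 0 <= i < n_rules:
        raise ValueError("index out of range")
    width = index_code_length(n_rules)
    # A one-element set needs no bits at all.
    return format(i, "b").zfill(width) if width else ""


def decode_index(bits: str) -> int:
    """Recover the rule number from its bit string."""
    return int(bits, 2) if bits else 0


# Hypothetical rule set: each rule has a long textual description,
# but there are only 8 of them, so 3 bits pick one out.
rules = [f"rule {k}: " + "some very long-winded condition " * 50 for k in range(8)]
code = encode_index(5, len(rules))
chosen = rules[decode_index(code)]
```

The sketch charges nothing for the set itself, which is the point under dispute: the index is short only relative to a set that both sides already share.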
Note that MDL is not the same as algorithmic information theory (definition-up-to-a-constant comes up in AIT, not MDL), though they're of course related.
I declare this Open Thread open for discussion of Less Wrong topics that have not appeared in recent posts.