AlephNeil comments on Kevin T. Kelly's Ockham Efficiency Theorem - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (81)
AFAICT, the 'proposed solution' from WiaLLL doesn't work. The rest of this post will be devoted to proposing a specific attack on it. As such it will be longish - since WiaLLL makes a strong claim about a fundamental aspect of epistemology, it deserves a considered refutation.
My attack works on the assumption that current sets count programs, not languages - i.e. two different programs that implement the same language will result in two entries in the next current set (regardless of whether they are written in the same language). Paul Almond has told me that this is what he intended in WiaLLL.
Consider a family of turing-complete languages, whose members are called Xn (n ∈ N). I will use 'X-like' to refer to any language from this family. Let X := X8.
Programs for any language Xn can be divided into three categories:
Programs that begin with n 0 bits. This prefix is solely used to encode that these programs are of type 1; it encodes no other data. The rest of the program source encodes a program in some suitable turing-complete language (the details of which don't matter for this attack).
Programs that have a length of 0 (i.e. empty strings). All such programs are defined to implement X.
Programs that don't fit either 1 or 2. All such programs are defined to implement X(n+1).
It follows that the set of programs of any bounded length (>0) writable in any Xn (n>0) is dominated by interpreters for X-like languages with a greater n than their own. While all such languages are turing-complete, for any given program length the proportion of non-X-like-interpreting programs writable in them is small, and inversely exponential in n.
What happens when we run the proposed solution from WiaLLL on this X? Assuming sufficiently high values of N, on each step but the first the current set will consist of a mix of X-likes and other languages. For purposes of calculating a lower bound on the long-term proportion of X-likes, assume that programs for non-X-likes never implement an X-like. The X-likes will then tend to thin out, since some of their programs fall into category 1 and implement non-X-likes; however this process is bounded to produce a fraction of non-X-likes no larger than \sum_{i=8}^\infty (1/2^i) = 0.0078125.
The X-likes will also thin themselves out through programs in category 2, since X itself will produce a fixed fraction of non-X-likes. How much they do so depends on the exact growth of N and I in the limit. Given a function I(N) which maps values for program length to the number of iterations to run with this length, one can calculate a rough upper bound on the proportion of non-X-likes produced in this manner as 1-\prod_{N=N0}^\infty (1-I(N)/2^N).
The detailed results of this process therefore depend on the choice of I(N), and of N0. The general relationship is that unless I(N) grows quite quickly in N, the X-likes will only be notably thinned out for small N, and therefore will not be thinned out significantly given a sufficiently big N0. For example, for I(N)=16 and N0=16 (16 iterations per maximum program length, minimum maximum program length of 16 bits) the result is about 0.00049.
Therefore, with free choice of the starting language and some reasonable assumptions about the limiting process used for I and N, we can ensure that the language mix in the final current set will be dominated by X-likes.
Why does all of this matter? Because any X-like especially likes to implement X (since category 2 programs have the minimum length of 0 bits), and for sufficiently large n really doesn't like to implement any non-X-like, and as such any current set containing a sufficient proportion of X-likes will rate X as lower-level than any other language. This argument might in theory fail because of the possibility that other languages would on average penalize X-likes much more than X-likes penalize them, but if so this can be worked around by setting X to a higher Xn than 8 - the family can be redefined to set the initial n to any integer we need it to be to work around such finite factors.
Therefore, unless I'm missing something critical, the output of the procedure proposed in WiaLLL would be highly sensitive to the starting language as well as I(N) and N0. It therefore fails in its attempt to define a simplicity measure on languages in the absence of additional assumptions not stated in the article.
So essentially, the problem with Paul Almond's suggestion is that one can find languages X that are 'X[0]-pathological' in the sense that almost all of their children are both (i) good at implementing some fixed language X[0] and (ii) even more X[0]-pathological than they are (this being a recursive definition of 'pathological').
Such languages destroy any hope that Paul's "low-level index for Y" won't depend on which language X we start from.