jacob_cannell comments on [link] Baidu cheats in an AI contest in order to gain a 0.24% advantage - Less Wrong Discussion
By your definition of meaningful information, it's not actually clear that a strong lossless compressor wouldn't discover and encode that meaningful information.
For example, the presence of a face in an image is presumably meaningful information. From a compression point of view, the presence of a face and its approximate pose is also information with a very large impact on lower-level feature coding: spending, say, 100 bits to represent the face and its pose could save 10x as many bits at the lowest levels. Some purely unsupervised learning systems, such as sparse coding or RBMs (restricted Boltzmann machines), do tend to find high-level features that correspond to objects (meaningful information).
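The bit accounting in the face example can be made concrete with a toy two-part code: first pay for a high-level latent ("face present, with this pose"), then code the pixels conditioned on it. All numbers below (1000 binary pixels, a 100-bit latent, a 0.99 per-pixel match probability, 990 matching pixels) are made-up assumptions for illustration, not measurements from any real model.

```python
import math

# Illustrative numbers only (assumptions, not measurements).
FACE_PIXELS = 1000   # binary pixels covered by the face
LATENT_BITS = 100    # hypothetical cost of encoding "face present" plus its pose
P_MATCH = 0.99       # hypothetical prob. a pixel matches the pose-predicted template
MATCHES = 990        # pixels that actually match the prediction in this toy image


def pixel_bits(matches: int, total: int, p_match: float) -> float:
    """Ideal (Shannon) code length in bits for `total` binary pixels when each
    pixel is predicted to match with probability p_match."""
    mismatches = total - matches
    return -(matches * math.log2(p_match) + mismatches * math.log2(1.0 - p_match))


# Without a face/pose latent: every pixel is a coin flip, 1 bit each.
baseline = pixel_bits(MATCHES, FACE_PIXELS, 0.5)

# Two-part code: pay for the latent, then code pixels conditioned on it.
with_latent = LATENT_BITS + pixel_bits(MATCHES, FACE_PIXELS, P_MATCH)

print(f"baseline:    {baseline:.1f} bits")
print(f"with latent: {with_latent:.1f} bits")
print(f"saved:       {baseline - with_latent:.1f} bits")
```

Under these assumed numbers the baseline costs 1000.0 bits and the two-part code about 180.8 bits, so roughly 100 bits of high-level description buys about 820 bits of savings, the same order as the 10x figure above.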
Of course, that does not imply that training on an unsupervised-learning (UL) compression criterion is the best way to recognize any particular features or objects.
It could, but it also might not. My point is that compression ratio (that is, the average log-likelihood of the data under the model) is not a good proxy for "understanding", since it can be optimized to a very large extent without modeling "meaningful" information.
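To illustrate the link between average log-likelihood and compression, and how far a model with no "understanding" can get, here is a minimal sketch: the average negative log2-likelihood of a text under a per-character model is the bits/char an ideal entropy coder would achieve. The sample sentence and the two models (uniform over 256 byte values, and a unigram model of character frequencies) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bits_per_char(text: str, probs: dict) -> float:
    """Average negative log2-likelihood of `text` under a per-character model,
    i.e. the average code length (bits/char) an ideal entropy coder would reach."""
    return -sum(math.log2(probs[ch]) for ch in text) / len(text)


text = "the cat sat on the mat and the dog sat on the log"

# Baseline: uniform over all 256 byte values, i.e. plain 8-bit storage.
uniform = {chr(i): 1 / 256 for i in range(256)}

# Unigram model: only character frequencies, no words, syntax, or meaning.
# (Fit and scored on the same text, so the figure is optimistic.)
counts = Counter(text)
unigram = {ch: n / len(text) for ch, n in counts.items()}

print(f"uniform: {bits_per_char(text, uniform):.2f} bits/char")
print(f"unigram: {bits_per_char(text, unigram):.2f} bits/char")
```

Character frequencies alone cut the cost from 8 bits/char to under 4 here, a large gain in log-likelihood that involves no modeling of what the sentence is about.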
Yes, good compression can be achieved without deep understanding. But a compressor with deep understanding will ultimately achieve better compression. For example, you can get good text compression results with a simple bigram or trigram model, but eventually a sophisticated grammar-based model will outperform the n-gram approach.
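The general pattern, that a model capturing more structure spends fewer bits, can be sketched with adaptive character n-gram coders of increasing order. This is not a grammar-based model; it only shows the trend across n-gram orders. The add-one smoothing, the shared alphabet known to encoder and decoder, and the repetitive toy text are all assumptions made for the sketch.

```python
import math
from collections import defaultdict


def adaptive_ngram_bits(text: str, order: int) -> float:
    """Total code length in bits for `text` under an adaptive character n-gram
    model: each character is predicted from the previous `order` characters,
    using add-one (Laplace) smoothed counts of what has been seen so far.
    Encoder and decoder update the same counts, so this is a valid code length."""
    alphabet = sorted(set(text))  # assumption: alphabet known to both sides
    counts = defaultdict(lambda: defaultdict(int))  # context -> next char -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]  # shorter context at the very start
        p = (counts[ctx][ch] + 1) / (totals[ctx] + len(alphabet))
        bits -= math.log2(p)
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


text = "the cat sat on the mat. " * 200
bpc = {order: adaptive_ngram_bits(text, order) / len(text) for order in (0, 1, 2)}
for order, value in bpc.items():
    print(f"order {order}: {value:.3f} bits/char")
```

Because the model is updated online, no train/test split is needed: the total is the size an arithmetic coder driven by these predictions would approach. On this toy text each extra order of context lowers the bits/char.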