
jacob_cannell comments on [link] Baidu cheats in an AI contest in order to gain a 0.24% advantage

Post author: Wei_Dai 06 June 2015 06:39AM


Comment author: jacob_cannell 13 June 2015 09:43:56PM 0 points

By your definition of meaningful information, it's not actually clear that a strong lossless compressor wouldn't discover and encode that meaningful information.

For example, the presence of a face in an image is presumably meaningful information. From a compression standpoint, the presence of a face and its approximate pose is also information with a very large impact on lower-level feature coding: spending, say, 100 bits to represent the face and its pose could save ten times that many bits at the lowest levels. Some purely unsupervised learning systems, such as sparse coding or RBMs, do tend to find high-level features that correspond to objects (meaningful information).
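To make that bit accounting concrete, here is a toy sketch. All the numbers are hypothetical, chosen only to illustrate the trade-off described above, not measured from any real codec:

```python
# Toy bit accounting (hypothetical numbers): paying a small number of
# bits for a high-level "face + pose" code can more than pay for
# itself in the low-level feature coding it conditions.

LOW_LEVEL_BITS_UNCONDITIONED = 5000  # coding local features with no face model
FACE_POSE_BITS = 100                 # cost of the high-level face/pose code
LOW_LEVEL_BITS_CONDITIONED = 4000    # local features, given the face/pose code
                                     # (saves ~10x the 100 bits spent)

without_face_model = LOW_LEVEL_BITS_UNCONDITIONED
with_face_model = FACE_POSE_BITS + LOW_LEVEL_BITS_CONDITIONED

print(without_face_model, "vs", with_face_model)  # 5000 vs 4100: net saving of 900 bits
```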

Of course, that does not imply that training with an unsupervised compression criterion is the best way to recognize any particular features or objects.

Comment author: V_V 14 June 2015 01:19:30AM 0 points

> By your definition of meaningful information, it's not actually clear that a strong lossless compressor wouldn't discover and encode that meaningful information.

It could, but it also might not. My point is that compression ratio (that is, average log-likelihood of the data under the model) is not a good proxy for "understanding", since it can be optimized to a very large extent without modeling "meaningful" information.
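For readers unfamiliar with the equivalence in the parenthetical: under an ideal entropy coder, the code length of a sequence is the negative log-likelihood the model assigns to it, so maximizing average log-likelihood and maximizing compression ratio are the same objective. A minimal sketch (the per-symbol probabilities are made up for illustration):

```python
import math

def code_length_bits(probs):
    """Ideal code length in bits of a sequence, given the probability
    the model assigned to each symbol: -sum(log2 p).
    Lower = better compression = higher likelihood."""
    return -sum(math.log2(p) for p in probs)

# A model that nails low-level statistics while "understanding" nothing
# can still assign high probability to the data, hence a short code.
probs_from_model = [0.5, 0.25, 0.5, 0.125]  # toy per-symbol probabilities
print(code_length_bits(probs_from_model))   # 1 + 2 + 1 + 3 = 7 bits
```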

Comment author: Daniel_Burfoot 14 June 2015 03:09:09PM 0 points

Yes, good compression can be achieved without deep understanding. But a compressor with deep understanding will ultimately achieve better compression. For example, you can get good text compression results with a simple bigram or trigram model, but eventually a sophisticated grammar-based model will outperform the n-gram approach.
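A minimal sketch of the n-gram baseline mentioned above: a toy character bigram model with add-alpha smoothing, trained on the text it scores. This is a stand-in for illustration only, not any particular compressor; the function name and sample text are invented:

```python
import math
from collections import Counter

def bigram_bits(text, alpha=1.0):
    """Estimate the ideal code length (bits) of `text` under a
    character bigram model with add-alpha smoothing. Training on the
    text itself is a toy stand-in for a real adaptive compressor."""
    pairs = Counter(zip(text, text[1:]))   # counts of adjacent character pairs
    context = Counter(text[:-1])           # counts of each conditioning character
    vocab = len(set(text))
    bits = 0.0
    for (a, b), n in pairs.items():
        p = (n + alpha) / (context[a] + alpha * vocab)  # smoothed P(b | a)
        bits += -n * math.log2(p)
    return bits

sample = "the cat sat on the mat. the cat sat on the hat."
print(bigram_bits(sample), "bits for", len(sample), "chars")
```

A grammar-based model would instead assign probabilities to whole constituents (phrases, clauses), capturing long-range structure the bigram model cannot see, which is where its eventual compression advantage comes from.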