
ESRogs comments on [link] Baidu cheats in an AI contest in order to gain a 0.24% advantage - Less Wrong Discussion

10 Post author: Wei_Dai 06 June 2015 06:39AM




Comment author: ESRogs 09 June 2015 06:12:39AM 1 point [-]

Is it important that it be lossless compression?

I can look at a picture of a face and know that it's a face. If you switched a bunch of pixels around, or blurred parts of the image a little bit, I'd still know it was a face. To me it seems relevant that it's a picture of a face, but not as relevant what all the pixels are. Does AI need to be able to do lossless compression to have understanding?

I suppose the response might be that if you have a bunch of pictures of faces, and know that they're faces, then you ought to be able to get some mileage out of that. And even if you're trying to remember all the pixels, there's less information to store if you're just diff-ing from what your face-understanding algorithm predicts is most likely. Is that it?

Comment author: Daniel_Burfoot 09 June 2015 02:51:36PM 1 point [-]

Well, lossless compression implies understanding. Lossy compression may or may not imply understanding.

Also, usually you can get a lossy compression algorithm from a lossless one. In image compression, the lossless method would typically be to send a scene description plus a low-entropy correction image; you can easily save bits by just skipping the correction image.
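The "skip the correction image" move can be sketched as follows, using rounding as a stand-in for the scene model (the model is a hypothetical placeholder, not the method the comment has in mind):

```python
# Sketch of "lossless = model + correction; lossy = drop the correction".
# The "scene model" here is crude: round each pixel down to the nearest
# multiple of 16. The correction is the rounding error.
def model_part(pixels):
    """Lossy approximation of the pixels (the 'scene description')."""
    return [(p // 16) * 16 for p in pixels]

def correction_part(pixels):
    """Low-entropy residual that restores the exact pixels."""
    return [p - m for p, m in zip(pixels, model_part(pixels))]

row = [100, 101, 103, 180]
approx = model_part(row)  # sending only this is the lossy variant
exact = [m + c for m, c in zip(approx, correction_part(row))]
assert exact == row  # model + correction is lossless
```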

I emphasize lossless compression because it enables strong comparisons between competing methods.

Comment author: V_V 13 June 2015 08:57:16PM *  3 points [-]

Well, lossless compression implies understanding.

Not really, at least not until you start to approach Kolmogorov complexity.

In a natural image, most of the information is low level detail that has little or no human-relevant meaning: stuff like textures, background, lighting properties, minuscule shape details, lens artifacts, lossy compression artifacts (if the image was crawled from the Internet it was probably a JPEG originally), and so on.
Much of this detail is highly redundant or well modeled by priors, so a lossless compression algorithm could find an efficient encoding of it.

A typical image used in machine learning contests is 256 x 256 x 3 x 8 ≈ 1.57 million bits. How many bits of meaningful information (*) could it possibly contain? 10? 100? 1000?
Whatever the number, the non-meaningful information certainly dominates, so an efficient lossless compressor could achieve an extremely good compression ratio without modeling, and thus without understanding, any of the meaningful information.

(* Consider the meaningful information of an image to be the number of yes-or-no questions about the image that a human would normally be interested in and could answer by looking at it, where for each question the probability of the answer being true is approximately 50% over the data set, and the set of questions is designed so that a human learns as much as possible while asking the fewest questions, e.g. something like a 20 questions game.)
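The arithmetic above can be checked quickly (the 1000-bit figure is the comment's own generous upper estimate, not a measured value):

```python
# Raw size of a typical contest image versus the comment's generous
# upper estimate of its "meaningful" content.
raw_bits = 256 * 256 * 3 * 8   # RGB image, 8 bits per channel
assert raw_bits == 1_572_864   # ~1.57 million bits, as stated
meaningful_bits = 1000         # most generous guess from the comment
ratio = raw_bits / meaningful_bits
# Even at 1000 meaningful bits, detail outweighs meaning ~1570 to 1.
```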

Comment author: Daniel_Burfoot 14 June 2015 02:47:41PM *  1 point [-]

I agree with your general point that working on lossless compression requires the researcher to pay attention to details that most people would consider meaningless or irrelevant. In my own text compression work, I have to pay a lot of attention to things like capitalization, comma placement, and the differences between Unicode quote characters. However, I have three responses to this as a critique of the research program:

The first response is to say that nothing is truly irrelevant. Or, equivalently, the vision system should not attempt to make the relevance distinction. Details that are irrelevant in everyday tasks might suddenly become very relevant in a crime scene investigation (where did this shadow at the edge of the image come from...?). Also, even if a detail is irrelevant at the top level, it might be relevant in the interpretation process; certainly shadowing is very important in the human visual system.

The second response is that while it is difficult and time-consuming to worry about details, this is a small price to pay for the overall goal of objectivity and methodological rigor. Human science has always required a large amount of tedious lab work and unglamorous experimental work.

The third response is to say that even if "end users" consider some phenomenon irrelevant, scientists are interested in understanding reality for its own sake, not for the sake of applications. So pure vision scientists should be very interested in, say, categorizing textures, modeling shadows and lighting, and characterizing lens artifacts. (In my interactions with computer graphics people, I have in fact found exactly this tendency.)

Comment author: jacob_cannell 13 June 2015 09:43:56PM 0 points [-]

By your definition of meaningful information, it's not actually clear that a strong lossless compressor wouldn't discover and encode that meaningful information.

For example, the presence of a face in an image is presumably meaningful information. From a compression point of view, the presence of a face and its approximate pose is also information that has a very large impact on lower-level feature coding: spending, say, 100 bits to represent the face and its pose could save ten times as many bits at the lowest levels. Some purely unsupervised learning systems, such as sparse coding or RBMs, do tend to find high-level features that correspond to objects (meaningful information).
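The back-of-envelope in that paragraph can be made explicit (all numbers are illustrative, taken straight from the comment):

```python
# Spend bits on a high-level "face present, at this pose" code if it
# saves more bits in the low-level residual coding. Numbers are the
# comment's own illustrative figures, not measurements.
face_code_bits = 100                    # cost of face presence + pose
low_level_saving = 10 * face_code_bits  # "10x as many bits" saved below
net_saving = low_level_saving - face_code_bits
assert net_saving == 900  # the compressor has a 900-bit incentive
                          # to discover the face feature
```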

Of course that does not imply that training using UL compression criteria is the best way to recognize any particular features/objects.

Comment author: V_V 14 June 2015 01:19:30AM *  0 points [-]

By your definition of meaningful information, it's not actually clear that a strong lossless compressor wouldn't discover and encode that meaningful information.

It might, or it might not. My point is that compression ratio (that is, average log-likelihood of the data under the model) is not a good proxy for "understanding", since it can be optimized to a very large extent without modeling "meaningful" information.
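The equivalence invoked here is the standard link between log-likelihood and code length: an entropy coder can encode data to which a model assigns probability p in about -log2(p) bits, so maximizing average log-likelihood and minimizing compressed size are the same objective. A small illustration (the probabilities are made up):

```python
import math

def code_length_bits(probability):
    """Ideal code length, in bits, for an event of this probability."""
    return -math.log2(probability)

# A model that predicts each of 1000 pixels with probability 0.9,
# versus one that predicts each with probability 0.5 (a coin flip):
good = 1000 * code_length_bits(0.9)  # ~152 bits for the whole image
bad = 1000 * code_length_bits(0.5)   # exactly 1000 bits
assert good < bad  # higher likelihood <=> shorter code
```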

Comment author: Daniel_Burfoot 14 June 2015 03:09:09PM 0 points [-]

Yes, good compression can be achieved without deep understanding. But a compressor with deep understanding will ultimately achieve better compression. For example, you can get good text compression results with a simple bigram or trigram model, but eventually a sophisticated grammar-based model will outperform the n-gram approach.
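The n-gram claim can be illustrated with in-sample cross-entropy estimates (a toy sketch, not a real compressor; the text and counts are purely illustrative):

```python
import math
from collections import Counter

# Why richer context models compress better: estimate average bits per
# character under a unigram model versus a bigram model, with counts
# taken from the text itself (a best-case in-sample estimate).
text = "the cat sat on the mat and the cat ran " * 10

def unigram_bits(s):
    """Average -log2 p(c) per character, ignoring context."""
    counts = Counter(s)
    total = len(s)
    return sum(-math.log2(counts[c] / total) for c in s) / len(s)

def bigram_bits(s):
    """Average -log2 p(c | previous character)."""
    pair_counts = Counter(zip(s, s[1:]))
    prev_counts = Counter(s[:-1])
    bits = sum(-math.log2(pair_counts[(a, b)] / prev_counts[a])
               for a, b in zip(s, s[1:]))
    return bits / (len(s) - 1)

# Conditioning on context never hurts in-sample: fewer bits per char.
assert bigram_bits(text) < unigram_bits(text)
```

A grammar-based model would, by the same logic, condition on still richer structure than the previous character, buying further bits at the cost of a far more complex model.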

Comment author: Lumifer 09 June 2015 03:16:17PM 1 point [-]

lossless compression implies understanding

Huh? Understanding by whom? What exactly does the zip compressor understand?