Houshalter comments on [link] Baidu cheats in an AI contest in order to gain a 0.24% advantage - Less Wrong Discussion
That may be true in general, but ILSVRC is much better about it. It's run like a Kaggle competition: there is a secret test set that no one can look at or train their algorithms on. They limit the number of evaluations you can do on the test set, and getting around that limit is what happened here. I also believe that the public test set is different from the private one, which is only used at the end of the competition, so no one can see how well they are doing on it.
Doing compression is not the goal of computer vision. Compression is only the goal of (some forms of) unsupervised learning, which has fallen out of favor in the last few years. Karpathy discusses some of the issues with it here:
It really is isomorphic to the generally proclaimed definition of computer vision as the inverse problem of computer graphics. Graphics starts with an abstract scene description and applies a transformation to obtain an image; vision attempts to back-infer the scene description from the raw image pixels. This process can be interpreted as a form of image compression, because the scene description is a far more parsimonious description of the image than the raw pixels. Read section 3.4.1 of my book for more details (the equivalent interpretation of vision-as-Bayesian-inference may also be of interest to some).
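One standard way to make the compression reading concrete is a two-part code. This is a sketch under assumptions, not something taken from the comment or the book it mentions: the symbols $S$ (scene description), $I$ (image), and the code length $L$ are introduced here for illustration. Sending the scene and then the residual pixels costs

```latex
L(I) = L(S) + L(I \mid S) = -\log_2 p(S) - \log_2 p(I \mid S)
```

and picking the scene that minimizes that cost is the same as picking the most probable scene given the image:

```latex
S^{*} = \arg\min_{S} \left[ -\log_2 p(S) - \log_2 p(I \mid S) \right]
      = \arg\max_{S} \, p(S)\, p(I \mid S)
      = \arg\max_{S} \, p(S \mid I)
```

The last step uses Bayes' rule, $p(S \mid I) \propto p(S)\, p(I \mid S)$, which is why the compression view and the Bayesian-inference view pick out the same scene.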
This is all generally true, but it also suffers from a key performance problem in that the various bits/variables in the high level scene description are not all equally useful.
For example, consider an agent that competes in something like a Quake world, where it just receives a raw visual pixel feed. A very detailed graphics pipeline relies on noise - literally, as in Perlin-style noise functions - to create huge amounts of micro-detail in local texturing, displacements, etc.
If you use a pure compression criterion, the encoder/vision system has to learn to essentially invert the noise functions, which, as we know, is computationally intractable. This ends up wasting a lot of computational effort chasing small gains in noise modelling, even when those details are irrelevant for high-level goals. You could actually just turn off the texture details completely and still get all of the key information you need to play the game.
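A rough way to see how much of a pure compression budget goes to texture detail is to compress the same coarse "scene" with and without per-pixel noise. This is only a toy sketch under stated assumptions: `zlib` stands in for a compression objective, uniform pseudo-random noise stands in for Perlin-style texturing, and the image size, block size, and helper names (`scene_image`, `add_texture_noise`, `compressed_size`) are all made up for illustration.

```python
import random
import zlib

# Hypothetical toy dimensions: a 256x256 grayscale image made of 32x32 tiles.
WIDTH, HEIGHT, BLOCK = 256, 256, 32


def scene_image(rng):
    """Coarse 'scene': every BLOCK x BLOCK tile gets one flat gray level."""
    tiles = [[rng.randrange(256) for _ in range(WIDTH // BLOCK)]
             for _ in range(HEIGHT // BLOCK)]
    return [[tiles[y // BLOCK][x // BLOCK] for x in range(WIDTH)]
            for y in range(HEIGHT)]


def add_texture_noise(image, rng, amplitude=8):
    """Add small per-pixel pseudo-random detail, clamped to 0..255.

    Assumption: uniform noise is a crude stand-in for procedural texture.
    """
    return [[min(255, max(0, p + rng.randint(-amplitude, amplitude)))
             for p in row] for row in image]


def compressed_size(image):
    """Size in bytes of the zlib-compressed raw pixels."""
    raw = bytes(p for row in image for p in row)
    return len(zlib.compress(raw, 9))


rng = random.Random(0)  # fixed seed so the run is repeatable
flat = scene_image(rng)
textured = add_texture_noise(flat, rng)

flat_bytes = compressed_size(flat)
textured_bytes = compressed_size(textured)
print("scene only:     ", flat_bytes, "bytes")
print("scene + texture:", textured_bytes, "bytes")
```

Both images carry the same tile layout, which is the part a game-playing agent would care about. Nearly all of the extra bytes in the textured version are spent describing noise, and that is the part a compression objective would keep pushing the model to predict.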