Daniel_Burfoot comments on Two Challenges - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Right - this would be what I'd call "cheating" or overfitting the data. We'd have to use the compression rate in this case.
Sure. I'll work out the technical details if anyone wants to enter the contest. I would prefer to use the most recent stable JVM. It seems very unlikely to me that the outcome of the contest will depend on the precise choice of time or memory bounds - let's say the time bound is O(24 hours) and the memory bound is O(2 GB).
It's actually not very difficult to implement a compression program using arithmetic coding once you have the statistical model. Other prediction evaluation schemes may work, but compression is methodologically crisp: look at the compressed file size, and check that the decompressed data matches the original exactly.
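To illustrate the point that the coding step is easy once the model is fixed, here is a toy arithmetic-coding sketch (my own illustration, not part of the contest spec). It uses exact rational arithmetic and a hypothetical static three-symbol model; a real coder would use scaled integer arithmetic and an adaptive model, but the interval-narrowing logic is the same.

```python
from fractions import Fraction

# Hypothetical static model: symbol -> (cumulative low, cumulative high).
# The intervals partition [0, 1); '!' marks end-of-message.
MODEL = {"a": (Fraction(0), Fraction(1, 2)),
         "b": (Fraction(1, 2), Fraction(3, 4)),
         "!": (Fraction(3, 4), Fraction(1))}

def encode(msg):
    # Narrow [low, high) once per symbol, in proportion to the model's
    # probability for that symbol.
    low, high = Fraction(0), Fraction(1)
    for sym in msg + "!":
        span = high - low
        lo, hi = MODEL[sym]
        low, high = low + span * lo, low + span * hi
    return (low + high) / 2  # any number inside the final interval works

def decode(code):
    # Replay the same narrowing: at each step, the symbol is the one whose
    # sub-interval contains the code number.
    out = []
    low, high = Fraction(0), Fraction(1)
    while True:
        span = high - low
        for sym, (lo, hi) in MODEL.items():
            if low + span * lo <= code < low + span * hi:
                if sym == "!":
                    return "".join(out)
                out.append(sym)
                low, high = low + span * lo, low + span * hi
                break
```

The crispness of the evaluation is visible here: `decode(encode(msg)) == msg` must hold bit-for-bit, and the length of the code is determined entirely by the model's probabilities.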
Basically, when I say "belief networks", what I mean is the use of graphs to define probability distributions and conditional independence relationships.
The spirit of the contest is to use a truly "natural" data set. I admit that this is a bit vague. Really my only requirement is to use a non-synthetic data set. I think I know where you're going with the "causally dependent" line of thinking, but it doesn't bother me too much. I get the feeling that I am walking into a trap, but really I've been planning to make a donation to SIAI anyway, so I don't mind losing.