Douglas_Knight comments on Automatic programming, an example - Less Wrong

12 Post author: Thomas 01 February 2012 08:55PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (32)

You are viewing a single comment's thread. Show more comments above.

Comment author: Douglas_Knight 02 February 2012 01:47:19AM *  1 point [-]

The XOR with 12 won't do much after dividing by 12. For small radii, OR with 12 (in units of about 10^6km) will have an effect. These two constants are probably just overfitting. Indeed, it nails Mercury, the smallest and thus most vulnerable to these effects.* Rounding** the square root is also probably overfitting or just noise. It will have a larger effect, but smooth across planets, so it is probably canceled out by the choice of other numbers. Ignoring those three effects, it is a constant times the 3/2 power of average of the axes. The deviation from Kepler's law is that it should ignore perihelion.*** But for un-eccentric orbits, there's no difference. Since the training data isn't eccentric, this failure is unsurprising. That is, the code is unsurprising; that the code is so accurate is surprising. That it correctly calculates the orbital period of Halley's comet, rather than underestimating by a factor of 2^(3/2) is implausible.***

* The control group is too homogeneous. If it contained something close in, overfitting for Mercury would have been penalized in the final evaluation.

** Are you sure it's rounding? [Edit: Yes: bitwise operations are strongly suggestive.]

*** These statements are wrong because I confused apehelion with the semi-major axis. So removing the bitwise operations yields exactly Kepler's law. If you switch from ints to doubles it becomes more accurate. But wmorgan has a constant error: it is divide by 4096, not 1024. This should make the rounding errors pretty bad for Mercury. Maybe the bitwise operations are to fix this, if they aren't noise. My C compiler does not reproduce the claimed error percentages, so I'm not going to pursue this.