Say that we have the following observational data:
| Planet | Aphelion (000 km) | Perihelion (000 km) | Orbit time (days) |
|---|---|---|---|
| Mercury | 69,816 | 46,001 | 88 |
| Venus | 108,942 | 107,476 | 225 |
| Earth | 152,098 | 147,098 | 365 |
| Mars | 249,209 | 206,669 | 687 |
| Jupiter | 816,520 | 740,573 | 4,332 |
| Saturn | 1,513,325 | 1,353,572 | 10,760 |
| Uranus | 3,004,419 | 2,748,938 | 30,799 |
| Neptune | 4,553,946 | 4,452,940 | 60,190 |
| Pluto | 7,311,000 | 4,437,000 | 90,613 |
These are the minimal and maximal distances between each planet and the Sun (both in thousands of kilometres) and the number of (Earth) days for one revolution around the Sun. The table is purely empirical data; no algorithm binding the three quantities is given. The rules of celestial mechanics that relate them go by the name of Kepler's laws. Can those rules be (re)invented by a computer program, and how?
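For reference, the relation hiding in those three columns can be checked directly. The short C program below is only an illustration (not part of the simulator): it applies Kepler's third law, orbit ∝ a^(3/2) with a = (aphelion + perihelion)/2, takes the proportionality constant from Earth's row, and prints the relative error for each planet.

```c
#include <stdio.h>
#include <math.h>

/* Data copied from the table above (distances in thousands of km, orbits in days). */
int main(void) {
    const char *name[] = {"Mercury", "Venus", "Earth", "Mars", "Jupiter",
                          "Saturn", "Uranus", "Neptune", "Pluto"};
    double aphelion[]   = {69816, 108942, 152098, 249209, 816520,
                           1513325, 3004419, 4553946, 7311000};
    double perihelion[] = {46001, 107476, 147098, 206669, 740573,
                           1353572, 2748938, 4452940, 4437000};
    double orbit[]      = {88, 225, 365, 687, 4332, 10760, 30799, 60190, 90613};

    double a_earth = (152098.0 + 147098.0) / 2.0;            /* Earth's semi-major axis */
    for (int i = 0; i < 9; ++i) {
        double a = (aphelion[i] + perihelion[i]) / 2.0;       /* semi-major axis */
        double predicted = 365.0 * pow(a / a_earth, 1.5);     /* Kepler's third law */
        printf("%-8s predicted %6.0f  actual %6.0f  error %4.2f%%\n",
               name[i], predicted, orbit[i],
               100.0 * fabs(predicted - orbit[i]) / orbit[i]);
    }
    return 0;
}
```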
The following program code will be put into a simulator:
//declarations of the integer type variables
$DECLAREINT bad perihelion aphelion orbit guess dif temp zero temp1
//table with the known data in a simulator friendly format
$INVAR perihelion(46001) aphelion(69816) orbit(88)
$INVAR perihelion(107476) aphelion(108942) orbit(225)
$INVAR perihelion(147098) aphelion(152098) orbit(365)
$INVAR perihelion(206669) aphelion(249209) orbit(687)
$INVAR perihelion(740573) aphelion(816520) orbit(4332)
$INVAR perihelion(1353572) aphelion(1513325) orbit(10760)
$INVAR perihelion(2748938) aphelion(3004419) orbit(30799)
$INVAR perihelion(4452940) aphelion(4553946) orbit(60190)
$INVAR perihelion(4437000) aphelion(7311000) orbit(90613)
//the variables orbit and bad can't be touched by the simulator,
//to avoid degeneration into triviality
$RESVAR orbit bad
//do NOT use if clauses or while clauses, do not set direct numbers ...
$RESCOM if while val_operation inc_dec
//bad is the variable by which the whole program will be judged
//a big value of bad is bad. By this criterion programs will be wiped out
//from their virtual existence. A kind of anti-fitness
$PENVAL bad
//do show the following variables when simulating
$SHOWVAR bad,orbit,guess,dif
//penalize every command by 0 points (nothing) and every line by 1 point
$WEIGHTS commands=0 lines=1
//minimize the whole program to 20 lines or fewer
$MINIMIZE lines 20
$BES
//the arena, where algorithms will be
//created and only the fittest will survive
$EES
//testing area, to which the simulator has no write access
//here bad (the penalized variable) is calculated
//the bigger the difference between the known orbit and the variable guess,
//the worse the evolved algorithm is
dif=orbit-guess;
dif=abs(dif);
bad=dif;
temp=dif;
temp*=10000;
temp1=temp/orbit;
temp=temp1*temp1;
bad=bad+temp;
//end of the testing area
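Written out, the penalty computed in the testing area is bad = |orbit - guess| + ((10000 * |orbit - guess|) / orbit)^2, i.e. the absolute error plus the square of the relative error measured in hundredths of a percent, all in integer arithmetic.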
After several hours, the following C code had evolved inside the $BES - $EES segment.
aphelion=perihelion+aphelion;
aphelion=aphelion+aphelion;
aphelion=aphelion+aphelion;
guess=12;
aphelion=aphelion>>guess;
temp=aphelion/guess;
aphelion=aphelion-temp;
dif=sqrt(aphelion);
aphelion=guess|aphelion;
aphelion=aphelion*dif;
aphelion=guess^aphelion;
guess=aphelion/guess;
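Read slowly, this snippet amounts to a scrambled form of Kepler's third law. The first line adds perihelion and aphelion, which is twice the semi-major axis; the two doublings, the shift by 12 and the subtraction of a twelfth only rescale it by a constant; multiplying the result by its own square root (dif) then produces the a^(3/2) dependence, and the OR and XOR with 12 merely nudge the low bits. Ignoring integer truncation and those bit-twiddles, Earth's row gives (147098 + 152098) * 4 / 4096 * 11/12 ≈ 268, and 268 * sqrt(268) / 12 ≈ 366, close to the actual 365 days.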
What does the simulator do? It bombards the arena segment with random C commands. Usually it just notices a syntax error and reverts everything to the last working version. If everything is syntactically good, the simulator interprets the program and checks whether the mutated version causes any run-time error such as division by zero, a memory leak, and so on. In the case of such an error it returns to the last good version. Otherwise it checks whether the variable called "bad" is at least as small as it has ever been before. If it is, a new version has just been created, and it is stored.
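That loop might be sketched roughly as follows. This is only a guess at the general shape, not Critticall's actual code; the program representation and the three helper functions are hypothetical placeholders standing in for the simulator's real machinery.

```c
#include <stdio.h>
#include <stdlib.h>

#define PROG_LEN 64
typedef struct { unsigned char code[PROG_LEN]; } Program;   /* toy representation */

/* Hypothetical placeholders for the simulator's real machinery. */
static void mutate(Program *p) {                  /* random change in the arena    */
    p->code[rand() % PROG_LEN] = (unsigned char)(rand() % 256);
}
static int compiles_and_runs(const Program *p) {  /* syntax / run-time error check */
    (void)p; return 1;
}
static long evaluate_bad(const Program *p) {      /* run the testing area          */
    (void)p; return rand() % 1000;
}

int main(void) {
    Program best = {{0}}, trial;
    long best_bad = evaluate_bad(&best);

    for (long gen = 0; gen < 1000000; ++gen) {
        trial = best;                      /* start from the last good version  */
        mutate(&trial);
        if (!compiles_and_runs(&trial))    /* fatal mutation: revert            */
            continue;
        long bad = evaluate_bad(&trial);
        if (bad <= best_bad) {             /* at least as small as ever before  */
            best = trial;
            best_bad = bad;
        }
    }
    printf("best bad after the run: %ld\n", best_bad);
    return 0;
}
```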
The evolutionary pressure works toward ever better code, which guesses the orbit time of the nine planets increasingly well. In this case the "orbit" variable was put under the $RESVAR clause, and the "guess" variable is then tested against the "orbit" variable. Had there been no "$RESVAR orbit" statement, a simple "guess=orbit;" would have evolved quickly. Had there been no "$RESVAR bad" statement, a simple "bad=-1000000;" could have derailed the process.
Many thousands of algorithms are born and die every second inside this simulator on a standard Windows PC. Millions or billions of generations later, the digital evolution is still running, even if an excellent solution has already been found.
And how good an approximation of the Kepler (Newton) celestial mechanics of the Solar system do we have here?
This is how good it is for the nine planets on which the code evolved:
| Planet | Error % |
|---|---|
| Mercury | 0.00 |
| Venus | 0.44 |
| Earth | 0.27 |
| Mars | 0.29 |
| Jupiter | 0.16 |
| Saturn | 0.65 |
| Uranus | 0.10 |
| Neptune | 0.79 |
| Pluto | 1.08 |
And this is how good it is for the control group of a comet and six asteroids:
| Asteroid/Comet | Error % |
|---|---|
| Halley | 1.05 |
| Hebe | 1.37 |
| Astraea | 1.99 |
| Juno | 3.19 |
| Pallas | 1.66 |
| Vesta | 2.49 |
| Ceres | 2.02 |
It could be even much better after another billion generations, and perhaps with even more $INVAR examples. Generally, you can pick any three columns from any integer-valued table you want and see in this way how they are related algorithmically. It can also be more than three columns.
The name of the simulator (evoluator) is Critticall, and it is available at http://www.critticall.com
A well-written program does not have unused variables, but many real-life programs do. In my programming environments I usually have the warnings set to be very sensitive; for example, they complain about declared-but-never-used variables and functions, and when I read other people's code, I see this often.
Could we see a program with a syntax error as a fatal mutation? Fortunately, these fatal mutations are easy to detect. (Alternatively, we could mutate the parse trees instead.)
Still, I have seen humans do this and take their paychecks. Yes, I would expect such an evolved program to be full of bugs; but real-life applications also contain bugs, and that does not prevent them from being useful. I admit I would hate to read the machine-generated code, or even worse, to fix its bugs or add new functionality. But this is like saying that reading DNA is, ahem, not very user-friendly. Sure, it is not; but it works.
Two more things to consider. First, unit tests: the better the unit tests you write, the higher the chance that a random program which passes them is correct. If test-driven development can reduce random human errors, it can also reduce computer-generated errors, though of course the computer will make many more errors. Second, we are OK with randomized algorithms having some chance of error, as long as the chance is very, very small.
So the prior probability of the computer generating a correct algorithm is small, but non-zero. By writing unit tests we increase the probability that the result will be correct. If we can raise this probability from epsilon to 1 - epsilon, we win.
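To make the unit-test point concrete, here is a small sketch of such a gate: a candidate orbit-guessing function passes only if every planet in the table is predicted within a tolerance. The guess_orbit() below is a hypothetical stand-in (Kepler's third law normalised to Earth's row), not the evolved code, and the 2% tolerance is an arbitrary choice; a real test would plug the generated code in instead.

```c
#include <stdio.h>
#include <math.h>

/* Hypothetical stand-in for an evolved candidate; a real test would call
   the machine-generated code here instead. */
static double guess_orbit(double aphelion, double perihelion) {
    double a = (aphelion + perihelion) / 2.0;
    double a_earth = (152098.0 + 147098.0) / 2.0;     /* Earth's row from the table */
    return 365.0 * pow(a / a_earth, 1.5);
}

int main(void) {
    double aphelion[]   = {69816, 108942, 152098, 249209, 816520,
                           1513325, 3004419, 4553946, 7311000};
    double perihelion[] = {46001, 107476, 147098, 206669, 740573,
                           1353572, 2748938, 4452940, 4437000};
    double orbit[]      = {88, 225, 365, 687, 4332, 10760, 30799, 60190, 90613};
    int failures = 0;

    for (int i = 0; i < 9; ++i) {
        double rel = fabs(guess_orbit(aphelion[i], perihelion[i]) - orbit[i]) / orbit[i];
        if (rel > 0.02)                    /* arbitrary 2% tolerance */
            ++failures;
    }
    if (failures)
        printf("FAIL: %d planet(s) off by more than 2%%\n", failures);
    else
        printf("PASS\n");
    return failures != 0;
}
```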