Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.
Abstract: A Turing machine capable of saying whether two arbitrary Turing machines have the same output for all inputs is equivalent to solving the Halting Problem. To optimise a function it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.
Context: FOOM, Friendliness, optimisation processes.
Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an optimisation if it results in a shorter running time on average for typical inputs, and a *strict* optimisation if it results in a shorter running time for all inputs.
A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for "How do we deal with X" may change in the course of extrapolating; but for plain optimisations the answer cannot change.)
For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to '0' and '1' we must consider that the response to some input can be "does not terminate". The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.
Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and their respective tapes, and outputs "1" if the two input machines have the same output, and "0" otherwise. Then, by having one of the inputs be a Turing machine which is known not to terminate - one that executes an infinite loop - you can solve the Halting Problem. Therefore, such a machine cannot exist: You cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.
It seems to follow that you cannot build a fully general proper-optimisation detector.
However, "arbitrary Turing machines" is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have infinite storage space, as the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers, or something. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already rather impractical.
I respectfully suggest that you are not thinking Weirdly enough. Notice that the evolved circuit still has structure that the laws of physics understand! An AI needn't operate directly at that level to make intuitive leaps beyond the capability of any human; and needn't operate by trial and error, precisely, (although, notice, we don't know the internal structure of generating an insight in human brains; for all we know it involves subconsciously trying a hundred different paths and discarding most of them) to generate stuff that's very different from the code it starts with.
Thinking Weirdly has nothing to do with it. I expect an AI to not use programming techniques it doesn't expect to be able to use effectively, and I expect the AI's expectations to use techniques effectively to be accurate. So, given that an AI is using a technique, even if it is Weird, I expect the AI to be effective using it. If you have an argument that a certain technique, like random guessing and checking, has insurmountable problems, then you have an argument that an AI will not use th... (read more)