Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.
Abstract: A Turing machine that could decide whether two arbitrary Turing machines produce the same output for every input would suffice to solve the Halting Problem. To properly optimise a function it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.
Context: FOOM, Friendliness, optimisation processes.
Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an *optimisation* if it results in a shorter running time on average for typical inputs, and a *strict optimisation* if it results in a shorter running time for all inputs.
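These definitions can be made concrete with a small sketch. Purely for illustration, suppose each function returns a pair (output, step count), so that running time is a deterministic step count rather than wall-clock time; the function and variable names below are mine, not part of any standard.

```python
def classify(original, candidate, inputs):
    """Classify a modification per the definitions above, over a finite
    list of inputs. Both arguments return (output, step_count)."""
    if any(original(x)[0] != candidate(x)[0] for x in inputs):
        return "improper"  # some output changed
    old_costs = [original(x)[1] for x in inputs]
    new_costs = [candidate(x)[1] for x in inputs]
    if all(n < o for n, o in zip(new_costs, old_costs)):
        return "strict optimisation"  # faster on every input
    if sum(new_costs) < sum(old_costs):
        return "optimisation"  # faster on average only
    return "proper, but not an optimisation"

# Toy example: summing 1..n by a loop versus the closed form.
def slow_sum(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total, n             # n steps

def fast_sum(n):
    return n * (n + 1) // 2, 1  # one step

print(classify(slow_sum, fast_sum, list(range(2, 100))))
```

The closed form gives the same output on every input and fewer steps on every input, so it is classified as a strict optimisation.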
A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for "How do we deal with X" may change in the course of extrapolating; but for plain optimisations the answer cannot change.)
For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to '0' and '1' we must consider that the response to some input can be "does not terminate". The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.
Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and outputs "1" if the two input machines have the same output for every input, and "0" otherwise. Then you can solve the Halting Problem: to decide whether a machine halts on a given tape, feed the detector that machine with the tape hard-coded, together with a machine which is known not to terminate - one that executes an infinite loop. The detector answers "1" exactly when the first machine fails to halt. Therefore, such a machine cannot exist: you cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.
It seems to follow that you cannot build a fully general proper-optimisation detector.
However, "arbitrary Turing machines" is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have the infinite storage space the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already wildly impractical.
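For a small enough input space the brute-force check is trivial to write; the trouble is purely one of scale. A sketch (names mine):

```python
def proper(original, candidate, inputs):
    """Brute-force proper-modification check: exhaustively compare
    outputs over a finite input space."""
    return all(original(x) == candidate(x) for x in inputs)

# Feasible for one 16-bit argument (65,536 cases)...
print(proper(lambda x: x * 2, lambda x: x << 1, range(2 ** 16)))

# ...but not for two 64-bit arguments: 2**128 input pairs. Even at a
# billion comparisons per second, the check takes on the order of
# 10**22 years.
pairs = 2 ** 128
years = pairs / 10 ** 9 / (60 * 60 * 24 * 365)
print(f"about {years:.1e} years")
```

The 16-bit check finishes instantly and prints True; the arithmetic in the last lines is why "finite" does not rescue the brute-force approach.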
I also have some experience in programming, but I think you are focusing too narrowly on human techniques. Humans have discovered a certain toolkit for optimisations, and they proceed as you say. But note that we usually optimise for readability and maintainability of the code, not for speed. Refactorings such as putting some code in a separate method will actually slow you down slightly, since you get the overhead of an extra stack frame - unless of course the compiler inlines the method for you!
A useful AI, however, is not constrained by human limitations; if it were, we wouldn't build it. It can construct completely bizarre code; it doesn't necessarily make small changes to existing code. Think of the evolved circuit that discriminates between high and low frequencies: No human engineer would ever build such a thing, and certainly you would never arrive at it by deliberately making small working modifications to an existing, working circuit. But an AI might arrive at such a thing, or its equivalent in code, by trial and error, and then need to prove that the output is still the same.
An AI is not constrained to write human maintainable code because the AI can maintain more complicated code than humans. But the AI's code will still have structure that the AI understands and uses to manage the complexity.
I do not expect an AI to write code by trial and error, so I am not worried about difficulties with that approach.