Limits on self-optimisation

RolfAndreassen

20th Jan 2012

Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.

Abstract: A Turing machine that could decide whether two arbitrary Turing machines have the same output for all inputs would also solve the Halting Problem, so no such machine can exist. To optimise a function safely it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.

Context: FOOM, Friendliness, optimisation processes.

Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an optimisation if it results in a shorter running time on average for typical inputs, and a *strict* optimisation if it results in a shorter running time for all inputs.

A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for "How do we deal with X" may change in the course of extrapolating; but for plain optimisations the answer cannot change.)

For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to '0' and '1' we must consider that the response to some input can be "does not terminate". The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.

Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and their respective tapes, and outputs "1" if the two input machines have the same output, and "0" otherwise. Then, by having one of the inputs be a Turing machine which is known not to terminate - one that executes an infinite loop - you can solve the Halting Problem. Therefore, such a machine cannot exist: You cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.
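The shape of this reduction can be sketched in code. This is only an illustration: `equivalent` stands for the hypothetical detector, which cannot exist for arbitrary machines, and the toy machine descriptions and the `toy_equivalent` oracle are assumptions introduced here purely so that the sketch runs.

```python
# Toy machine descriptions (hypothetical, for illustration only):
#   ("loop",)        never terminates
#   ("const", bit)   halts immediately and outputs `bit`
LOOP_FOREVER = ("loop",)


def halts(machine, equivalent):
    """Decide halting using a (hypothetical) equivalence detector.

    `equivalent(m1, m2)` is assumed to return True exactly when m1 and m2
    have the same output, counting "does not terminate" as an output.
    """
    # A machine halts exactly when it does NOT behave like the infinite loop.
    return not equivalent(machine, LOOP_FOREVER)


def toy_equivalent(m1, m2):
    """Valid only for the two toy machine kinds above, where equal
    descriptions coincide with equal behaviour."""
    return m1 == m2


print(halts(("const", 1), toy_equivalent))  # True
print(halts(LOOP_FOREVER, toy_equivalent))  # False
```

For the toy family the oracle is trivial; the argument in the text is that no such oracle can be written once `machine` ranges over all Turing machines.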

It seems to follow that you cannot build a fully general proper-optimisation detector.

However, "arbitrary Turing machines" is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have infinite storage space, as the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers, or something. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already rather impractical.
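A minimal sketch of the brute-force check, assuming functions of two unsigned integers with a configurable bit width. The names `brute_force_equivalent`, `original` and `candidate`, and the rate of 10**9 checks per second, are illustrative assumptions rather than anything from a real system.

```python
from itertools import product


def brute_force_equivalent(f, g, bits):
    """Return True if f and g agree on every pair of `bits`-bit unsigned ints."""
    values = range(2 ** bits)
    return all(f(a, b) == g(a, b) for a, b in product(values, repeat=2))


def original(a, b):
    return 2 * a + 2 * b


def candidate(a, b):
    # Proposed "optimised" version: one addition and one shift.
    return (a + b) << 1


# 4-bit inputs: only 16 * 16 = 256 cases, so exhaustive checking is instant.
print(brute_force_equivalent(original, candidate, bits=4))

# Two 64-bit inputs give 2**128 cases. At an assumed 10**9 checks per second:
seconds = 2 ** 128 / 10 ** 9
years = seconds / (365 * 24 * 3600)
print(f"{years:.1e} years")
```

Under that assumed rate the full 64-bit check comes out on the order of 10^22 years, which is what "rather impractical" amounts to here.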

[-]JGWeissman14y90

Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

The halting problem and similar are generally not practical problems when you are deliberately constructing your programs with analyzable structure, and you are willing to require that programs not just have a desired property, but that they provably have the desired property.

[-]RolfAndreassen14y20

> Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

This is interesting. Do you have any sources that discuss why this is so?

[-]asr14y40

> Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

> This is interesting. Do you have any sources that discuss why this is so?

I have a short proof: suppose we have some program, A. And some program that provably does nothing, B.

Run first B, then A. Presto, a new program with output identical to A. Depending on whether A is a black box or some structure you can reason about, you can also hoist B up into the body of A. Again, provably output identical to A.

The underlying reason this is easier than proving general equivalence of programs is that we picked specific program B with a well-known behavior. Trying to prove two programs equivalent would be trying to reason about all possible modifications, rather than some one specific patch.

[-]saturn14y20

You can prove that a smaller set of transformations preserve the behavior of subsections of a program, then you can combine arbitrarily many of those transformations and preserve the input to output map of a larger program.
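A small sketch of this idea, assuming a toy expression language over integers (nested tuples, a single variable `"x"`). The three rewrite rules are illustrative; each one is an arithmetic identity that can be justified on its own, so any composition of them preserves the value of the whole expression without the two versions ever being compared exhaustively.

```python
# Expressions: ("add", e1, e2), ("mul", e1, e2), the variable "x", or an int.

def evaluate(expr, x):
    if expr == "x":
        return x
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "add" else a * b


def drop_add_zero(expr):
    # e + 0  ->  e
    if isinstance(expr, tuple) and expr[0] == "add" and expr[2] == 0:
        return expr[1]
    return expr


def drop_mul_one(expr):
    # e * 1  ->  e
    if isinstance(expr, tuple) and expr[0] == "mul" and expr[2] == 1:
        return expr[1]
    return expr


def double_to_add(expr):
    # e * 2  ->  e + e
    if isinstance(expr, tuple) and expr[0] == "mul" and expr[2] == 2:
        return ("add", expr[1], expr[1])
    return expr


def rewrite(expr, rules):
    """Apply every rule once at each node, working bottom-up."""
    if isinstance(expr, tuple):
        op, left, right = expr
        expr = (op, rewrite(left, rules), rewrite(right, rules))
    for rule in rules:
        expr = rule(expr)
    return expr


before = ("add", ("mul", ("mul", "x", 1), 2), 0)  # ((x * 1) * 2) + 0
after = rewrite(before, [drop_add_zero, drop_mul_one, double_to_add])
print(after)  # ('add', 'x', 'x')
```

The spot check `evaluate(before, 7) == evaluate(after, 7)` is only a sanity test; the guarantee comes from the per-rule identities, not from testing.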

[-]JGWeissman14y20

This comes from my direct experience programming computers.

Making small changes to a program which do not affect its input output map is usually trivially easy, and IDEs such as Eclipse and Visual Studio include tools to make such changes for you, such as extracting a section of code into a method, and replacing that section of code with a call to that method. In general this is called refactoring, and its common forms tend to be useful preparations for more substantial changes.

With respect to optimization, a program that is well organized into methods which hide implementation details can be changed by replacing a method with one that performs the same function with a more efficient implementation. Sometimes a strict refactoring is not required, you just have to preserve the properties that callers of the method rely on. For example, if you have a method which sorts a list, and you only care that the sorted list reflects the ordering, you could replace it using an algorithm with different behavior with respect to elements that are equivalent by the ordering. More to the point, if you have a chess AI that searches the game tree as deep as it can in a certain time, and you optimize the search, you are happy that the AI produces different output as long as it still outputs the best move it finds according to its deeper search of the game tree. For similar reasons, an AGI should produce better output in real time if it substantially optimizes its own efficiency.
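The sorting case can be made concrete. In this sketch the "contract" callers rely on is assumed to be: the result is a permutation of the input and is ordered by the key. The helper names (`satisfies_sort_contract`, `stable_sort`, `other_sort`) are made up for the example.

```python
from collections import Counter


def satisfies_sort_contract(original, result, key):
    """The property callers rely on: same elements, ordered by `key`."""
    same_elements = Counter(original) == Counter(result)
    in_order = all(key(a) <= key(b) for a, b in zip(result, result[1:]))
    return same_elements and in_order


def stable_sort(items, key):
    return sorted(items, key=key)


def other_sort(items, key):
    # Same ordering by key, but ties come out in the opposite relative order.
    return sorted(reversed(items), key=key)


def by_number(record):
    return record[1]


records = [("b", 1), ("a", 1), ("c", 0)]
first = stable_sort(records, by_number)
second = other_sort(records, by_number)

print(first != second)                                      # True
print(satisfies_sort_contract(records, first, by_number))   # True
print(satisfies_sort_contract(records, second, by_number))  # True
```

The two implementations are not output-identical, yet either is an acceptable replacement for the other as long as callers depend only on the contract.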

Asking if you can determine if arbitrary programs have the same output is really a wrong question. Ask if you can construct a better program that has the same properties you care about.

[-]RolfAndreassen14y20

I also have some experience in programming, but I think you are focusing too narrowly on human techniques. Humans have discovered a certain toolkit for optimisations, and they proceed as you say. But note that we usually optimise for readability and maintainability of the code, not for speed. Refactorings such as putting some code in a separate method will actually slow you down slightly, since you get the overhead of an extra stack frame - unless of course the compiler inlines the method for you!

A useful AI, however, is not constrained by human limitations; if it were, we wouldn't build it. It can construct completely bizarre code; it doesn't necessarily make small changes to existing code. Think of the evolved circuit that discriminates between high and low frequencies: No human engineer would ever build such a thing, and certainly you would never arrive at it by deliberately making small working modifications to an existing, working circuit. But an AI might arrive at such a thing, or its equivalent in code, by trial and error, and then need to prove that the output is still the same.

[-]JGWeissman14y20

An AI is not constrained to write human maintainable code because the AI can maintain more complicated code than humans. But the AI's code will still have structure that the AI understands and uses to manage the complexity.

I do not expect an AI to write code by trial and error, so I am not worried about difficulties with that approach.

[-]RolfAndreassen14y20

I respectfully suggest that you are not thinking Weirdly enough. Notice that the evolved circuit still has structure that the laws of physics understand! An AI needn't operate directly at that level to make intuitive leaps beyond the capability of any human; and needn't operate by trial and error, precisely, (although, notice, we don't know the internal structure of generating an insight in human brains; for all we know it involves subconsciously trying a hundred different paths and discarding most of them) to generate stuff that's very different from the code it starts with.

[-]JGWeissman14y40

> I respectfully suggest that you are not thinking Weirdly enough.

Thinking Weirdly has nothing to do with it. I expect an AI to not use programming techniques it doesn't expect to be able to use effectively, and I expect the AI's judgments about which techniques it can use effectively to be accurate. So, given that an AI is using a technique, even if it is Weird, I expect the AI to be effective using it. If you have an argument that a certain technique, like random guessing and checking, has insurmountable problems, then you have an argument that an AI will not use that technique. Given that the AI is using a Weird technique, I expect the AI to be advanced enough to cope with, and benefit from, the Weirdness.

> Notice that the evolved circuit still has structure that the laws of physics understand!

The laws of physics can only understand the structure in a poetic sense. When I say that an AI understands the structure of its code, I mean that it has a map of the code, organized into logical components with information (not required to actually run the program) about high level properties components have and how other components rely on those properties, and this information is available and useful to modifying the code in good ways.

> An AI needn't operate directly at that level to make intuitive leaps beyond the capability of any human; and needn't operate by trial and error, precisely, (although, notice, we don't know the internal structure of generating an insight in human brains; for all we know it involves subconsciously trying a hundred different paths and discarding most of them) to generate stuff that's very different from the code it starts with.

It doesn't matter if the AI makes leaps beyond the capability of any human as long as it doesn't make leaps beyond its own capability. You seem much more eager to apply Weirdness to difficulty of the problem than to capability of solving the problem. It doesn't matter that understanding the AI's Weird code is too hard for humans, because it's not too hard for the Weird AI. The AI may "generate stuff that's very different from the code it starts with", but it won't generate anything so different the AI can't verify it is a good change.

[-]timtyler14y00

If you don't want to face that problem, surely you can just constrain the machine to write readable code.

A complicated refactoring whose equivalence to the original is difficult to show could be discarded. What is needed most is a path forward, not the ability to traverse any possible path that leads forwards.

It seems likely that there will be a tradeoff between progress speed and safety, with "looking for a proof" being the slowest approach. Such a technique seems relatively unlikely to be effective if applied during a race.

[-]RolfAndreassen14y00

What's the use of an AI that writes code I could have written myself? If that's the case, cut out the middleman and just write the damn stuff! I specifically want an AI that's smarter than me, otherwise I have no use for it.

> It seems likely that there will be a tradeoff between progress speed and safety, with "looking for a proof" being the slowest approach. Such a technique seems relatively unlikely to be effective if applied during a race.

Yes, this is true. That's a problem. We don't want the first AI that FOOMs effectively to win. We want a provably Friendly AI to win. If we demonstrate that proving Friendliness has constraints that don't apply generally, we still cannot abandon the constraints! That defeats the entire purpose!

[-]thomblake14y00

> What's the use of an AI that writes code I could have written myself?

I'm confused. Isn't this sort of thing what most machines are for? You might just as well ask, "What's the use of a clothes-washing machine, when I could've just washed all of the clothes by hand?"

[-]RolfAndreassen14y00

This is a good point, but there are two objections. First, the washing machine lets you substitute one kind of effort for another and thus use comparative advantage. I can earn enough money for a washing machine in much less time than the total time saved over the lifetime of the washing machine. With an AI, code is code; there's no comparative advantage in writing code to write code instead of writing code.

Second, in referring to "code I could have written myself", I was referring to qualitative advantages rather than time saved. To make the washing-machine analogy work with this, postulate a washing machine that doesn't actually save you any time - maybe you have to sit about cranking a driving shaft for two hours - but produces much cleaner clothes, or a smaller chance of ripping, or some other qualitative measure.

I note that we already have automatic code generators, in some cases built into the language, like templates in C++. They're occasionally useful but not likely to FOOM on us.

[-]thomblake14y20

> First, the washing machine lets you substitute one kind of effort for another and thus use comparative advantage.

'Comparative advantage' needs exchanging one kind of effort for another because it's a law regarding trade amongst humans. What you're looking for is mechanical advantage, which often involves trading work for more work.

> With an AI, code is code; there's no [mechanical] advantage in writing code to write code instead of writing code.

No. If you spend 1 day writing code for an AI and then it writes all your code from now on, you've saved an arbitrarily large amount of time writing code.

> Second, in referring to "code I could have written myself", I was referring to qualitative advantages rather than time saved.

Well your question was "What's the use of an AI...?", to which I could legitimately bring up all sorts of advantages you hadn't been referring to. If you had said, "What's the use of a chicken when it can't even dance?" I could respond "I could eat it" and presumably that would be an answer to your question.

> To make the washing-machine analogy work with this, postulate a washing machine that doesn't actually save you any time - maybe you have to sit about cranking a driving shaft for two hours - but produces much cleaner clothes, or a smaller chance of ripping, or some other qualitative measure.

Your washing machine does still sound preferable to hand-washing, so I'm not sure what the point was.

I'm terribly confused as to what your point was, or why you think a code-writing machine would be useless. I want one!

[-]RolfAndreassen14y00

Ok, yes, an AI that saves me the effort of writing code would be useful, fair enough. I think, however, that in the context of writing a FOOMing Friendly AI, code that I could have written is not going to be sufficient.

[-]thomblake14y00

We're in agreement.

[-]timtyler14y00

> We don't want the first AI that FOOMs effectively to win. We want a provably Friendly AI to win.

This seems as though it is framing the problem incorrectly to me. Today's self-improving systems are corporations. They are a mix of human and machine components. Nobody proves anything about their self-improvement trajectories - but that doesn't necessarily mean that they are destined to go off the rails. The idea that growth will be so explosive that it can't be dynamically steered neglects the possibility of throttles.

A "provably-Friendly AI" doesn't look very likely to win - so due attention should be given to all the other possibilities with the potential to produce a positive outcome.

[-][anonymous]14y00

> Given a program, finding another program that has the same input to output map is much easier than given two programs, determining if they have the same input to output map.

> This is interesting. Do you have any sources that discuss why this is so?

The second problem is equivalent to the halting problem. The first problem isn't.

[This comment is no longer endorsed by its author]

[-]billswift14y20

You might want to edit your second paragraph. The grammar seems to be a bit garbled, to the extent that I can't figure out what you are trying to say.

[-]JGWeissman14y00

Yes, there were missing words. Fixed.

[-]roystgnr14y40

We don't need to compare two arbitrary Turing machines to verify an optimization, but we don't even need to compare two arbitrary Turing machines with an enormous-but-finite arbitrary input space; we just need to compare unoptimized-version-of-code to optimized-version-of-code. That's a much narrower task. Compilers arguably have a harder task, of comparing human-readable-code with machine-executable-code, and there have been provably correct compilers created even for inelegant languages like C.

I'm not sure about the entire direction of "prove that optimization won't change our output", however. A practical AI is likely to be limited by computational resources, right? So if we want to get answers out of it at all, there's a good chance that we're going to have to use approximate algorithms such as iterative methods that never reach exact convergence. But then what should a better-optimized AI do with its effectively increased computational resources? Terminate the functions at the same, equally inaccurate point, then while(no_new_input()) twiddle_thumbs()? Or use more accurate approximations and/or more iterations to get to a better, but different output?
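As a small illustration of an iterative method whose output depends on the compute budget, here is a sketch using Newton's method for square roots. The function name and the iteration counts are arbitrary choices for the example.

```python
def newton_sqrt(x, iterations):
    """Approximate sqrt(x) for x > 0 with a fixed number of Newton steps."""
    guess = x
    for _ in range(iterations):
        guess = (guess + x / guess) / 2
    return guess


rough = newton_sqrt(2.0, iterations=2)   # what a slow machine can afford
better = newton_sqrt(2.0, iterations=6)  # same deadline, faster machine

print(rough, abs(rough * rough - 2.0))
print(better, abs(better * better - 2.0))
```

The two calls return different numbers, and the second is closer to the true value: a speed-up spent on extra iterations changes the output, which is exactly why "same output as before" may be the wrong requirement for this kind of function.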

(edited to fix markup; thanks army1987)

[-]A1987dM14y20

Underscores are for italics in Markdown. To get “no_new_input”, write no\_new\_input.

[-]RolfAndreassen14y00

> we just need to compare unoptimized-version-of-code to optimized-version-of-code. That's a much narrower task.

I don't think that's true. The optimised and unoptimised code can be considered as the start of the tape of a fixed Turing machine, or as the operating instructions of different Turing machines. It is important not to conceptualise Turing machines too much in terms of our experience with desktop computers, which are only one particular implementation of the general concept. A computer has hardware, code, and input; a Turing machine just has input and rules. There is an exact equivalence between "What is the output of Turing machine X with input A+B", and "What is the output of Turing machine X' with input B", where A has been absorbed into the machine.
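The equivalence between "machine X with input A+B" and "machine X' with input B" resembles partial application. A minimal sketch, assuming a toy interpreter `run` as the fixed machine X and a short instruction list as A (both invented for the example):

```python
from functools import partial


def run(program, data):
    """A fixed "machine X": interpret `program` (a list of steps) on `data`."""
    value = data
    for op, operand in program:
        if op == "add":
            value += operand
        elif op == "mul":
            value *= operand
        else:
            raise ValueError(f"unknown op: {op}")
    return value


code_a = [("mul", 3), ("add", 1)]        # the "A" part of the tape
machine_x_prime = partial(run, code_a)   # X' = X with A absorbed

print(run(code_a, 5))        # X on input A+B  -> 16
print(machine_x_prime(5))    # X' on input B   -> 16
```

Whether the code is treated as input to a fixed machine or as part of the machine is a matter of bookkeeping; the computed function of B is the same either way.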

> Terminate the functions at the same, equally inaccurate point, then while(no_new_input()) twiddle_thumbs()? Or use more accurate approximations and/or more iterations to get to a better, but different output?

I was speaking of optimisations for speed; presumably you always optimise a bottleneck, so you terminate at the same point and then go on to deal with some other input that was waiting for the response from this particular calculation.

[-]roystgnr14y00

I never claimed that "optimized code" and "unoptimized code" can't be considered as two different generalized Turing machines; I claimed that two arbitrary different generalized Turing machines might not always be possible outputs of an optimization process. You're correctly pointing out that "all before-and-after-optimization code pairs are equivalent to pairs of Turing machines", but that is not the opposite of "not all pairs of Turing machines are before-and-after-optimization codes". And once you start looking at strict subsets all sorts of impossibilities become possibilities; I can solve the halting problem if you let me pick a sufficiently restricted subset of Turing machines as admissible inputs!
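One restricted class for which halting really is decidable is deterministic machines with finitely many reachable states: simulate, and report "loops forever" as soon as a state repeats. The sketch below assumes a machine is given as a `step` function returning the next state, or `None` on halt; that representation is an assumption made for the example.

```python
def halts_finite(step, start):
    """Decide halting for a deterministic machine with finitely many states.

    `step(state)` returns the next state, or None when the machine halts.
    States must be hashable. If the reachable state set were infinite,
    this loop itself might never finish.
    """
    seen = set()
    state = start
    while state is not None:
        if state in seen:
            return False  # a repeated state means the run cycles forever
        seen.add(state)
        state = step(state)
    return True


def countdown(n):
    return None if n == 0 else n - 1


def cycle_mod_4(n):
    return (n + 1) % 4


print(halts_finite(countdown, 5))    # True
print(halts_finite(cycle_mod_4, 0))  # False
```

The decider is exact on this subset, but its running time and memory grow with the number of reachable states, so it does not by itself make large finite machines tractable.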

I was also speaking of optimizations for speed, to ask the question: why are you bothering to optimize for speed? The answer isn't "I want the code to be able to produce the same output in 1/Nth of the time so my computer can idle for (N-1)/Nths of the time"; what we're really looking to do is let the code produce different, better output. So "will my optimized version produce the exact same output for the exact same input" is only half of the problem; it still isn't a proof of recursive stability in a realistic use case where after each improvement we're also changing the input.

[-]RolfAndreassen14y00

> You're correctly pointing out that "all before-and-after-optimization code pairs are equivalent to pairs of Turing machines", but that is not the opposite of "not all pairs of Turing machines are before-and-after-optimization codes".

A useful distinction. Thank you.

> So "will my optimized version produce the exact same output for the exact same input" is only half of the problem; it still isn't a proof of recursive stability in a realistic use case where after each improvement we're also changing the input.

Suppose we have an AI that does two things: Calculate ballistics trajectories for its army of killer robots, and coherently extrapolate human volition to guide its overall goals. If it optimises the ballistics calculation, it can spend more time thinking about the CEV; this will produce a different result (unless it was already at the point of reflective stability), but in this case that's a good thing. However, the optimised ballistics calculation had better be yielding the same results or it will start losing the war. So I distinguish between two outputs: The output of the specific function being optimised must be the same. The output of the AI as a whole can differ.

[-]DuncanS14y00

I think it's right to say that it's not always possible to show that two programs are equivalent.

However, it is very often possible to do so for real functions - in fact far more often than not. So real options for optimization don't appear to be much constrained by this.

[-]Anatoly_Vorobey14y00

Suppose the Universe is finite in volume and amount of matter. It follows that all your algorithms are actually finite state machines, not Turing machines. How does that affect your argument? Do you believe that the choice between the Universe being infinite vs it having at most 10^117 atoms directly bears on the possibility of constructing an AGI within the next 1000 years?

[-]RolfAndreassen14y00

No, but that's not necessarily the correct upper bound to use. The difference between infinity and, say, having only a few dozen terabytes of storage space might be important.

[-]XiXiDu14y00

> A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification;

Isn't this also true for unfriendly AI? Any AI has to ensure that improved versions of itself are friendly with respect to its initial values. So for each modification, or successor, it has to find a proof that it will not only respect its values but that it will do so in a way that more effectively maximizes expected utility.

[-]RolfAndreassen14y30

Ah no. Friendliness is a special category of AIs, and as such is more restrictive: No AI can be Friendly whose output changes under optimisation, but an Unfriendly AI is still Unfriendly if its output changes.

[-]timtyler14y00

Not really. For example, you could have a "sloppy" superintelligence that, because it was given a short planning horizon, traded the future of the universe for short-term gain.

[-]TheOtherDave14y00

The phrase "has to" is a little confusing here. Sure, any AI that doesn't reliably preserve its value structure under self-modification risks destroying value when it self-modifies. But something can be an AI without preserving its value structure, just like we can be NIs without preserving our value structures.

[-]DanielLC14y00

> It seems to follow that you cannot build a fully general proper-optimisation detector.

So don't make it fully general. It's not like that means it's some near-useless thing that can only run in certain special cases. It's fine if it works most of the time.

[-]RolfAndreassen14y00

Only if you can reliably tell when it didn't work. :)

[-]DanielLC14y00

Make it so that if they give the same results, it can usually prove it. If they don't give the same results, it will never prove it. As such, it will work fine.

[-]RolfAndreassen14y00

Well, ok, just terminate the search for proof after some defined cutoff time, ok. But that may drastically limit the kinds of proof you are able to find.
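A sketch of a detector with a cutoff, under the simplifying assumption that "proof" means exhaustive agreement over a finite domain. The `Verdict` names and the `budget` parameter are invented for illustration; a real prover would search for proofs rather than enumerate inputs. The point is the three-valued answer: it never claims equivalence it has not established, and it says so when it gives up.

```python
from enum import Enum


class Verdict(Enum):
    PROVED = "proved"      # every input in the domain was checked and agreed
    REFUTED = "refuted"    # a counterexample was found
    UNKNOWN = "unknown"    # budget ran out before the domain was covered


def check_equivalent(f, g, domain, budget):
    """Compare f and g over `domain`, spending at most `budget` comparisons."""
    checked = 0
    for x in domain:
        if checked >= budget:
            return Verdict.UNKNOWN
        if f(x) != g(x):
            return Verdict.REFUTED
        checked += 1
    return Verdict.PROVED


def slow_double(n):
    return n + n


def fast_double(n):
    return n << 1


print(check_equivalent(slow_double, fast_double, range(1000), budget=10_000))
print(check_equivalent(slow_double, fast_double, range(10 ** 9), budget=10_000))
print(check_equivalent(slow_double, abs, range(-5, 5), budget=10_000))
```

An optimiser using this would simply reject any modification whose verdict is not `PROVED`, which is safe but, as noted above, may rule out many valid optimisations.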

[-]DanielLC14y00

It may. It may not. I don't know. It may be impossible to optimize a system that much. That's not my field.
