5. The advantage of not being open-ended

Summary of entire Series:  An alternative approach to designing Friendly Artificial Intelligence computer systems.

Summary of this Article: When setting a computer a task, there are advantages to defining the task so that a finite budget of some resource (such as time) is allocated to it, and to giving the task completion criteria such that the computer can always determine whether the solution it came up with within that budget met those criteria.

  1. Optimum number of single points of failure
  2. Don't put all your eggs in one basket
  3. Defect or Cooperate
  4. Environments for killing AIs
  5. The advantage of not being open-ended
  6. Trustworthy Computing
  7. Metamorphosis
  8. Believable Promises

Links to the parts of this article

  • Finite Resources
  • Finite CPU cycles
  • Chess versus River Crossing
  • Open ended problems
  • Limitations

Finite Resources

In 1856, a Presbyterian sermon entitled "The Dull Axe" raised the question of whether a wood-cutter is better off trying to cut down a tree with a dull axe, or spending some time sharpening it first.

The answer depends on the scope of the task. For each level of sharpness you can give an axe, how much time per tree does it save? And how much time does the sharpening take? If you know in advance that you have 10 trees to cut down, then you could, in theory, keep sharpening until you reach the point where an additional minute spent sharpening would save only 6 more seconds per tree: at that point, the minute spent exactly equals the 60 seconds saved across the 10 trees, and any further sharpening is a net loss.

Similarly, if you know that you only have 10 hours to split between axe sharpening and tree felling, then you could split those hours between the two activities in a way which would maximise the number of trees you'd end up felling in that time.

And, in theory, if sharpening a particular axe were a time consuming process (say, for example, requiring a visit to a specialist axe sharpener), and you only had one tree to chop down, the optimum solution might be to not spend any time increasing the axe's sharpness.

Unlike the actual chopping of the tree, deciding how to split the available time is a simple problem once you have the numbers, provided the scope is kept finite and well defined.
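To make the arithmetic concrete, here is a minimal sketch in Python. The cost curve (sharpening pulls the per-tree chopping time down towards a floor, with diminishing returns) and all of the numbers are invented for illustration; only the shape of the decision matters.

```python
# A toy version of the wood-cutter's problem. All numbers are invented:
# chopping starts at 30 minutes per tree, and sharpening can bring that
# down towards a floor of 10 minutes, with diminishing returns.

def trees_felled(total_minutes: int, sharpen_minutes: int) -> float:
    per_tree = 10 + 20 * 0.9 ** sharpen_minutes   # minutes per tree
    return (total_minutes - sharpen_minutes) / per_tree

# With a fixed 10-hour budget the problem is finite, so brute force
# settles it: try every whole-minute split and keep the best.
budget = 600
best = max(range(budget), key=lambda s: trees_felled(budget, s))
print(f"sharpen for {best} min, fell {trees_felled(budget, best):.1f} trees")
```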

Finite CPU cycles

Suppose I have a computer program whose input is a stream of numbers, and whose output is the square of those numbers, and that the program currently implements this inefficiently (for example, by using two nested for loops and incrementing a variable "answer" by 1 each time).
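In concrete terms, such a program might look like the following sketch (the function names are mine, not from any real codebase):

```python
def square_slow(n: int) -> int:
    """Square n by incrementing a counter once per (i, j) pair:
    n * n additions where one multiplication would do."""
    answer = 0
    for _ in range(n):
        for _ in range(n):
            answer += 1
    return answer

def square_fast(n: int) -> int:
    """The ten-minute fix: a single multiplication."""
    return n * n

assert square_slow(12) == square_fast(12) == 144
```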

If I'm told certain things in advance, such as how many numbers (and the order of magnitude of their size) will be coming down the stream before the program gets deleted forever, and roughly how long it will take to edit and recompile an improved version of the program, then I can decide how much time it is worth investing at the start to improve the program's speed per number.

If there will be millions of large numbers, such that the program in its current state would take hours to run, then it would be worth investing 10 minutes changing it.

If there will only be 5 numbers, each less than 100, which the program in its current state can handle within seconds, then it wouldn't be worth changing (at least, not if I'm given a specific aim and scope; aesthetics would be a different matter).

As before with the axe, once you have the numbers, deciding when to stop sharpening and start chopping is simple, if the scope is kept finite and well defined.

Even if spending additional resources on sharpening always produced at least some improvement (even past atom-width sharpness), or even if a number-squaring algorithm could always be improved with additional coder-hours, there would still be a point of diminishing returns, beyond which the gains in sharpness would be outweighed by the reduced time available for the actual activity.

The finite resource need not be time or volume of output. It could be expense. If a program-improving program, rather than a human coder, were being used to update the number-squarer, the finite resource could be "total CPU cycles expended". As long as a single type of resource constrains both activities ('sharpening' and 'chopping'), and a finite pool of it gets split between them, the optimum split is determined by the scope of the task, in a simple and easy-to-analyse fashion.
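As a sketch of that claim, here is the same brute-force split applied to CPU cycles. The cost model (improvement cycles halving the per-number running cost for every 100,000 cycles spent, down to a floor) is invented; the point is only that, for each budget, finding the optimum is a straightforward finite search.

```python
# Toy model only: cycles invested in self-improvement halve the squarer's
# per-number running cost for every 100,000 cycles spent, down to a floor.

def numbers_squared(budget: int, improve: int) -> float:
    per_number = 10 + 990 * 0.5 ** (improve / 100_000)  # cycles per number
    return (budget - improve) / per_number

# The optimum split is fixed by the scope (the size of the budget):
for budget in (1_000_000, 10_000_000, 100_000_000):
    best = max(range(0, budget, budget // 100),
               key=lambda i: numbers_squared(budget, i))
    print(f"budget {budget:>11,}: spend {best:>10,} cycles improving")
```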

Chess versus River Crossing

If a bunch of executives on a team building course are presented with the River Crossing task (cross a 'river' marked out with rope, as a group, stepping only on 'islands' of cardboard, with some members blindfolded), how they carry it out might depend upon what they think the course organiser intended the task to achieve. Is the organiser looking for communication skills, leadership, out-of-the-box thinking, or something else? One bunch might spend an hour discussing solutions, and only two minutes crossing the river. A different bunch might turn some of the bits of cardboard into a currency, hold an auction, and pay some members to carry others.

If a chess-puzzle aficionado is given a chess position and asked to find a way to mate in as few moves as possible, they don't need to ask "Why would you want to do that?". They know not to ask "Can I use a knife to intimidate the other player into resigning?", because the context implies that changing the problem is cheating: it's outside the scope of what's wanted. It isn't interesting. The interesting thing, to the aficionado, is to solve the presented problem within the rules of chess.

Open ended problems

Suppose you have a computer program that's capable of (among other things) understanding and improving computer programs, and that your program is currently corrigible. If you give it an open-ended task, such as "Improve this chess program as much as you can", then there's nothing to stop it deciding that the best approach is first to improve its own program-improving capabilities as much as possible, before applying them to the chess program. Working out, in advance of giving the task, how much time it will spend self-improving (and what level of risk might result from that) is a difficult-to-analyse problem, because it can revise its decision on how to spend its resources as it goes along, and, past a certain point along the curve, its decision-making process will surpass your ability to anticipate what it will decide.

Suppose, on the other hand, you say "This project has three phases. In phase III you'll use your program-improving capabilities on a chess program. In phase II you'll use your program-improving capabilities upon yourself. The combined budget for phases II and III is 10,000,000 CPU cycles. In phase I, you will parse this task definition, write a proposal for how you'll divide that budget between II and III, with the aim of maximising (within the scope of the presented task) the expected chess-puzzle solving capabilities of the chess program at the end of the budget, and then await approval of your proposal. Your budget for phase I is 10,000 CPU cycles. All standard constraints (predefined in LINK) apply. You may start now."
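One hypothetical way to make such a task definition machine-checkable is to write it as data, so that the budgets and the approval gate are explicit. Everything below (the names, the fields, the validity rule) is my own sketch, not a specification from the article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phase:
    name: str
    description: str
    cycle_budget: int        # hard cap on CPU cycles for this phase
    needs_approval: bool     # must a human sign off before the next phase?

PHASE_I = Phase("I", "Parse task; propose a split of the II/III budget",
                cycle_budget=10_000, needs_approval=True)
COMBINED_BUDGET = 10_000_000   # to be divided between phases II and III

def proposal_valid(phase_ii_cycles: int, phase_iii_cycles: int) -> bool:
    """Accept a proposal only if it allocates exactly the combined budget."""
    return (phase_ii_cycles >= 0 and phase_iii_cycles >= 0
            and phase_ii_cycles + phase_iii_cycles == COMBINED_BUDGET)

assert proposal_valid(1_200_000, 8_800_000)
```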

Presented in this non-open-ended fashion, the task falls under the same simple mathematics that the wood-cutter faced. It is irrelevant what terminology the program-improving program uses to categorise which activities count as self-improvement and which count as directly working on the chess program. There is an optimum balance between the two activities for any particular size of budget, and that isn't changed by terminology.

By making the decision simple, you also make it easier to examine (or to have several third-party AIs explain to you) while your program-improving program is paused awaiting approval of its proposal. This has a property that is important to preserving corrigibility. A program that isn't as smart as the one involved may still have a reasonable chance of understanding quick, simple decisions taken by the smarter one, if the smarter one starts out corrigible and isn't trying to deceive you, or otherwise hide its own predictions about how much it will manage to improve its abilities by spending a particular finite number of CPU cycles on the improvement process. That lets you monitor and manage risk more reliably than you could with a self-improvement-capable program that's faced with an open-ended task, whose decision on how much to self-improve is far more complex and difficult for a less smart monitoring program to examine or understand. In particular, it lets you keep track of any shift in how it is defining "CPU cycles" (or whatever the limited resource is), or any attempt to apply additional unbudgeted CPU cycles to the task (such as out-sourcing part of the problem to another computer, then classifying what it gets back as mere data).

When you are trying to be non-open-ended, you don't tell it to just maximise something. You set the task the way chess puzzles are set: find a sufficiently good solution under a particular set of rules or constraints. You frame the task so that it has a finite finishing condition, or some other way of determining success or failure. The contrast is sketched below.
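The difference can be made explicit in pseudocode-ish Python. Here `improve_step` and `solves_puzzle` are stand-ins for whatever improvement operator and checkable success criterion the task actually uses; they are assumptions of the sketch, not anything defined above.

```python
def open_ended(program, improve_step):
    """'Improve as much as you can': nothing here ever says it is done."""
    while True:
        program = improve_step(program)

def closed_task(program, improve_step, solves_puzzle, budget: int):
    """Stop when the criterion is met or the budget is spent. Either way,
    whether the task succeeded is always decidable afterwards."""
    for _ in range(budget):
        if solves_puzzle(program):
            return program, True          # verifiable success, within budget
        program = improve_step(program)
    return program, solves_puzzle(program)  # budget gone; report the outcome
```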

Limitations

This approach isn't a general-purpose solution to the problem of safe AI development. For example, if you want a program to run the planet Earth on humanity's behalf, in an ongoing (and increasingly beneficial) fashion, that's a pretty open-ended task. It will need, at some point, to self-improve beyond the control of any other program or individual (if only to avoid the risk of being corrupted by a malign AI).

But where a function in a computing ecosystem can be managed by setting only non-open-ended tasks, this approach might have advantages. In the next part, I look at the implications that creating temporary, non-open-ended task programs has for trust and cooperation between multiple AI programs.

The next article in this series is: Trustworthy Computing

Comments

The trouble is that if the AI finds a solution we didn't expect, it's not sufficient for the resources to be limited - we can still get into trouble if the goal is open-ended. For example, if there is no upper bound on how good a chess program can be, then the AI wants to somehow control lots of resources to improve the chess program. It is running a search process for ways around the resource limitations (like building a successor and running it on another computer, or convincing you to change your mind, or exploiting a bug in the code), and we're just hoping that search fails.

The real trick, in other words, is not limiting the resources of the AI, it's making the AI's goals only need limited resources to be fulfilled.

Other people have written some relevant blog posts about this, so I'll provide links:

Reduced impact AI: no back channels

Summoning the Least Powerful Genie

The novel Soul Bound contains an example of resource-capped programs with narrowly defined scopes being jointly defined and funded as a means of cooperation between AIs with different-but-overlapping priorities.