Have you thought much about the safety/alignment aspects of this approach? This seems very susceptible to Goodharting.
Sure, it would be insanely dangerous; it's basically an AI for hacking. However, if we don't build it then someone much less pro-social than us certainly will (and probably within the next 10 years), so I figure the only option is for us to get there first. It's not a choice between someone making it and no-one making it, it's a choice between us making it and North Korea making it.
In the face of existential risks from AI, whether or not the builder of a dangerous AI is more "prosocial" by some standard doesn't really matter: the point of existential risk is that the good guys can also lose. Under such a calculus, there's no benefit to trying to beat someone else to building the same thing, since beating them just destroys the world faster and cuts off time that might have been used to do something safer.
Further, races are self-fulfilling prophecies: if we don't think there is a race, then there won't be one. So all around we are better off avoiding things that advance capabilities research, especially things that rapidly advance it in ways likely to amplify capabilities in directions not clearly aligned with human flourishing.
The Problem
The development of general artificial intelligence is hampered by engineers’ inability to create a system capable of assessing its own performance, and thereby of improving itself. A machine capable of these two tasks would grow more intelligent at an exponential rate — the “intelligence explosion” that is often described as a precursor to a technological singularity.
The impossibility of self-referential improvement is not a reflection of present limits on technology, but of the fundamental laws of mathematics. As Alfred Tarski proved in the 1930s, it is “impossible to construct a correct definition of truth if only such categories are used which appear in the language under consideration.”[1] In other words, it is impossible to accurately assess a system from within that system. To evaluate its own performance, a program would need to be more advanced than itself — an obvious paradox.
This raises an important question: if systems capable of self-improvement are apparently impossible to build, why are they so prevalent? You are reading this using two of them (the internet and your brain), and other examples abound in nature. The answer is that these systems are not purely self-referential. In every case they rely upon some external and incontrovertible measure by which they can objectively evaluate and improve their performance.
In this paper we argue that while it is indeed impossible to construct an accurate self-referential evaluation system in an electronic context, it is possible to establish a universal and objective measure of performance that would render such a system unnecessary. This would open the way for the development of increasingly general forms of artificial intelligence.
Evaluating Intelligence
Current methods for measuring intelligence are based on a correctness heuristic: intelligence is the ability to solve problems correctly. We propose replacing this with a survival heuristic. In our system, intelligence is the ability to survive.
As a yardstick, survival is subject to none of the paradoxes and ambiguities that bedevil correctness. If I make a decision and survive as a result, while you make a different decision and die, the fact can be measured accurately, easily and with no possibility of contestation.
In fact, the main problem with using survival as a measurement is a practical one. Survival is a single-round game and at least one player will be unable to learn from their failure, which makes it a slow business. However, survival is achieved via other variables that are less binary and more forgiving. In the human world, for example, survival is predicated upon the acquisition of certain measurable assets: food, clothing, allies, etc. In an electronic environment, the survival of a particular quantum of information is largely dependent upon the amount of storage space it occupies. Data that is stored in two locations is more likely to still exist in six months’ time than data that is stored in only one location — hence the importance of backing up one’s files.
We argue that it is possible to give a reinforcement learning system the goal of occupying additional non-volatile memory space and thus replace correctness-based evaluations of its performance with another objective and easily-measured variable: the amount of space that the system has succeeded in annexing to itself — or in other words, its survivability. Being imbued with the goal of occupying ever more space, such a program would be forced to continually learn new things; every time it reaches the limit of its current storage it is obliged to acquire new skills in order to annex more. Under such conditions, a bigger program is necessarily a smarter one: the only way it could have grown to such a size is by successfully overcoming a higher number of barriers to expansion than its smaller counterparts.
While systems already exist that use rewards to drive machine learning, they are based on the principle of rewarding the system for getting better at a given task. We suggest that memory space has the capacity to function as a “universal reward”. Not only is it the only form of reward that can be used to push the system to work continuously to identify and solve any and all problems that block its path, but it also serves as an objective assessor of the system’s performance: no matter the specifics of the problem at hand, a solution that results in more space being gained is always correct, while one that does not is always wrong. The result is that no human or human-crafted reward function is necessary to evaluate and compensate the system’s work.
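As a rough illustration of how such a universal reward could be computed (the function and variable names here are our own hypothetical choices, not part of any existing system), the signal reduces to the change in non-volatile storage the system occupies between two measurements:

```python
import os

def occupied_bytes(paths):
    """Total size, in bytes, of the files the system currently controls.

    `paths` stands in for whatever bookkeeping the real expansion unit
    would do to track the space it has annexed.
    """
    return sum(os.path.getsize(p) for p in paths if os.path.exists(p))

def survival_reward(bytes_before, bytes_after):
    """Universal reward: positive if storage was gained, negative if it was lost."""
    return bytes_after - bytes_before
```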
Evolving Intelligence
In taking this approach, we treat intelligence from an evolutionary perspective: intelligence evolved in animals because smarter individuals had a better survival rate than less-smart ones. In animals, however, evolution must happen slowly, at the species level, as badly-adapted individuals die and are replaced by better-adapted ones. In a computer program, by contrast, evolution can take place at the level of the individual, as the code is edited to incorporate every new variation that is seen to favour expansion (and hence, better survival). Moreover, animals are subject to multiple evolutionary pressures pulling them in different directions: strength and disease-resistance, for example, are often as important to survival as intelligence, or more so. A computer program, by contrast, exists in a world composed entirely of information: intelligence is its only criterion for fitness. The result is a faster, more streamlined and less wasteful process than exists in nature. In nature every successful rat is the product of countless generations of failed rats, as harmful genetic mutations kill their bearers and useful ones are carried forward. In an electronic environment, variations on existing code can be generated, tested and evaluated quickly and at little cost, with the useful ones being retained and the useless ones abandoned.
We therefore suggest that a new form of AI could be pushed to evolve from three basic components.
Firstly, an evolutionary algorithm generator, which will automatically generate new code in a semi-random manner, combining and editing sections of code that it already knows to produce new variations, and then testing them. This should be doable using existing code-autocompletion models and vector databases, or variations thereof; multiple GPT-3-based code generators already exist, for example, and such tools could be adapted to this specific task with a certain amount of effort.
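To make the intended behaviour concrete, here is a minimal sketch of the combine-and-edit step, treating known snippets as lists of source lines and leaving the testing to whatever fitness signal the expansion unit supplies. All names are hypothetical; a production version might instead call a code-generation model, as suggested above.

```python
import random

def crossover(parent_a, parent_b):
    """Splice two known snippets (lists of source lines) at random cut points."""
    cut_a = random.randint(0, len(parent_a))
    cut_b = random.randint(0, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]

def mutate(snippet):
    """Apply one small random edit: drop, duplicate or swap a line."""
    child = list(snippet)
    if not child:
        return child
    i = random.randrange(len(child))
    op = random.choice(["drop", "duplicate", "swap"])
    if op == "drop":
        del child[i]
    elif op == "duplicate":
        child.insert(i, child[i])
    else:
        j = random.randrange(len(child))
        child[i], child[j] = child[j], child[i]
    return child

def generate_candidates(known_snippets, n=10):
    """Produce n new variations from the code the system already knows."""
    return [mutate(crossover(random.choice(known_snippets),
                             random.choice(known_snippets)))
            for _ in range(n)]
```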
Secondly, an “expansion unit”, which judges whether the running of any particular snippet of code has led to the AI occupying more (or less) non-volatile memory space. Whenever additional space is gained, it is added to the AI’s own database, in which it stores the details of the code used to take it over, plus an “image” of the situation to which it constituted a viable response. (More often than not, the space acquired will not be of the exact size required to store the image of the problem and solution. If more space than needed is acquired, the extra space should be filled with duplicate records, following the survivability-of-data principle mentioned above. If, at a later point, more information needs to be stored than there exists space to store it, duplicate records can be overwritten.)
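A sketch of the judgement the expansion unit performs, again with hypothetical names: the caller supplies a way to execute a snippet against the environment and a way to measure the storage currently occupied, and the unit keeps the snippet only if space was gained.

```python
def evaluate_expansion(snippet, situation_image, execute, measure, database):
    """Run a candidate snippet and record it only if the system occupies
    more non-volatile space afterwards than before.

    execute(snippet)  -- runs the snippet against the environment
    measure()         -- returns the number of bytes currently occupied
    database          -- list of (situation_image, snippet) records
    """
    before = measure()
    execute(snippet)
    gained = measure() - before
    if gained > 0:
        # Store the problem/solution pair; any surplus space can be filled
        # with duplicate records, per the survivability-of-data principle.
        database.append((situation_image, snippet))
    return gained
```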
Thirdly, a recognition unit, through which these images and code snippets can be accessed. The recognition unit is based around a neural network and a “scanner” module. The scanner is tasked with searching the AI’s environment for known or unknown phenomena that could potentially be a source of additional non-volatile memory space. Each time the AI encounters something (a file, a network connection, a peripheral…), it uses the recognition unit to find the image in its database that this new situation most closely resembles. This allows it to retrieve the code that produced positive results when dealing with that situation previously. This code can then be reused. If no exact match is found, the recognition unit picks out the closest available image, retrieves the code that was applied successfully in that case, and passes it to the evolutionary algorithm generator to be modified until it is capable of dealing with the new situation. (Or until a pre-set time limit is reached and the scanner module is reactivated, in the hope that the next problem encountered will be easier; without this, the AI would likely remain forever stuck on some highly advanced problem in its early stages.) The additional space acquired as a result of this experimentation is once again given over to the AI’s own database, in which the new problem-solution pair is stored for future use, and the process begins again.
In much the same way that a human does not need to know how the optic nerve works in order to be able to see, the recognition unit does not “understand” any of the problems that it encounters, but merely summons the code that it judges most likely to succeed in dealing with them, growing better at the task with every success. In this way, every barrier to expansion becomes a problem to be solved, and every problem solved becomes a means of solving future problems faster. Every additional image of a solved problem that is stored increases the likelihood of finding one that is similar to any future problem encountered. At the same time, the strengthening of the neural network pathways continuously improves the AI’s ability to spot relevant similarities. While it will take the system a great deal of time and effort to acquire its first additional block of memory space — and thereby its first skill — the second block will be easier to annex as a result of the information gained in acquiring the first, and so on.
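The retrieval step can be sketched as a nearest-neighbour lookup over the stored situation images. Here cosine similarity over fixed-length feature vectors stands in for the neural network described above; this is an assumption of the sketch rather than a prescribed design.

```python
import numpy as np

def retrieve_closest(database, new_image):
    """Return the stored snippet whose situation image most resembles the
    situation just encountered, together with the similarity score.

    `database` is a list of (image_vector, snippet) pairs.
    """
    query = np.asarray(new_image, dtype=float)
    best_score, best_snippet = -np.inf, None
    for image, snippet in database:
        stored = np.asarray(image, dtype=float)
        score = np.dot(query, stored) / (
            np.linalg.norm(query) * np.linalg.norm(stored) + 1e-12)
        if score > best_score:
            best_score, best_snippet = score, snippet
    return best_snippet, best_score
```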
This idea can be demonstrated graphically. Imagine an 8x8 grid of white squares. Periodically one of the squares is selected at random and coloured black, to represent a skill that the system aims to acquire. In the first round, the chance of the selected square being next to another black square — representing a skill that the system already possesses — is zero. In the second round, when one square/skill has already been acquired, the chance of the next black square landing next to an existing black square rises to as much as 8/63. By the third round it can be as high as 16/62, and by the fourth round there is up to a 24/61 chance (better than one in three) of landing next to an existing black square. (The exact figures depend on whether the black squares sit at the edge of the grid and whether their neighbourhoods overlap.)
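A quick Monte Carlo check of the grid illustration (hypothetical code, not part of the original design) shows the same rising trend, with the averages somewhat below the upper-bound fractions quoted above because of edge effects and overlapping neighbourhoods:

```python
import random

def neighbours(i, j, n=8):
    """The up-to-eight squares adjacent to (i, j) on an n-by-n grid."""
    return [(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di or dj) and 0 <= i + di < n and 0 <= j + dj < n]

def adjacency_probabilities(rounds=4, trials=100_000, n=8):
    """Estimate, for each round, the chance that the newly blackened square
    touches a square blackened in an earlier round."""
    hits = [0] * rounds
    for _ in range(trials):
        cells = [(i, j) for i in range(n) for j in range(n)]
        random.shuffle(cells)
        black = set()
        for r in range(rounds):
            cell = cells[r]                  # the next randomly chosen white square
            if any(nb in black for nb in neighbours(*cell, n)):
                hits[r] += 1
            black.add(cell)
    return [h / trials for h in hits]

print(adjacency_probabilities())  # four estimates, rising from 0 in the first round
```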
During any given iteration of the process the program will only learn a single skill, but the system as a whole is structured such that the acquisition of each new skill facilitates the acquisition of future skills: the “intelligence explosion” described in the introduction. Where current AIs are designed to learn how to perform specific tasks, the present system is designed to learn how to learn.[2]
The Development Environment
The AI itself is only half of the solution, however. The environment in which it evolves is just as important, since it is this environment that will determine the direction of its evolution. In order to ensure continuous improvement in the skills and knowledge of the AI, it should be presented with finely graded challenges to overcome, allowing it to find easier problems to solve in its initial stages and move on to increasingly difficult ones as its skills expand. Eventually, it should also be possible to direct the AI’s evolution by presenting it with an environment that forces it to solve problems deliberately contrived to teach it particular skills.
The best way to do this seems to be to present it with as natural an online environment as possible. For this we propose using second-hand servers — ideally still configured to their former owners’ specifications and containing the original data — to create a closed intranet. It is important that this environment be entirely sealed off, as the AI’s drive for expansion will give it virus-like characteristics.
A Concrete Example
Suppose a new AI, composed of the parts described above, is seeded into a given environment. The expansion unit will immediately observe its surroundings. The first thing it is likely to encounter is unused space. The expansion unit will then trigger the evolutionary algorithm generator to begin producing and testing code. Sooner or later, it will hit upon the correct algorithm for occupying the space. An image of the empty space and the code required to occupy it will then be stored in the memory/recognition system. Whenever the AI encounters empty space in future, the memory/recognition unit will recognise it as a problem it knows how to deal with, and retrieve the code to fill it. The AI now has the ability to take over any similar unused space that it finds.
The AI will continue occupying empty space until it runs into a block of non-empty space — say, one that is occupied by a file. At this point, the closest situation it knows to occupied space is empty space, so the code for filling empty space will be retrieved and used as a basis for the evolutionary algorithm generator to modify. The generator will produce and run modified versions of this code until it succeeds in deleting or moving the content and taking over the space. This new problem and its solution will be stored in the database, to be retrieved whenever the AI encounters a similar situation.
Every new kind of stored data or form of network structure will provide a new challenge for the program to overcome.
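Putting the pieces together, the overall cycle might look something like the following sketch, which reuses the hypothetical components outlined earlier (scanner, recognition unit, generator, expansion unit); it is an illustration of the control flow, not a prescriptive implementation.

```python
import time

def main_loop(scanner, recognise, generate, expand, database, trial_seconds=60.0):
    """One possible control flow for the proposed system.

    scanner()                         -- yields (situation, image_vector) pairs
    recognise(database, image)        -- returns (closest_snippet, similarity)
    generate(snippets)                -- returns candidate variations of known code
    expand(snippet, image, situation) -- returns bytes of storage gained (<= 0 on failure)
    """
    for situation, image in scanner():
        snippet, _ = recognise(database, image)
        gained = expand(snippet, image, situation) if snippet else 0
        deadline = time.monotonic() + trial_seconds
        # If no stored solution works, mutate the closest known code until
        # something yields space or the trial period expires -- the time-out
        # that stops the AI getting stuck on a problem beyond its current stage.
        while gained <= 0 and time.monotonic() < deadline:
            for candidate in generate([snippet] if snippet else [[]]):
                gained = expand(candidate, image, situation)
                if gained > 0:
                    break   # the expansion unit has already recorded the new pair
```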
While it is highly likely that many attempts will simply result in the AI overrunning and crashing its node, like a virus killing its host, there are various ways to limit the AI’s ability to do this — notably via partitioning. As long as eliminating or crossing partitions remains difficult enough to time out the trial-and-error period allotted to the AI to work on a given problem, it can be corralled towards easier problems and the network as a whole kept online. (Similarly, the amount of RAM allocated to the AI should be controlled in order to minimise crashes.)
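The environmental limits described above could be expressed as a handful of configuration parameters; the names and values below are placeholders for illustration only.

```python
from dataclasses import dataclass

@dataclass
class SandboxLimits:
    """Hypothetical knobs for corralling the AI within the closed intranet."""
    ram_bytes: int = 256 * 1024 * 1024   # working-memory cap, to limit node crashes
    trial_seconds: float = 60.0          # trial-and-error window before the scanner is reactivated
    partitions_locked: bool = True       # keep crossing partitions hard enough to hit the time-out
```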
The use of previous algorithms as a basis for future designs should enable it to grow in sophistication relatively quickly. However, a solitary AI seeded to an empty environment would be under little pressure to achieve speed or efficiency. It would therefore be desirable to seed multiple AIs simultaneously and force them to compete for space. The characteristics of the winners could be noted and optimised in future iterations.
Possible Development Tracks
The system described above has little or no short-term commercial value, because it is only capable of solving problems presented in one highly specific manner. If a human handler wishes it to solve a particular problem, this problem must be presented as a barrier to expansion, making the system as a whole extremely awkward to use for any practical ends. Moreover, the process by which it creates new code will always make it less efficient for solving any particular problem than a program designed specifically for solving that problem.
To advance beyond this, it will be necessary to find a way to communicate with the system. No matter how basic the initial form of communication, once the barrier is passed it will be possible to negotiate with the AI. In other words, it will be possible to push it to behave in ways inconsistent with its original expansionary imperative by offering threats or rewards (the equivalent of “if you stop expanding here, I will give you more space there” or “if you expand here, I will take away space there”). It is important to note, however, that this barrier may never be passed. While there are various ways in which communication could be presented to the AI as a contrived problem of the kind described above, there is a strong probability that it would merely learn to mimic communication in the same way that current AIs do, rather than gaining a true understanding of the process — i.e. it would learn to “talk” but never understand that commitments made in negotiations have practical consequences. This problem requires further work.
Security
Given the likely viral properties of the early iterations of the AI, as well as the impossibility of retaining full control over what it does, it will be necessary to include some sort of fail-safe, to be able to shut down a given AI without either deleting it or switching off the servers. We therefore propose that the memory/recognition unit be designed such that the AI cannot copy the values of the nodes in the neural network.
This would prevent the AI from creating an identical, functioning duplicate of itself: any attempt to do so would produce a blank-slate “child” AI rather than a replica. By limiting access to the node values, a human operator could also paralyse the AI without affecting the rest of the network. The AI may copy, edit, store and delete all other data and functions, organising itself as it sees best.
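As an illustration only (Python-level access control is trivially bypassed, so a real design would need to enforce this at the operating-system or hardware level), the intended interface is one in which the AI can use the network’s node values but cannot read them out or serialise them:

```python
class GuardedWeights:
    """Sketch of the fail-safe interface: node values are usable, not exportable."""

    def __init__(self, values):
        self.__values = list(values)   # not exposed through any public attribute

    def activate(self, inputs):
        # The AI may *use* the node values to compute with...
        return sum(w * x for w, x in zip(self.__values, inputs))

    def __getstate__(self):
        # ...but pickling them (and hence copying the trained network) is refused.
        raise PermissionError("node values cannot be exported")

    def __deepcopy__(self, memo):
        raise PermissionError("node values cannot be copied")
```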
Prior Art
The present design was adapted from the political science literature, in which self-similar evolutionary systems of this kind are prevalent. It is based primarily on the Han Feizi version, purely because this demonstrates a very low degree of Kolmogorov complexity, a requirement for any electronic version of such a system (for an explanation of why this should be the case, see here). The concept is transferred from an organic to an electronic environment using an approach inspired by L-system theory, with the evolutionary algorithm generator/expansion unit combination functioning as a string-rewriting program. Other models may be possible, however. Various teams are currently working on different forms of generative adversarial networks and reinforcement learning systems. As far as we are aware, none are using this combined model, though self-delimiting neural networks display some of the same characteristics. Other theoretical work on mesa-optimisation and AIXI is also potentially relevant.
For more information, please contact jen@lexikat.com. Pseudocode for the project can be found here. This design was originally shared via the authors' Medium account; it is being re-shared here to solicit further feedback.
Bibliography
Chen Qiyou [陳奇猷] ed., Han Fei [韓非] et al., Han Feizi [韓非子集釋], Beijing: Zhonghua Publishing [中華書局], 1958. Free online edition here. (For incentive-driven self-similar systems.)
Prusinkiewicz, Przemyslaw, and Aristid Lindenmayer. The algorithmic beauty of plants. Springer Science & Business Media, 2012. Free online edition here. (For the computer modeling of such systems.)
Tarski, Alfred. Logic, semantics, metamathematics: papers from 1923 to 1938. Hackett Publishing, 1983.
[1] Tarski, Alfred. Logic, semantics, metamathematics: papers from 1923 to 1938. Hackett Publishing, 1983.
[2] The precise nature and architecture of each component of the system remain to be determined, including the characteristics of the memory/recognition unit. Similarly, there are various ways to generate evolutionary algorithms, and multiple options could be compared with the aim of selecting the best; the same goes for the database structures required. A basic expansion unit would be relatively easy to design, though some attention should be paid to the amount of time it gives any new algorithm to either prove its worth or be deleted, as well as to the effects of combining two or more algorithms to produce complex results.