Use and misuse of models: case study
Some time ago, I discovered a post comparing basic income and basic job ideas. It sought to analyse the costs of paying everyone a guaranteed income versus providing them with a basic job at that income. The author spelt out his assumptions and put together two models with a few components (including some whose values were drawn from various probability distributions). Then he ran a Monte Carlo simulation to get a distribution of costs for either policy.
Normally I would be very much in favour of this approach. It spells out the assumptions, it uses models, it decomposes the problem, it has stochastic uncertainty... Everything seems ideal. To top it off, the author concluded with a challenge aimed at improving reasoning around this subject:
How to Disagree: Write Some Code
This is a common theme in my writing. If you are reading my blog you are likely to be a coder. So shut the fuck up and write some fucking code. (Of course, once the code is written, please post it in the comments or on github.) I've laid out my reasoning in clear, straightforward, and executable form. Here it is again. My conclusions are simply the logical result of my assumptions plus basic math: if I'm wrong, either Python is computing the wrong answer, I got really unlucky in all 32,768 simulation runs, or one of my assumptions is wrong.
My assumption being wrong is the most likely possibility. Luckily, this is a problem that is solvable via code.
And yet... I found something very unsatisfying, and it took me some time to figure out why. It's not that these models are helpful, or that they're misleading: it's that they're both, simultaneously.
To explain, consider the result of the Monte Carlo simulations. Here are the outputs (I added the red lines; we'll get to them soon):
The author concluded from these outputs that a basic job was much more efficient (less costly) than a basic income: a cost of roughly 1 trillion versus 3.4 trillion US dollars. He changed a few assumptions to test whether the result held up:
For example, maybe I'm overestimating the work disincentive for Basic Income and grossly underestimating the administrative overhead of the Basic Job. Let's assume both of these are true. Then what?
The author then found similar results, with some slight shifting of the probability masses.
The problem: what really determined the result
So what's wrong with this approach? It turns out that most of the variables in the models have little explanatory power. For the top red line, I just multiplied the US population by the basic income. The curve is slightly above it, because it includes such things as administrative costs. The basic job situation was slightly more complicated, as it includes a disabled population that gets the basic income without working, and an estimate for the added value that the jobs would provide. So the bottom red line is (disabled population)x(basic income) + (unemployed population)x(basic income) - (unemployed population)x(median added value of jobs). The distribution is wider than for basic income, as the added value of the jobs is a stochastic variable.
But in any case, the contributions of the other variables were very minor. So the reduced cost of basic jobs versus basic income is essentially a consequence of the trivial fact that it's more expensive to pay everyone an income than to pay only some people and put them to work at something of nonzero value.
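To make the back-of-envelope calculation concrete, here is a minimal sketch of the two red lines. All the population figures, the income level, and the job-value distribution below are round numbers I've assumed for illustration; they are not the post's actual parameters.

```python
import random

# Assumed, illustrative parameters (not the post's actual values)
ADULT_POPULATION = 227e6        # assumed US adult population
DISABLED_POPULATION = 20e6      # assumed
UNEMPLOYED_POPULATION = 60e6    # assumed: people who would take the basic job
BASIC_INCOME = 7.25 * 40 * 50   # a minimum-wage-level yearly income

def basic_income_cost():
    # top red line: simply pay everyone the basic income
    return ADULT_POPULATION * BASIC_INCOME

def basic_job_cost(rng):
    # bottom red line: pay the disabled and unemployed, then subtract
    # the (stochastic) value produced by the jobs
    value_per_job = rng.lognormvariate(9.0, 1.0)  # assumed distribution
    return (DISABLED_POPULATION * BASIC_INCOME
            + UNEMPLOYED_POPULATION * BASIC_INCOME
            - UNEMPLOYED_POPULATION * value_per_job)

def monte_carlo(n_runs=32768, seed=0):
    rng = random.Random(seed)
    return [basic_job_cost(rng) for _ in range(n_runs)]
```

With any parameters in this ballpark, the basic income curve sits far to the right of the basic job curve, simply because it pays the entire adult population rather than a fraction of it.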
Trees and forests
So were the complicated extra variables and Monte Carlo runs for nothing? Not completely: they showed that the extra variables were indeed of little importance, and unlikely to change the results much. But nevertheless, the whole approach has one big, glaring flaw: it does not account for the extra value, for individuals, of having a basic income versus a basic job.
And the challenge ("write some fucking code") obscures this. The forest of extra variables and the thousands of runs hide the fact that there is a fundamental assumption missing. And pointing this out is enough to change the result, without even needing to write code. Note that this doesn't mean the result is wrong: some might even argue that people are better off with a job than with the income (it builds pride in one's work, etc.). But that needs to be addressed.
So Chris Stucchio's careful work does show one result: most reasonable assumptions do not change the fact that a basic income is more expensive than a basic job. And to disagree with that, you do indeed need to write some fucking code. But the stronger result, that a basic job is better than a basic income, is not established by this post. A model can be well designed, thorough, filled with good uncertainties, and still miss the mark. You don't always have to get into the weeds of a model's assumptions in order to criticise it.
Yet another safe oracle AI proposal
Previously I posted a proposal for a safe self-improving limited oracle AI, but I've fleshed out the idea a bit more now.
Disclaimer: don't try this at home. I don't see any catastrophic flaws in this, but that doesn't mean none exist.
This framework is meant to safely create an AI that solves verifiable optimization problems; that is, problems whose answers can be checked efficiently. This set mainly consists of NP-like problems such as protein folding, automated proof search, writing hardware or software to specifications, etc.
This is NOT like many other oracle AI proposals that involve "boxing" an already-created, possibly unfriendly AI in a sandboxed environment. Instead, this framework is meant to grow a self-improving seed AI safely.
Overview
1. Have a bunch of sample optimization problems.
2. Have some code that, given an optimization problem (stated in some standardized format), finds a good solution. This can be seeded by a human-created program.
3. When considering an improvement to program (2), allow the improvement if it makes (2) do better on average on the sample optimization problems without being significantly more complex (to prevent overfitting). That is, the fitness function would be something like (average performance - k * bits of optimizer program).
4. Run (2) to optimize its own code using criterion (3). This can be done concurrently with human improvements to (2), also using criterion (3).
Definitions
First, let's say we're writing this all in Python. In real life we'd use a language like Lisp because we're doing a lot of treatment of code as data, but Python should be sufficient to demonstrate the basic ideas behind the system.
We have a function called steps_bounded_eval_function. This function takes 3 arguments: the source code of the function to call, the argument to the function, and the time limit (in steps). The function will eval the given source code and call the defined function with the given argument in a protected, sandboxed environment, with the given steps limit. It will return either:
1. None, if the program does not terminate within the steps limit.
2. A tuple (output, steps_taken): the program's output (as a string) and the number of steps the program took.
Examples:
steps_bounded_eval_function("""
def function(x):
    return x + 5
""", 4, 1000)
evaluates to (9, 3), assuming that evaluating the function took 3 ticks, because function(4) = 9.
steps_bounded_eval_function("""
def function(x):
    while True:  # infinite loop
        pass
""", 5, 1000)
evaluates to None, because the defined function doesn't return in time. We can write steps_bounded_eval_function as a metacircular interpreter with a bit of extra logic to count how many steps the program uses.
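The post treats steps_bounded_eval_function as a primitive. As a rough illustration of its contract (this is my own toy sketch, not the sandboxed metacircular interpreter the author has in mind), here is a version that counts Python line events via sys.settrace and aborts once the limit is exceeded:

```python
import sys

class _StepLimitExceeded(Exception):
    pass

def steps_bounded_eval_function(source, argument, max_steps):
    """Toy sketch: eval `source`, call its `function` on `argument`,
    treating traced line events as "steps". Not sandboxed or secure;
    a real implementation would interpret the code itself."""
    namespace = {}
    exec(source, namespace)
    func = namespace['function']
    steps = [0]

    def tracer(frame, event, arg):
        if event == 'line':
            steps[0] += 1
            if steps[0] > max_steps:
                raise _StepLimitExceeded()
        return tracer

    sys.settrace(tracer)
    try:
        output = func(argument)
    except _StepLimitExceeded:
        return None  # did not terminate within the steps limit
    finally:
        sys.settrace(None)
    return (output, steps[0])
```

On the examples above, this returns (9, n) for some small step count n, and None for the infinite loop.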
Now I would like to introduce the notion of a problem. A problem consists of the following:
1. An answer scorer. The scorer should be the Python source code for a function. This function takes in an answer string and scores it, returning a number from 0 to 1. If an error is encountered in the function, this is equivalent to returning 0.
2. A steps penalty rate, which should be a positive real number.
Let's consider a simple problem (subset sum):
{'answer_scorer': """
def function(answer):
    nums = [4, 5, -3, 5, -6, 9]
    # convert "1,2,3" to [1, 2, 3]
    indexes = map(int, answer.split(','))
    assert len(indexes) >= 1
    sum = 0
    for i in indexes:
        sum += nums[i]
    if sum == 0:
        return 1
    else:
        return 0
""",
'steps_penalty_rate': 0.000001}
We can see that the scorer function returns 1 if and only if the answer specifies the indexes of numbers in the list nums that sum to 0 (for example, '0,1,2,4' because 4+5-3-6=0).
An optimizer is a program that is given a problem and attempts to solve the problem, returning an answer.
The score of an optimizer on a problem is equal to the score according to the answer scorer, minus the steps penalty rate multiplied by the number of steps used by the optimizer. That is, the optimizer is rewarded for returning a better answer in less time. We can define the following function to get the score of an optimizer (Python source code) for a given problem:
def problem_score(problem, optimizer_source):
    # run the optimizer on the problem
    result = steps_bounded_eval_function(
        optimizer_source, problem, 1 / problem['steps_penalty_rate'])
    if result is None:  # used all available steps, or got an error
        return 0.0
    answer, steps_taken = result  # optimizer returned a result in time
    # get the score using the problem's answer_scorer
    # (eval_function is like steps_bounded_eval_function, without the step limit)
    answer_score = eval_function(problem['answer_scorer'], answer)
    assert 0 <= answer_score <= 1
    # penalize for taking up time and make sure the result is non-negative
    return max(0.0, answer_score - problem['steps_penalty_rate'] * steps_taken)
A simple optimizer that detects subset-sum problems (by inspecting the source code) and uses a brute-force method to solve them will get a problem_score close to 1 for small subset-sum problems. This optimizer would not do well on other problems; other techniques (such as evaluating the problem source code with different answers and choosing a good one) are needed to solve a variety of different optimization problems. Writing a good optimizer is very hard work, just like any seed AI.
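For concreteness, here is a toy brute-force solver for the subset-sum example (my own illustration). It hard-codes a nums list, with the example's minus signs reconstructed as an assumption, rather than extracting the list from the scorer's source code, which is what a real optimizer would have to do:

```python
from itertools import combinations

def brute_force_subset_sum_optimizer(problem):
    # Illustration only: assumes the problem is the subset-sum example
    # above, with the nums list hard-coded (reconstructed assumption).
    nums = [4, 5, -3, 5, -6, 9]
    for size in range(1, len(nums) + 1):
        for idxs in combinations(range(len(nums)), size):
            if sum(nums[i] for i in idxs) == 0:
                # answer format expected by the scorer: "i,j,k"
                return ','.join(str(i) for i in idxs)
    return ''  # no zero-sum subset exists
```

Because it answers almost instantly, its problem_score would be close to 1: the steps penalty only shaves off a tiny amount.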
Framework
The framework has 4 modules:
1. A set of training problems. These should cover a wide range of problems that we would like the AI to solve.
2. An optimizer, written in Python. This should be seeded with a very good human-written optimizer. This is deliberately unspecified by me because it's a really hard problem (as is any seed AI).
3. A scorer for optimizer source code, defined as follows:

def optimizer_score(candidate_optimizer_source):
    training_problems = [...]  # training problems here
    complexity_penalty = 0.1   # a parameter of the system; more on this later
    # total up the program's score on all training problems
    total_score = 0.0
    for problem in training_problems:
        total_score += problem_score(problem, candidate_optimizer_source)
    # penalize for complexity, to prevent overfitting
    total_score -= complexity_penalty * len(compress_binary(candidate_optimizer_source))
    # return average score
    return max(0, total_score / len(training_problems))
This gives a candidate optimizer a score in the range [0, 1] based on both its average performance on the sample set and its inherent complexity. Presumably optimizers with a higher optimizer_score will do better on future optimization problems.
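The post leaves compress_binary unspecified. A simple stand-in (my assumption, not the author's) is an off-the-shelf compressor, so that the penalty tracks something like the program's description length:

```python
import zlib

def compress_binary(source):
    # Crude complexity measure (my assumption; unspecified in the post):
    # the zlib-compressed source, whose length stands in for
    # "bits of optimizer program"
    return zlib.compress(source.encode('utf-8'), 9)
```

A real system would want a measure closer to true description length, but any compressor already penalizes padding and rote repetition far less than genuinely new code.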
4. A self-optimization thread. This thread continuously runs program 2 on a problem formed using program 3's optimizer_score as the answer_scorer and an ever-decreasing steps_penalty_rate. Whenever program 2 outputs source code (optimizer_source) that is better than the current source code for 2, the source code for 2 is replaced with this new value. Also, humans can make improvements to program 2 if they increase its score according to program 3. Source code:

# assume we have access to an optimizer_source variable (program 2)
def self_optimization_thread():
    global optimizer_source  # program 2's source code
    start_steps_penalty_rate = 0.000001
    steps_penalty_rate = start_steps_penalty_rate
    while True:  # loop forever
        self_optimization_problem = {
            # just use program 3 to score the optimizer
            'answer_scorer': """
def function(candidate_optimizer_source):
    ... put the source code from program 3's optimizer_score here
""",
            'steps_penalty_rate': steps_penalty_rate
        }
        # call the optimizer (program 2) to optimize itself, giving it limited time
        result = steps_bounded_eval_function(
            optimizer_source, self_optimization_problem, 1 / steps_penalty_rate)
        changed = False
        if result is not None:  # optimizer returned in time
            candidate_optimizer = result[0]  # 2 returned a possible replacement for itself
            if optimizer_score(candidate_optimizer) > optimizer_score(optimizer_source):
                # 2's replacement is better than 2
                optimizer_source = candidate_optimizer
                steps_penalty_rate = start_steps_penalty_rate
                changed = True
        if not changed:
            # give the optimizer more time to optimize itself on the next iteration
            steps_penalty_rate *= 0.5
So, what does this framework get us?
1. A superoptimizer, program 2. We can run it on new optimization problems and it should do very well on them.
2. Self-improvement. Program 4 will continuously use program 2 to improve itself. This improvement should make program 2 even better at improving itself, in addition to doing better on other optimization problems. Also, the training set will guide human improvements to the optimizer.
3. Safety. I don't see why this setup has any significant probability of destroying the world. That doesn't mean we should disregard safety, but I think this is quite an accomplishment, given how many other proposed AI designs would go catastrophically wrong if they recursively self-improved.
I will now evaluate the system according to these 3 factors.
Optimization ability
Assume we have a program for 2 that has a very very high score according to optimizer_score (program 3). I think we can be assured that this optimizer will do very very well on completely new optimization problems. By a principle similar to Occam's Razor, a simple optimizer that performs well on a variety of different problems should do well on new problems. The complexity penalty is meant to prevent overfitting to the sample problems. If we didn't have the penalty, then the best optimizer would just return the best-known human-created solutions to all the sample optimization problems.
What's the right value for complexity_penalty? I'm not sure. Increasing it too much makes the optimizer overly simple and stupid; decreasing it too much causes overfitting. Perhaps the optimal value can be found by some pilot trials, testing optimizers against withheld problem sets. I'm not entirely sure that a good way of balancing complexity with performance exists; more research is needed here.
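Those pilot trials might look like the following sketch (entirely my own illustration, with made-up names): for each candidate penalty, select the optimizer that the penalized training criterion prefers, then judge the selections on a withheld problem set.

```python
def choose_complexity_penalty(penalties, candidates):
    # Illustration: each candidate optimizer is summarized by its average
    # score on the training problems ('train'), its compressed size in
    # bits ('bits'), and its average score on withheld problems
    # ('holdout'). All field names are my own, hypothetical.
    best_penalty, best_holdout = None, float('-inf')
    for k in penalties:
        # the optimizer this penalty value would select...
        chosen = max(candidates, key=lambda c: c['train'] - k * c['bits'])
        # ...judged on problems it never trained on
        if chosen['holdout'] > best_holdout:
            best_penalty, best_holdout = k, chosen['holdout']
    return best_penalty
```

An overfit optimizer (high training score, huge size, poor holdout score) wins at k = 0 but loses once the penalty is large enough, so the procedure lands on an intermediate value.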
Assuming we've conquered overfitting, the optimizer should perform very well on new optimization problems, especially after self-improvement. What does this get us? Here are some useful optimization problems that fit in this framework:
- Writing self-proving code to a specification. After writing a specification of the code in a system such as Coq, we simply ask the optimizer to optimize according to the specification. This would be very useful once we have a specification for friendly AI.
- Trying to prove arbitrary mathematical statements. Proofs are verifiable in a relatively short amount of time.
- Automated invention/design, if we have a model of physics to verify the invention against.
- General induction/Occam's razor. Find a generative model for the data so far that optimizes P(model) * P(data | model), with some limits on the time taken for the model program to run. Then we can run the model to predict the future.
- Bioinformatics, e.g. protein folding.
These are all problems whose solutions can be efficiently evaluated and that a computer could plausibly solve, so I think this framework should provide good solutions to them. If the optimizer this framework produces performs well on all these problems, I think it deserves to be called an oracle AGI.
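The general-induction item can be made slightly more concrete. Maximizing P(model) * P(data | model) is, in log terms, minimizing description length. Here is a crude sketch of that scoring rule (my own illustration, using compressed length as a stand-in for both terms):

```python
import zlib

def induction_score(model_source, model_output, data):
    # Crude MDL stand-in (my illustration, not the post's method):
    # bits to describe the model, plus bits to patch the model's
    # prediction into the observed data. Higher is better.
    model_bits = 8 * len(zlib.compress(model_source.encode('utf-8')))
    mismatch = ''.join('1' if a != b else '0'
                       for a, b in zip(model_output, data))
    residual_bits = 8 * len(zlib.compress(mismatch.encode('utf-8')))
    return -(model_bits + residual_bits)
```

Among models making the same prediction, the shorter one scores higher, which is the Occam's-razor behaviour the framework is after.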
Self-improvement
It seems that, if my arguments about optimization ability are convincing, an optimizer that does very well on the sample problems should also do well on the specific optimization problem of improving itself. This is especially true if the training problems include some program-optimization problems (i.e. finding a program that does well on some task). Once the optimizer improves itself, it will be better at improving itself, and so on. If the AI is software-limited, the framework should allow it to recursively improve itself up to a very high limit. Then we can use its hardware-design power to design better chips for it and allow it to optimize even further.
The optimizer's ability to self-improve is limited by the fact that the other parts of the program (1, 3, and 4) are fixed. The system is greedy in the sense that it just looks for even the slightest improvement, then changes the code and looks for more improvements, without any foresight. I don't know how much of a limitation this is, but I don't see any reason why greedy improvements to the optimizer would be insufficient for superoptimization through recursive self-improvement.
Currently, the optimizer must completely derive a new optimizer from the specification (program 3), without looking at the current best optimizer (its own source code). In practice we might want to extend the framework to include incremental improvements (improvements on top of existing good answers). I don't know if a good theory for scoring incremental optimizers exists.
Safety
Probably the most important property of the system. What use is optimization if the world has been converted to computronium in order to power a superpowerful protein folder? Here are a few different possible threats:
Underhanded question answers
The optimizer could give answers that are very good according to the problem description but are specifically crafted to cause things unintended by the human operators. For example, it could create a software design that, while correct according to specifications, has a "bug" that causes it to hack into computers on the internet and hook them up to the optimizer system. I don't think this is at all likely, because the system is extremely shortsighted. Any self-modification must improve its immediate ability to do better on optimization problems. Furthermore, all self-modifications are found using methods that are judged by only immediate improvements. So any kind of long-term strategy (sacrificing some optimization power so it will have resources in the future) is out of the question. In other words, optimizer_score should not be seen as a utility function, because the system only tries to improve it using greedy methods, not long-term planning.
Bugs in the system
What if the system gives the optimizer an incorrect score under some circumstances (e.g. if it performs a certain pattern of memory accesses)? Say that, by chance, the optimizer's improvement to itself causes it to get an incorrect score. It might internalize the rule "perform memory accesses in this pattern" to get a higher score. This itself is not especially dangerous; the optimizer will rewrite itself to just do a bunch of weird memory accesses that give it a high score.
What might be more dangerous is if the optimizer discovers an underlying pattern behind the system's hackability. Since the optimizer is penalized for complexity, a program like "do things that, when executed on a certain virtual machine, cause this variable in the machine to be a high number" might have a higher score than "do this certain complex pattern of memory accesses". Then the optimizer might discover the best way to increase the score variable. In the absolute worst case, perhaps the only way to increase the score variable is by manipulating the VM to go on the internet and do unethical things. This possibility seems unlikely (if you can connect to the internet, you can probably just overwrite the score variable) but should be considered.
I think the solution is straightforward: have the system be isolated while the optimizer is running. Completely disconnect it from the internet (possibly through physical means) until the optimizer produces its answer. Now, I think I've already established that the answer will not be specifically crafted to improve future optimization power (e.g. by manipulating human operators), since the system is extremely shortsighted. So this approach should be safe. At worst you'll just get a bad answer to your question, not an underhanded one.
Malicious misuse
I think this is the biggest danger of the system, one that all AGI systems have. At high levels of optimization ability, the system will be able to solve problems that would help people do unethical things. For example, it could optimize for cheap, destructive nuclear/biological/nanotech weapons. This is a danger of technological progress in general, but the dangers are magnified by the potential speed at which the system could self-improve.
I don't know the best way to prevent this. It seems like the project has to be undertaken in private; if the seed optimizer source were released, criminals would run it on their computers/botnets and possibly have it self-improve even faster than the ethical version of the system. If the ethical project has more human and computer resources than the unethical project, this danger will be minimized.
It will be very tempting to crowdsource the project by putting it online. People could submit improvements to the optimizer and even get paid for finding them. This is probably the fastest way to increase optimization progress before the system can self-improve. Unfortunately, I don't see how to do this safely; there would need to be some way to foresee the system becoming extremely powerful before criminals have the chance to exploit it. Perhaps there can be a public base of the project that a dedicated ethical team works off of, while contributing only some of the improvements they make back to the public project.
Towards actual friendly AI
Perhaps this system can be used to create actual friendly AI. Once we have a specification for friendly AI, it should be straightforward to feed it into the optimizer and get a satisfactory program back. What if we don't have a specification? Maybe we can have the system perform induction on friendly AI designs and their ratings (by humans), and then write friendly AI designs that it predicts will have a high rating. This approach to friendly AI will reflect present humans' biases and might cause the system to resort to manipulative tactics to make its design more convincing to humans. Unfortunately I don't see a way to fix this problem without something like CEV.
Conclusion
If this design works, it is a practical way to create a safe, self-improving oracle AI. There are numerous potential issues that might make the system weak or dangerous. On the other hand, it will have short-term benefits, because it will be able to solve practical problems even before it can self-improve, and it might be easier to get corporations and governments on board. This system might be very useful for solving hard problems before figuring out friendliness theory, and its optimization power might be useful for creating friendly AI. I have not encountered any other self-improving oracle AI design for which we can be confident that its answers are not underhanded attempts to get us to let it out.
Since I've probably overlooked some significant problems (or solutions to problems) in this analysis, I'd like to hear more discussion of this design and alternatives to it.