Stuart Armstrong recently posted a few ideas about restraining a superintelligent AI so that we can get useful work out of it. They are based on another idea of his, reduced impact. This is a rather elaborate and complicated way of limiting the amount of optimization power an AI can exert on the world. Basically, it tries to keep the AI from doing things that would make the world look too different from how it already is.

First, why go to such great lengths to limit the optimization power of a superintelligent AI? Why not just not make it superintelligent to begin with? We only really want human-level AI, or slightly above human level, not a god-level being we can't even comprehend.

We can control the computer it is running on, after all. We can give it slower processors, less memory, and perhaps even purposely throttle its code, e.g. by restricting the size of its neural network or other parameters that affect its intelligence.

The counterargument to this is that it might be quite tricky to limit AI intelligence. We don't know how much computing power is enough. We don't know where "above human level" ends and "dangerous superintelligence" begins.

The simplest way would be to just run copies of the AI repeatedly, increasing its computing power each time, until it solves the problem.

I have come up with a more elegant solution: put a penalty on the amount of computing power the AI uses, built into its utility function. The more computing power - and therefore intelligence and optimization - the AI uses, the more it is penalized. So it has an incentive to be as stupid as possible, using only the intelligence necessary to solve the problem.

But we do want the AI to use as many computational resources as it needs to solve the problem - just no more. So the penalty should be conditional on actually solving the problem it is given.

If the solution is probabilistic, then the penalty is only applied after reaching a plan that has a certain probability of success. This might need to be measured by another AI which is not resource-constrained, but only does prediction.
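
To make this concrete, here is a minimal sketch of how such a success-conditioned penalty might be scored. Everything in it is illustrative: `resources_used`, `predicted_success_probability`, the threshold, and the penalty weight are placeholders, not a real design.

```python
SUCCESS_THRESHOLD = 0.9   # assumed probability-of-success cutoff
PENALTY_WEIGHT = 0.01     # assumed trade-off between success and resource use

def score_plan(predicted_success_probability: float, resources_used: float) -> float:
    """Utility of a candidate plan under a success-conditioned resource penalty.

    Plans that don't clear the threshold earn nothing, so the AI can't "win"
    by doing nothing; among plans that do clear it, using less computation
    is strictly better.
    """
    if predicted_success_probability < SUCCESS_THRESHOLD:
        return 0.0  # no credit for plans unlikely to solve the task
    return 1.0 - PENALTY_WEIGHT * resources_used
```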

To give a concrete example, let's say I give the AI a task: I ask it to come up with a plan to collect 10,080 paperclips.

Another AI is given unlimited resources to do purely prediction. Given a plan, it predicts the probability that it will succeed or fail. This AI is safe in the sense that it is not an agent. It has no goals, and just makes predictions about things.

The main AI does the optimization. It tries to generate a plan that has the highest probability of succeeding.

Normally, this might involve extremely overoptimized plans for building nanotechnology and taking over the world. It uses all of its available computing power. It tries to become as intelligent as possible by rewriting its own code. Perhaps it becomes thousands of times more intelligent than humans, or millions of times more. It finds an answer that has a 99.99% probability of succeeding.

However, now we give it a different utility function. We instead have it minimize the time it takes to get to a plan that has a 90% chance of succeeding.

Under a time constraint, the AI races to get to a plan. It tries to be as efficient as possible. It doesn't invest in any meta-level improvements unless they really help it. It doesn't try to engineer complicated nanotechnology. That would take precious time.
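
Here is a rough sketch of that loop, with `propose_next_plan` and `predict_success_probability` standing in for the resource-limited planner and the unconstrained predictor. It is only meant to illustrate the objective: stop at the first plan the predictor rates at 90% or better, and score the run by minus the time it took.

```python
import time

SUCCESS_THRESHOLD = 0.9  # assumed cutoff for "good enough"

def run_time_minimizing_planner(propose_next_plan, predict_success_probability):
    """Return the first acceptable plan, plus a utility of minus the time spent.

    Assumes the proposer eventually produces a plan that clears the threshold.
    Both callables are hypothetical stand-ins for the two AIs described above.
    """
    start = time.monotonic()
    while True:
        plan = propose_next_plan()
        if predict_success_probability(plan) >= SUCCESS_THRESHOLD:
            elapsed = time.monotonic() - start
            # Less time spent means higher utility, so the planner is rewarded
            # for reaching an acceptable plan as quickly (and stupidly) as possible.
            return plan, -elapsed
```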

Effectively, we have summoned a genie that is only just as powerful as it needs to be to fulfill our wish. And not any more powerful. It actually tries to be as stupid as possible.

There are other possible constraints we could use, or use in addition to this. Minimizing time limits intelligence because it gets fewer CPU cycles. We could also have it minimize memory or hard drive space, or any other computing resource.

We could also put a penalty on the complexity of the plan it produces, perhaps measured by its length. The simplest solution might prevent certain kinds of over-optimization, e.g. sneaking plans for nanotechnology into the output.
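
One way to picture combining these penalties is a single score with made-up weights for time, memory, and plan length; none of the numbers below mean anything in particular, they just show the shape of the objective.

```python
def combined_score(success_prob: float, seconds: float, memory_bytes: int, plan_text: str) -> float:
    """Illustrative only: the weights and resource measures are assumptions."""
    if success_prob < 0.9:
        return float("-inf")  # the plan doesn't count as solving the task
    time_penalty = 1.0 * seconds
    memory_penalty = 1e-9 * memory_bytes
    length_penalty = 0.01 * len(plan_text)  # crude complexity measure: plan length
    return -(time_penalty + memory_penalty + length_penalty)
```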

It's worth noting that you can't even create a paperclip maximizer in this system. You can't say "collect as many paperclips as possible". It has to be bounded. There needs to be a pass or fail test. E.g. "come up with a plan to collect 10,080 paperclips."

It's been noted in the past that bounding the goal isn't enough. The AI might then start maximizing the probability that it will achieve its goal, e.g. building elaborate sensors to make sure it hasn't miscounted, or making as many redundant paperclips as possible, just in case something happens to them. You are still summoning an incredibly powerful genie, which might overoptimize.

This gets around that by having it care only about having a >90% chance of getting 10,080 paperclips. After that it stops optimizing.

Now this is not a solution to FAI, or even necessarily a usable reduced-impact strategy. It will still try to find any loopholes it can in your task. What it does is limit the power of the genie trying to exploit them. It always takes the stupidest, simplest strategy that has a decent chance of working.

One failure case that bothers me is the AI just shoving a slight modification of its own source code - one without the resource constraint - into the output. You may only be able to use this approach in cases where that isn't possible.

There are many cases where it isn't possible. The AI would also need to create a plan to obtain a computer to run the copy on, and it would need to solve its own version of the value loading problem to get the copy to care about paperclips. So that adds complexity to the plan in some cases.

If we apply the complexity penalty I described above, then the source code for the AI must also be very short, or it can't do this.

And this requires obtaining its own source code, an ability we can potentially restrict (you, as a human, can't access your own source code, can you?). Without that it would need to reinvent AI, which takes resources.

But it is an interesting idea I had and I thought I'd share it.

Reposted from my blog.


I think the idea of human-level AGI is a red herring. There is no such thing. An AGI with medium power is going to be better at some tasks than humans and worse at some others.

One failure case that bothers me is the AI just shoving a slight modification of its own source code into the output.

This is the subagent problem again :-(

I feel like designing another AI should be much harder and more complicated in most cases. The only issue is the AI having access to its own source code. This could potentially be solved; humans don't have access to our own source code, after all.


From a practical perspective, do you expect a company who has the potential for really really really powerful AI to stop because their AI is only really powerful?

Who cares? If a powerful company builds an AI, we have no control over what they do with it. It doesn't matter what FAI ideas we come up with. I only care about branches where "we" get to AI first, or some group that cares about AI risk.


What? There's all sorts of ways to influence what a company does. Would you agree that a large company or military building an AGI first is the most likely route to AGI?

I'd like to see a wiki page where all the AI control ideas are linked - including obviously this one - because I apparently have lost the overview.

Another AI is given unlimited resources to do purely prediction. Given a plan, it predicts the probability that it will succeed or fail. This AI is safe in the sense that it is not an agent. It has no goals, and just makes predictions about things.

Its goal is to predict as accurately as it can. Clearly taking over the world and reassigning all computing power to calculate the prediction is the best move.

We instead have it minimize the time it takes to get to a plan that has a 90% chance of succeeding.

How does it know how long a plan will take to design until it actually designs it? (I'm assuming your "time" is time to design a plan). How do we know the fastest designed plan is the safest? Maybe this AI generates unsafe plans faster than safe ones.

TL;DR: not pessimistic enough.

Wouldn't taking over the world be a rather agentive thing for an AI that is not an agent to do?

"it is not an agent" is not a description of how to build an AI that is in fact, not an agent. It's barely better than "not an unsafe AI".

Besides, isn't "giving an answer to the prediction" a rather agenty thing for such an AI to do?

"it is not an agent" is not a description of how to build an AI that is in fact, not an agent. It's barely better than "not an unsafe AI".

Non-agents aren't all that mysterious. We can already build non-agents. Google is a non-agent.

Besides, isn't "giving an answer to the prediction" a rather agenty thing for such an AI to do?

No, it's a response. Non-agency means not doing anything unless prompted.

Non-agents aren't all that mysterious. We can already build non-agents. Google is a non-agent.

Compare: safe (in the FAI sense) computer programs aren't that mysterious. We can already build safe computer programs. Android is a safe computer program.

Non-agency means not doing anything unless prompted.

Well, who cares if it doesn't do anything unless prompted, if it takes over the universe when prompted to answer a question? And if you can rigorously tell it not to do that, you've already solved FAI.

Non-agents aren't all that mysterious. We can already build non-agents. Google is a non-agent.

Compare: safe (in the FAI sense) computer programs aren't that mysterious. We can already build safe computer programs. Android is a safe computer program.

Do you have a valid argument that nonagentive programmes would be dangerous? Because saying "it would agentively do X" isn't a valid argument. Pointing out the hidden pitfalls of such programmes is something MIRI could usefully do. An unargued belief that everything is dangerous is not useful.

Well, who cares if it doesn't do anything unless prompted, if it takes over the universe when prompted to answer a question?

Oh, you went there.

Well: how likely is an AI designed to be nonagentive as a safety feature to have that particular failure mode?

And if you can rigorously tell it not to do that, you've already solved FAI.

You may have achieved safety, but it has nothing to do with "achieving FAI" in the MIRI sense of hardcoding the totality of human value. The whole point is that it is much easier, because you are just not building in agency.

A program designed to answer a question necessarily wants to answer that question. A superintelligent program trying to answer that particular question runs the risk of acting as a paperclip maximizer.

Suppose you build a superintelligent program that is designed to make precise predictions, by being more creative and better at predictions than any human would. Why are you confident that one of the creative things this program does to make itself better at predictions isn't turning the matter of the Earth into computronium as step 1?

A program designed to answer a question necessarily wants to answer that question.

I don't think my calculator wants anything.

Does an amoeba want anything? Does a fly? A dog? A human?

You're right, of course, that we have better models for a calculator than as an agent. But that's only because we understand calculators and they have a very limited range of behaviour. As a program gets more complex and creative it becomes more predictive to think of it as wanting things (or rather, the alternative models become less predictive).

Notice the difference (emphasis mine):

A program designed to answer a question necessarily wants to answer that question

vs

...it becomes more predictive to think of it as wanting things

Well, the fundamental problem is that LW-style qualia-free rationalism has no way to define what the word "want" means.

Is there a difference between "x is y" and "assuming that x is y generates more accurate predictions than the alternatives"? What else would "is" mean?

Is there a difference between "x is y" and "assuming that x is y generates more accurate predictions than the alternatives"? What else would "is" mean?

Are you saying the model with the currently-best predictive ability is reality??

Not quite - rather the everyday usage of "real" refers to the model with the currently-best predictive ability. http://lesswrong.com/lw/on/reductionism/ - we would all say "the aeroplane wings are real".

rather the everyday usage of "real" refers to the model with the currently-best predictive ability

Errr... no? I don't think this is true. I'm guessing that you want to point out that we don't have direct access to the territory and that maps are all we have, but that's not very relevant to the original issue of replacing "I find it convenient to think of that code as wanting something" with "this code wants" and insisting that the code's desires are real.

Anthropomorphization is not the way to reality.

A program designed to answer a question necessarily wants to answer that question. A superintelligent program trying to answer that particular question runs the risk of acting as a paperclip maximizer.

What does that mean? That it's necessarily satisfying a utility function? It isn't, as Lumifer's calculator shows.

Suppose you build a superintelligent program that is designed to make precise predictions, by being more creative and better at predictions than any human would. Why are you confident that one of the creative things this program does to make itself better at predictions isn't turning the matter of the Earth into computronium as step 1?

I can be confident that non-agents won't do agentive things.

Why are you so confident your program is a nonagent? Do you have some formula for nonagent-ness? Do you have a program that you can feed some source code to and it will output whether that source code forms an agent or not?

It's all standard software engineering.

I'm a professional software engineer, feel free to get technical.

Have you ever heard of someone designing a nonagentive programme that unexpectedly turned out to be agentive? Because to me that sounds like going into the workshop to build a skateboard and coming out with an F1 car.

I've known plenty of cases where people's programs were more agentive than they expected. And we don't have a good track record on predicting which parts of what people do are hard for computers - we thought chess would be harder than computer vision, but the opposite turned out to be true.

I've known plenty of cases where people's programs were more agentive than they expected.

I haven't: have you any specific examples?

I've known plenty of cases where people's programs were more agentive than they expected.

"Doing something other than what the programmer expects" != "agentive". An optimizer picking a solution that you did not consider is not being agentive.

Do you have a valid argument that nonagentive programmes would be dangerous? Because saying "it would agentively do X" isn't a valid argument. Pointing out the hidden pitfalls of such programmes is something MIRI could usefully do. An unargued belief that everything is dangerous is not useful.

I'm claiming that "non-agent" is not descriptive enough to actually build one. You replied that we already have non-agents, and I replied that we already have safe computer programs. Just like we can't extrapolate from our safe programs that any AI will be safe, we can't extrapolate from our safe non-agents that any non-agent will be safe.

Well: how likely is an AI designed to be nonagentive as a safety feature to have that particular failure mode?

I still have little idea what you mean by nonagent. It's a black box, that may have some recognizable features from the outside, but doesn't tell you how to build it.

I replied that we can already build non-agents.

It remains the case that if you think they could be dangerous, you need to explain how.

I still have little idea what you mean by nonagent. It's a black box, that may have some recognizable features from the outside, but doesn't tell you how to build it.

Again, we already know how to build them, in that we have them.

Worse than that. MIRI can't actually build anything they propose. It's just that some MIRI people have a reflex habit of complaining that anything outside of MIRI land is too vague.

Its goal is to predict as accurately as it can. Clearly taking over the world and reassigning all computing power to calculate the prediction is the best move.

Think of Solomonoff Induction or some approximation of it. It is not an agent. It just tries every possible hypothesis on that data and does a Bayesian update.
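
As a crude sketch of what I mean by "tries every hypothesis and does a Bayesian update", here is a toy finite version (the `prior` and `likelihood` inputs are placeholders; real Solomonoff induction sums over all programs weighted by 2^-length, which is uncomputable):

```python
def bayes_update(prior, likelihood, data):
    """Posterior over a finite set of hypotheses given observed data.

    `prior` maps each hypothesis to its prior probability, and
    `likelihood(hypothesis, data)` gives P(data | hypothesis).
    Assumes at least one hypothesis assigns nonzero likelihood to the data.
    """
    unnormalized = {h: p * likelihood(h, data) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}
```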

How does it know how long a plan will take to design until it actually designs it? (I'm assuming your "time" is time to design a plan).

It doesn't know. It needs to predict. But in general humans have a general idea of how solvable a problem is before they solve it. An engineer knows that building a certain kind of machine is possible, long before he works out the exact specification. A computer programmer knows a problem is probably solvable with a certain approach before they work out the exact computer code to produce it.

This AI is highly incentivized to work fast. Searching down the totally wrong tree is punished highly. Trying simpler ideas before more complex ones is rewarded. So is being able to quickly come up with possible solutions that might work, before reviewing them in more depth.
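
As a loose illustration of what "simpler ideas first" could look like, here is a sketch of an iterative-deepening style search over plan complexity; `generate_plans_of_complexity` and `predict_success_probability` are made-up stand-ins, not something I've specified.

```python
def search_simplest_first(generate_plans_of_complexity, predict_success_probability,
                          threshold=0.9, max_complexity=100):
    """Try all candidate plans of complexity 1, then 2, and so on, returning
    the first plan the predictor rates above the threshold."""
    for complexity in range(1, max_complexity + 1):
        for plan in generate_plans_of_complexity(complexity):
            if predict_success_probability(plan) >= threshold:
                return plan
    return None  # nothing acceptable found within the complexity budget
```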

I don't know exactly what strategies it will use, but its utility function is literally to minimize computing power. If you trust the AI to fulfill its utility function, then you can trust it will do this to the best of its ability.

How do we know the fastest designed plan is the safest? Maybe this AI generates unsafe plans faster than safe ones.

Safety is not guaranteed with this approach. I am fully upfront about this. What it does is minimize optimization. The plan you get will be the stupidest one the AI can come up with. This significantly decreases risk.

TL;DR: not pessimistic enough.

IMHO extreme pessimism leads to throwing out a huge number of ideas. Some of which might be practical, or lead to more practical approaches. I was extremely pessimistic about FAI for a long time until reading some of the recent proposals to actually attack the problem. None of the current ideas are sufficient, but they show it's at least approachable.

Think of Solomonoff Induction or some approximation of it.

Which is uncomputable, and an approximation would presumably benefit from increased computing power.

It is not an agent. It just tries every possible hypothesis on that data and does a Bayesian update.

Like that's simple? How exactly do you make an AI that isn't an agent? With no goals, why does it do anything?

I don't know exactly what strategies it will use, but its utility function is literally to minimize computing power. If you trust the AI to fulfill its utility function, then you can trust it will do this to the best of its ability.

But why are plans that take less computing power to come up with more likely to be safe? Besides, if it calculates that searching for simple solutions is likely not going to meet the 90% criterion, it can forgo that and jump straight to complicated ones.

Your idea is similar to http://lesswrong.com/lw/854/satisficers_want_to_become_maximisers/, have you seen that?

Which is uncomputable, and an approximation would presumably benefit from increased computing power.

I gave the simplest possible counter example to your objection. I never proposed that we actually use pure Solomonoff induction.

EDIT: I realize you said something different. You implied that an approximation of Solomonoff induction would benefit from more computing power, and so would act as an agent to obtain it. This is totally incorrect. Solomonoff induction can be approximated in various ways by bounding the run time of the programs, or using simpler models instead of computer programs, etc. None of these create any agentness. They still just do prediction. I'm not sure you understand the distinction between agents and predictive non-agents, and this is very important for FAI work.

Like that's simple? How exactly do you make an AI that isn't an agent? With no goals, why does it do anything?

The entire field of machine learning is about building practical approximations of Solomonoff induction: algorithms which can predict things and which are not agents. Agents are just special cases of prediction algorithms, where they take the action that has the highest predicted reward.
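
A toy sketch of what I mean, just to illustrate the relationship (the names are mine, not anyone's actual design): a pure predictor maps inputs to predicted scores, and an agent is what you get when you wrap it in an argmax over actions.

```python
from typing import Callable, Iterable

Predictor = Callable[[str], float]  # maps a candidate action to a predicted reward

def make_agent(predict_reward: Predictor, possible_actions: Iterable[str]) -> Callable[[], str]:
    """Turn a prediction-only model into an agent by always choosing the action
    with the highest predicted reward. The agency lives in this wrapper,
    not in the predictor itself."""
    actions = list(possible_actions)  # materialize so the choice is repeatable
    def act() -> str:
        return max(actions, key=predict_reward)
    return act
```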

But why are plans that take less computing power to come up with more likely to be safe?

Because they are plans that less powerful intelligences could have come up with. We don't worry about humans taking over the world, because they aren't intelligent enough. The danger of superintelligence is because it could be far more powerful than us. This is a limit on that power.

Besides, if it calculates that searching for simple solutions is likely not going to meet the 90% criterion, it can forgo that and jump straight to complicated ones.

That's a feature. We don't know how much computing power is necessary. We just want it to minimize it.

I think several of your objections were addressed in http://lesswrong.com/lw/tj/dreams_of_friendliness/. That's pretty much where I'm coming from. Do you have good responses to the arguments there?

EY is talking about oracles which answer questions. I am just talking about prediction.

But yes you do have a point that building a powerful predictive AI is not completely trivial. But it's certainly possible. If you have infinite computing power, you can just run Solomonoff induction.

Realistically we will have to find good approximations, and this might require using agenty AI. And if so, we will have to do work on controlling that AI. I believe this is possible, because it's a simple domain with a well-specified goal, and no output channels except a single number.

Anyway, the other AI judge isn't an important or necessary part of my idea. I just wanted to have a simple outside judge of solutions. You could make the judge internal, and have the AI use its own probability estimates to decide when to output a solution. It is essentially doing that already by trying to predict what the judge will say to its plan. The judge is redundant.

I can't remember where I first came across the idea (maybe Daniel Dennett) but the main argument against AI is that it's simply not worth the cost for the foreseeable future. Sure, we could possibly create an intelligent, self-aware machine now, if we put nearly all the relevant world's resources and scientists onto it. But who would pay for such a thing?

What's the ROI for a super-intelligent, self-aware machine? Not very much, I should think - especially considering the potential dangers.

So yeah, we'll certainly produce machines like the robots in Interstellar - clever expert systems with a simulacrum of self-awareness. Because there's money in it.

But the real thing? Not likely. The only way it will be likely is much further down the line when it becomes cheap enough to do so for fun. And I think by that time, experience with less powerful genies will have given us enough feedback to be able to do so safely.

What's the ROI for a super-intelligent, self-aware machine?

That clearly depends on how super is "super-intelligent". For a trivial example, imagine an AI which can successfully trade in the global financial markets.

What happens if it doesn't want to - if it decides to do digital art or start life in another galaxy?

That's the thing: a self-aware, intelligent thing isn't bound to do the tasks you ask of it, hence the poor ROI. Humans are already such entities, but far cheaper to make, so a few who go off and become monks aren't a big problem.

What happens if it doesn't want to - if it decides to do digital art or start life in another galaxy?

You give it proper incentives :-)

Or, even simpler X-)

Humans are already such entities

Nope, remember, we're talking about a super-intelligent entity.

I don't think AI will be incredibly expensive. There is a tendency to believe that hard problems require expensive and laborious solutions.

Building a flying machine was a hard problem. An impossible problem. But two guys from a bicycle shop built the first airplane on their own. A lot of hard math problems are solved by lone geniuses. Or by the iterative work of a lot of lone geniuses building on each other. But rarely by large organized projects.

And there is a ton of gain in building smarter and smarter AIs. You can use them to automate more and more jobs, or do things even humans can't do.

The robots in Interstellar were AGI. They could fully understand English and work in unrestricted environments. They are already at, or very close to, human-level AI. But there's no reason advancement has to stop at human level. People will continue to tweak it, run it on bigger and faster computers, and eventually have it work on its own code.

Yes, there's a set of AI safety solutions involving less powerful AIs, special-purpose AIs and resource-constrained AIs that are the low-hanging fruit in the field, as compared to building a superintelligent general-purpose AI (if you can) and then coming up with some tremendously complex and fragile way of constraining it (if you can).

But AIs that are at human level, or only slightly above, could be extremely useful. Perhaps in even solving the FAI problem, by asking them to generate ideas. When I've proposed this in the past, people were concerned that it might be difficult to limit the intelligence of an AI. This is a possible solution to that.

It's not tremendously complex. The core idea is quite simple. It's just having the AI minimize time to come up with a solution.

If you want AIs that are smarter than humans, you would be much better off with AIs that are smarter than humans at one specific thing, and to exclude dangerous things like the knowledge of psychology necessary to talk its way out of a box.

All the constraints you put down aren't the same as making the least powerful genie. Restricting its time or resources or whatever increases its efficiency, but only as a by-product, an accident. The least powerful genie should be the most efficient, not as a by-product of its design, but as the end goal. The model you put down just happens to approximate the least powerful genie.


Your theory can be tested. The difference between the highest-intelligence and lowest-intelligence people is profound. Not every highest-intelligence person has power over others, but some do, and no - not one - lowest-intelligence person has power over others. The situation you describe exists now. What throttles should we put on the highest-intelligence humans? How much power (social, financial, democratic, military, etc.) should the lowest-intelligence people be enfranchised with?

I propose this as a thought experiment only.