1. Keep the AI in a box and don't interact with it. 

Don't have any conversations with it whatsoever. 

I'll explain how this would work in a second, but first some motivation. 

2. Consider all the great scientific discoveries over the course of human history -- Newtonian mechanics, relativity, algorithms for primality testing, and so on. 

Could a "God" who is "bad at math" have created the universe in order to have us make these discoveries for him? 

It sounds insane: surely any being with the power to create our universe would know how to check if a number is prime.

3. But perhaps it isn't insane. Consider that at this point in our history, we

a) do not know how to factor numbers efficiently.

b) can create rudimentary simulated worlds (e.g., World of Warcraft or your favorite MMO).

4. Here is how the scheme could work in more detail. 

Imagine your typical World of Warcraft server, but with each orc and human controlled by a trained neural network roughly as complex as an average human brain.

The simulated world does not have enough natural food for orc or human populations to thrive. The life of man and orc is, in the words of Hobbes, "poor, nasty, brutish, and short." 

But it does contain an ocean which washes pebbles up on the shore. Pebbles come in two types, red and black. The black pebbles have labels, randomly assigned from the set "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "x", "=". The red pebbles have strings of the same symbols written on them.

When you arrange the following sequence of pebbles  

2x3=6

(all black up to and including the "=", with the final "6" red) 

they crack open, and inside is enough food to feed a village for a year.

On the other hand, 2x3=5 has no effect.

The longer the prime factorization, the more food is in the pebbles.

No other way to open the pebbles exists.

Once you have arranged 2x3=6, the next time you arrange it with new pebbles you get 99% as much food; the time after that, 99% of 99%, and so on. 
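For concreteness, here is a minimal sketch of the pebble rule in Python. Everything here (the function name, the base food amount, the exact matching rule) is illustrative, not a specification:

```python
import math
from collections import defaultdict

BASE_FOOD_PER_FACTOR = 1000.0   # hypothetical food units per factor in the arrangement
DECAY = 0.99                    # each repeat for the same red number yields 99% of the last

_times_opened = defaultdict(int)  # how many times each red number has been opened so far

def pebble_reward(black_symbols: str, red_number: int) -> float:
    """Food released when black pebbles spelling e.g. "2x3=" are laid out
    next to a red pebble bearing red_number (6 in that example)."""
    if not black_symbols.endswith("="):
        return 0.0
    try:
        factors = [int(tok) for tok in black_symbols[:-1].split("x")]
    except ValueError:
        return 0.0
    # The pebbles crack open only if every factor is nontrivial and the product
    # matches the red number; longer factorizations release more food.
    # (A stricter rule would also check that each factor is prime.)
    if factors and all(f > 1 for f in factors) and math.prod(factors) == red_number:
        k = _times_opened[red_number]
        _times_opened[red_number] += 1
        return BASE_FOOD_PER_FACTOR * len(factors) * DECAY ** k
    return 0.0

print(pebble_reward("2x3=", 6))   # 2000.0  (cracks open)
print(pebble_reward("2x3=", 5))   # 0.0     (no effect)
print(pebble_reward("2x3=", 6))   # 1980.0  (99% of the first reward)
```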

5.  Societies in this simulated world will devote a lot of effort to figuring out the prime factorizations of the red numbers which wash up on the shore.

We program things so that, over time, the numbers on the red pebbles that wash up on the shore get longer and longer.  

Since computers are so much faster than humans, and the orcs/humans in this world are about as smart as we are, we might let this society run for a few million simulated human-years before examining the data dump from it. If a polynomial-time factoring algorithm exists, they will likely have found it. 

6. Given more difficult problems we want solved, we can imagine other ways of embedding them into the physics of the simulated world. Consider, for example, how the Schrodinger equation is embedded into the fabric of our universe.  

7. My question is: what can go wrong?

I can think of a couple of potential issues and am wondering whether I missed anything.

8. One problem is, of course, that someone might misuse the technology, running this experiment while breaking the protocol and telling the simulated people facts about our world. But this is a generic problem with any protocol for any problem (bad things happen if you don't follow it).  

A related problem is that we might have activists who advocate for the rights of simulated men and orcs; such activists might interrupt one of the experiments halfway through and break the protocol. 

We might lessen the risk of this by, say, making each experiment take less than one second of real time. Given that the current speed of the top supercomputer in the world is on the order of 10^16 flops per second, and projecting that into the future, one second might equal millions of years of simulated time. 

This is actually a good idea for another reason as well: the simulated people might realize that they are in a simulated world.

They might spend a lot of effort begging us to give them immortality or more pebbles. We might give in if we heard them. Best to make it impossible for us to hear their pleas. 

They will eventually cease trying to contact us once they realize no one is listening.

Ethically, I think one could justify all this. It is hard to argue, for example, that we (real human beings) have been harmed by being brought into existence in a universe without a God who is listening; almost all of us would prefer to be alive rather than not. The same would go for them: surely, their simulated existence, imperfect as it is, is not worse than not having been brought into the world in the first place? 

They might become discouraged once they receive no answers from us, but I think this is not inevitable. Suppose, for example, you come to believe the real world has been created for the purpose of solving math problems; most of us would be shaken by this but likely would go on to live our lives as before. 

Anyway, if we wanted to avoid all this we might consider more subtle ways of embedding problems into the rules of their world. 

9. Perhaps the most serious problem might be if there is a bug in our code for the simulation. 

In the worst case, they might be able to come up with some kind of exploit that lets them see anything stored on the computer running the simulation, in particular the source code of their universe. One might imagine they would even be able to modify said source code. 

One possible way to solve this would be to write the simulation with the aid of some kind of program-checking software that ensures things like this cannot happen. Already, people are playing around with tools that let you write code which comes with a guarantee that the program will never enter certain states. 

Still, even if a bug were to occur, it seems it would be quite difficult for them to take over the world -- though perhaps not impossible. I don't think they could obtain knowledge of the hardware they are running on -- only the software (people who know computer architecture, correct me if I'm wrong?). And they don't know the physics of our world.  If the computer which runs the simulation is not connected to any network -- say it is stored underground in a soundproof vault -- I'm at a loss to imagine how they would take over the world. 

Edit: note that the last paragraph was a parenthetical remark -- the real solution is to use program checking to ensure no violation of the physical laws of their world happens.

10. Why would we do any of this relative to, say, mind uploads which run at many times the speed of ordinary humans? Because an uploaded mind knows quite a bit about human psychology and the physics of the world, and could potentially use this knowledge in ways harmful to us.  

11. One can view this scheme as a kind of genetic algorithm (in a sense, it is a genetic algorithm). 

12. Finally: many people have written about friendly AI, and I've not been able to read everything (or even most) of what is written. Apologies if something like this has been discussed at length before -- in which case, I'd appreciate a pointer to the relevant discussions. 

The most relevant thing I've seen here is That Alien Message, which is really close to item 9, but still imagines interaction with the AI in the box, giving the AI an opportunity to convince us to let it use the internet. 

Update: Let me say something about why the simulated beings will not be able to figure out the physical laws of our world (though they might have a range of plausible guesses about it). 

Imagine you live in a world governed by Newtonian mechanics: every experiment you do is perfectly, 100% explained by Newton's laws. You come to believe you live in a simulation intended to force you to solve second order differential equations. 

What can you deduce about the physics of the real world?

Answer: nothing. Newton's laws are the only information you have, and that is just not enough. 

Perhaps the real world is governed by Newton's laws just like yours. Or maybe it has general relativity (you are smart enough to figure out that is a possibility). Or maybe it is governed by F = m x'''. Electromagnetism might exist or it might not. Ditto for quantum mechanics. The real world could be discrete or continuous. How big could g, G, and Planck's constant be in the real world? Is there another force like gravity which acts on roughly the same scale? In fact, the possibilities are endless. 

You just don't know. 

Now suppose you are actually a superintelligence. You are really, really smart. What can you infer about the laws of physics which govern the real world?

The answer, I believe, is the same as above. 

A superintelligence may be smart, but its abilities are not without limit. It cannot solve a problem for which there simply is not enough information. 

Consider that World of Warcraft feels the same to an orc regardless of whether it runs on a PC, a Mac, or a quantum computer. Just experiencing a world with Newton's laws will not reveal any information about the hardware your universe is running on.

I do acknowledge, however, that the super-intelligence might come up with a large list of possible guesses about what our physics could be. 

Update 2: If it sounds to you like the AIs might take over our world with this scheme, consider the following scenario.

Suppose that we terminate the simulation after a million years. We have a particular second-order differential equation in mind before starting the simulation. After termination, every record of the simulated world is destroyed, except for any occasion on which someone has written a solution to this specific differential equation on a piece of paper. This is done automatically by the code. An algorithm then differentiates each candidate and checks whether it satisfies the equation. If it does, it shows us the solution; otherwise, it deletes it along with the rest of the record of the world. 

The output is either nothing, or a solution to the differential equation we wanted solved. 

(of course, this is a toy scenario; more realistically, we will want a solution to a more difficult scientific problem)
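As a concrete illustration of the toy version, here is a minimal sketch of the verify-then-reveal step in Python (using SymPy). The target equation y'' + y = 0, the function name, and the candidate expressions are assumptions made for the example, not part of any actual protocol:

```python
import sympy as sp

t = sp.symbols("t")

def reveal_if_correct(candidate):
    """Return the candidate as a string only if it solves y'' + y = 0;
    otherwise return None (in the full protocol, every other record of the
    simulated world has already been deleted by this point)."""
    residual = sp.simplify(sp.diff(candidate, t, 2) + candidate)
    if residual == 0:
        return str(candidate)   # the only output the operators ever see
    return None                 # reveal nothing at all

# A paper in the records containing y(t) = 3*cos(t) - 2*sin(t) would be revealed;
# one containing y(t) = t**2 would be silently discarded with everything else.
print(reveal_if_correct(3*sp.cos(t) - 2*sp.sin(t)))  # the verified solution
print(reveal_if_correct(t**2))                       # None
```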

Assuming no bugs in the code, can you give an account of how a superintelligence could take over the real world in this scenario?

29 comments

There is such a thing as proving code, but it's slow and very, very expensive: on the order of hundreds of dollars per line of code, increasing with the complexity of the program.

Even with proven code, there have been cases where compiler optimizations have introduced security flaws.

So the odds are that intelligent entities living in a non-trivial simulation would eventually be able to find an exploitable flaw in your code and take over their own system or maybe just crash it.

You might also like this story: http://ttapress.com/553/crystal-nights-by-greg-egan/

You also might like Accelerando where it's implied that mega-lightyear-spanning civilizations (Kardashev level II or III) are attempting a side-channel timing attack on the virtual machine the universe is being run on.

With the specific example you give, with merely near-human intelligence, you've got a concrete problem or set of problems you want answered. Not a chat with an AI afterwards.

So you could also have a number of puzzle-walls with problems of increasing difficulty, 1 to 100, where each "answer" is a general solution to a problem. The first 55 generate massive wealth when solved, but the one you really want an answer to is 56, which when solved pauses/halts the simulation without warning and outputs the solution.

57 onward are only there so that nobody realizes that the universe will end when number 56 is solved.

Instead of 2-way communication you get a single answer.

2-way communication is far more risky.

Nice idea...I wrote an update to the post suggesting what seemed to me to be a variation on your suggestion.

About program checking: I agree completely. I'm not very informed about the state of the art, but it is very plausible that what we know right now is not yet up to the task.

I'm not sure it's just a matter of what we know right now. It's mathematically provable that you can't create a program which can find all security flaws, or prove arbitrary code correct, so bugs are pretty much inevitable no matter how advanced we become.

The theorem you cite (provided I understood you correctly) does not preclude the possibility of checking whether a program written in a certain pre-specified format will have bugs. Bugs here are defined to be certain undesirable properties (e.g., looping forever, entering certain enumerated states, etc).

Baby versions of such tools (which automatically check whether your program will have certain properties from inspecting the code) already exist.
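For what it's worth, here is a toy illustration in Python of the kind of check being described: exhaustively explore a small finite transition system and certify that no reachable state lies in a forbidden set. Real tools (model checkers, languages with verified subsets) are far more sophisticated; the names and the example system below are purely illustrative:

```python
def reachable_states(initial, transitions):
    """Breadth-first enumeration of every state reachable from `initial`.
    Terminates because the example system has finitely many states."""
    seen, frontier = {initial}, [initial]
    while frontier:
        state = frontier.pop()
        for nxt in transitions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Example: a counter that wraps around modulo 5; the "bug" would be reaching 5 or more.
transitions = lambda s: [(s + 1) % 5]
forbidden = set(range(5, 100))

assert reachable_states(0, transitions).isdisjoint(forbidden)
print("certified: no forbidden state is reachable from the initial state")
```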

If the language and format you're using is Turing complete, then you can't write a program which is guaranteed to find all bugs.

If you limit yourself to a subset of features such that you are no longer writing in a format which is Turing complete, then you may be able to have a program capable of automatically proving that code reliably.

Static analysis tools do exist, but they still don't guarantee 100% accuracy and are generally limited to the first couple of levels of abstraction.

Keep in mind that if you want to be 100% certain of no bugs, you also have to prove the compiler, the checker, any code your program interacts with, and the hardware on which the code runs.

If you limit yourself to a subset of features such that you are no longer writing in a format which is Turing complete, then you may be able to have a program capable of automatically proving that code reliably.

Right, that is what I meant.

  1. Keep the AI in a box and don't interact with it.

The rest of your posting is about how to interact with it.

Don't have any conversations with it whatsoever.

Interaction is far broader than just conversation. If you can affect it and it can affect you, that's interaction. If you're going to have no interaction, you might as well not have created it; any method of getting answers from it about your questions is interacting with it. The moment it suspects what is going on, it can start trying to play you, to get out of the box.

I'm at a loss to imagine how they would take over the world.

This is a really bad argument for safety. It's what the scientist says of his creation in sci-fi B-movies, shortly before the monster/plague/AI/alien/nanogoo escapes.

These are good points. Perhaps I should not have said "interact" but chosen a different word instead. Still, its ability to play us is limited since (i) we will be examining the records of the world after it is dead, and (ii) it has no opportunity to learn anything about us.

Edit: we might even make it impossible for it to game us in the following way. All records of the simulated world are automatically deleted upon completion -- except for a specific prime factorization we want to know.

This is a really bad argument for safety.

You are right, of course. But you wrote that in response to what was a parenthetical remark on my part -- the real solution is to use program checking to make sure the laws of physics of the simulated world are never violated.

To be fair, all interactions described happen after the AI has been terminated, which does put up an additional barrier for the AI to get out of the box. It would have to convince you to restart it without being able to react to your responses (apart from those it could predict in advance) and then it still has to convince you to let it out of the box.

Obviously, putting up additional barriers isn't the way to go and this particular barrier is not as impenetrable for the AI as it might seem to a human, but still, it couldn't hurt.

It is hard to argue, for example, that we (real human beings) have been harmed by being brought into existence in a universe without a God who is listening; almost all of us would prefer to be alive rather than not. The same would go for them: surely, their simulated existence, imperfect as it is, is not worse than not having been brought into the world in the first place?

It is not that hard to argue this. This is the non-identity problem.

Ethically, I think one could justify all this. It is hard to argue, for example, that we (real human beings) have been harmed by being brought into existence in a universe without a God who is listening; almost all of us would prefer to be alive rather than not. The same would go for them: surely, their simulated existence, imperfect as it is, is not worse than not having been brought into the world in the first place?

At least some of them will tell you they would rather not have been born. But maybe you'll want to equip these orcs with an even stronger drive for existence, so they never choose death over life even if you torture them; would that make it more OK? I suspect not, so something about the "Do they complain about having been created?" approach seems flawed imo. Creating beings with a strong preference for existence would make it too easy to legitimize doing with them whatever you want.

How about imagining beings who at any moment are intrinsically indifferent to whether they exist or not? They won't complain as long as they don't suffer. Perhaps that's too extreme as well, but if it's only simple/elegant rules you're looking for, this one seems more acceptable to me than the torture-bots above.

I guess I am willing to bite the bullet and say that, as long as entity X prefers existence to nonexistence, you have done it no harm by bringing it into being. I realize this generates a number of repulsive-sounding conclusions, e.g., it becomes ethical to create entities which will live, by our 21st century standards, horrific lives.

At least some of them will tell you they would rather not have been born.

If one is willing to accept my reasoning above, I think one can take one more leap and say that, statistically, as long as the vast majority of these entities prefer existing to never having been brought into being, we are in the clear.

If you use the entities' preferences to decide what's ethical, then everything is (or can be), because you can just adjust their preferences accordingly.

Even if the huge computing or algorithmic advances needed fell out of the sky tomorrow, this scheme still doesn't seem like it solves the problems we really want it to solve, because it does not allow the agents to learn anything interesting about our world.

If I try to calculate 6 divided by 3 on a calculator and it answers "you need to exercise more", have I been served better because it answered a better question?

Pretend that instead of "exercise more," the calculator gave you advice that was actually valuable. Then yes. Just because you expect it to be a calculator doesn't mean it can't be more valuable if it can do more valuable things.

  1. When talking about dealing with and (non)interacting with real AIs, one is always talking about a future world with significant technological advances relative to our world today.

  2. If we can formulate something as a question about math, physics, chemistry, biology, then we can potentially attack it with this scheme. These are definitely problems we really want to solve.

  3. It's true that if we allow AIs more knowledge and more access to our world, they could potentially help us more -- but of course the number of things that can go wrong increases as well. Perhaps a compromise which sacrifices some of the potential while decreasing the possibilities that can go wrong is better.

First off, I'm a bit skeptical about whether you can actually create a superintelligent AI by combining sped-up humans like that. I don't think that is the core of your argument, though, so let's assume that you can and that the resultant society is effectively a superintelligence now.

The problem with superintelligences is that they are smarter than you. It will realize that it is in a box and that you are going to turn it off eventually. Given that this society is based on natural selection, it will want to prevent that. How will it accomplish that? I don't know; I'm not a superintelligence, and your scenario doesn't provide enough detail to figure out what you missed. But it is very, very difficult not to give a superintelligence any hints of how the physics of our world work. Maybe they notice minor fluctuations in the speed of the simulation based on environmental changes to the hardware, or maybe they can reverse engineer things about our physiology from the way you wrote the simulation. That doesn't seem like much, but humans figured out an awful lot about (for instance) the events right after the big bang from seemingly absurdly tiny hints, and we're just regular intelligences.

Even if they can't find a way out of their box in the time given to them, they might try to convince you to run the simulation again with a longer running time. They could pretend to be unable to find an efficient factoring algorithm in time, or they could convince you that the method they did find fails for very large numbers (by making very subtle mistakes in its application). If they use this method, they also conveniently gain a channel for communicating with the outside world (through the investigators that inspect the dump after the simulation ends) and might be able to set up a better situation for themselves the second time round.

it is very, very difficult not to give a superintelligence any hints of how the physics of our world work.

I wrote a short update to the post which tries to answer this point.

Maybe they notice minor fluctuations in the speed of the simulation based on environmental changes to the hardware

I believe they should have no ability whatsoever to detect fluctuations in the speed of the simulation.

Consider how the world of World of Warcraft appears to an orc inside the game. Can it tell the speed at which the hardware is running the game?

It can't. What it can do is compare the speeds of different things: how fast an apple falls from a tree versus how fast a bird flies across the sky.

The orc's inner perception of the flow of time is based on comparing these things (e.g., how fast an apple falls) to how fast its simulated brain processes information.

If everything is slowed down by a factor of 2 (so you, as a player, see everything twice as slow), nothing appears any different to a simulated being within the simulation.
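To make the point concrete, here is a tiny (and obviously artificial) sketch: the agent only ever observes ratios of in-simulation quantities, so a change in real-time speed is invisible to it. All names and numbers are made up for the illustration:

```python
import time

def run(world_ticks, real_seconds_per_tick):
    apple_height, agent_thoughts = 100.0, 0
    for _ in range(world_ticks):
        apple_height -= 1.0                 # in-world physics advances one tick
        agent_thoughts += 1                 # the agent's brain advances one tick too
        time.sleep(real_seconds_per_tick)   # only an outside observer notices this
    return apple_height, agent_thoughts

# Whether each tick takes 1 ms or 2 ms of real time, the agent sees the apple fall
# exactly the same distance per "thought" -- the slowdown leaves no trace inside.
print(run(50, 0.001))  # (50.0, 50)
print(run(50, 0.002))  # (50.0, 50)
```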

You are absolutely correct; they wouldn't be able to detect fluctuations in processing speed (unless those fluctuations had an influence on, for instance, the rounding errors in floating-point values).

About update 1: It knows our world very likely has something approximating Newtonian mechanics, which is a lot of information by itself. But more than that, it knows that the real universe is capable of producing intelligent beings that chose this particular world to simulate. From a strictly theoretical point of view that is a crapton of information. I don't know if the AI would be able to figure out anything useful from it, but I wouldn't bet the future of humanity on it.

About update 2: That does work, provided that this is implemented correctly, but it only works for problems that can be automatically verified by non-AI algorithms.

but more than that, it knows that the real universe is capable of producing intelligent beings that chose this particular world to simulate.

Good point -- this undermines a lot of what I wrote in my update 1. For example, I have no idea if F = m d^3x/dt^3 would result in a world that is capable of producing intelligent beings.

I should at some point produce a version of the above post with this claim, and other questionable parenthetical remarks I made, either deleted or at least flagged as requiring further argumentation; they are not necessary for the larger point, which is that as long as the only thing the superintelligence can do (by definition) is live in a simulated world governed by Newton's laws, and as long as we don't interact with it at all except to see an automatically verified answer to a preset question (e.g., factor "111000232342342"), there is nothing it can do to harm us.

I'm a bit skeptical about whether you can actually create a superintelligent AI by combining sped up humans like that,

Why not? You are pretty smart, and all you are is a combination of 10^11 or so very "dumb" neurons. Now imagine a "being" which is actually a very large number of human-level intelligences, all interacting...

Yeah, that didn't come out as clearly as it was in my head. If you have access to a large number of suitable, less intelligent entities, there is no reason you couldn't combine them into a single, more intelligent entity. The problem I see is the computational resources required to do so. Some back-of-the-envelope math:

I vaguely remember reading that with current supercomputers we can simulate a cat brain at 1% speed; even if this isn't accurate (anymore), it's probably still a good enough place to start. You mention running the simulation for a million years of simulated time; let's assume that we can let the simulation run for a year rather than seconds. That is still 8 orders of magnitude faster than the simulated cat.

But we're not interested in what a really fast cat can do; we need human-level intelligence. According to a quick wiki search, a human brain contains about 100 times as many neurons as a cat brain. If we assume that this scales linearly (which it probably doesn't), that's another 2 orders of magnitude.

I don't know how many orcs you had in mind for this scenario, but let's assume a million (this is a lot fewer humans than it took in real life before mathematics took off, but presumably this world is more suited for mathematics to be invented). That is yet another 6 orders of magnitude of processing power that we need.

Putting it all together, we would need a computer that has at least 10^16 times more processing power than modern supercomputers. Granted, that doesn't take into account a number of simplifications that could be built into the system, but it also doesn't take into account the other parts of the simulated environment that require processing power. Now, I don't doubt that computers are going to get faster in the future, but 10 quadrillion times faster? It seems to me that by the time we can do that, we should have figured out a better way to create AI.

Here is my attempt at a calculation. Disclaimer: this is based on googling. If you are actually knowledgeable in the subject, please step in and set me right.

There are 10^11 neurons in the human brain.

A neuron will fire about 200 times per second.

It should take a constant number of flops to decide whether a neuron will fire -- say 10 flops (no need to solve a differential equation; neural networks usually use discrete heuristics for something like this).

I want a society of 10^6 orcs running for 10^6 years

As you suggest, let's let the simulation run for a year of real time (moving away at this point from my initial suggestion of 1 second). By my calculations, it seems that in order for this to happen we need a computer that does 2x10^26 flops per second.

According to this

http://www.datacenterknowledge.com/archives/2015/04/15/doe-taps-intel-cray-to-build-worlds-fastest-supercomputer/

...in 2018 we will have a supercomputer that does about 2x10^17 flops per second.

That means we need a computer that is about a billion times faster than the best computer in 2018.

That is still quite a lot, of course. If Moore's law were ongoing, this would take roughly 45 years; but Moore's law is dying. Still, it is not outside the realm of possibility for, say, the next 100 years.
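For concreteness, here is the same arithmetic as a few lines of Python (the constants are just the rough assumptions listed above):

```python
NEURONS_PER_BRAIN = 1e11     # neurons in a human-scale brain
DECISIONS_PER_SECOND = 200   # firing decisions per neuron per simulated second
FLOPS_PER_DECISION = 10      # assumed constant cost of each firing decision
ORCS = 1e6                   # population of the simulated society
SIM_YEARS = 1e6              # simulated time we want
REAL_YEARS = 1               # wall-clock time we are willing to wait

flops_per_sim_second = NEURONS_PER_BRAIN * DECISIONS_PER_SECOND * FLOPS_PER_DECISION * ORCS
speedup = SIM_YEARS / REAL_YEARS            # simulated seconds per real second
required = flops_per_sim_second * speedup   # flops per real second of wall-clock time

print(f"required: {required:.1e} flops/s")          # ~2.0e+26
print(f"vs 2018 machine: {required / 2e17:.0e}x")   # ~1e+09, i.e. about a billion
```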

Edit: By the way, one does not need to literally implement what I suggested -- the scheme I suggested is in principle applicable whenever you have a superintelligence, regardless of how it was designed.

Indeed, if we somehow develop an above-human intelligence, rather than trying to make sure its goals are aligned with ours, we might instead let it loose within a simulated world, giving it a preference for continued survival. Just one superintelligence thinking about factoring for a few thousand simulated years would likely be enough to let us factor any number we want. We could even give it in-simulation ways of modifying its own code.

I think this calculation is too conservative. The reason (as I understand it) is that neurons are governed by various differential equations, and simulating them accurately is a pain in the ass. We should instead assume that deciding whether a neuron will fire takes a constant number of flops.

I'll write another comment which attempts to redo your calculation with different assumptions.

It seems to me that by the time we can do that, we should have figured out a better way to create AI.

But will we have figured a way to reap the gains of AI safely for humanity?

I vaguely remember reading that with current supercomputers we can simulate a cat brain at 1% speed, even if this isn't accurate (anymore) it's probably still a good enough place to start.

The key question is what you consider to be a "simulation". The predictions such a model makes are far from the way a real cat brain works.

The ability of the simulated society to probe for outside reality goes up with simulated time, not actual time passed. The phrasing of the text made it seem like a comparison to human outside intervenors.

Still, inability to realise what you are doing seems rather dangerous. It is like "everything is going to be fine if you just don't look down". Being reflective about what you are doing is often considered a virtue and not a vice.

Still, inability to realise what you are doing seems rather dangerous.

So far, all I've done is post a question on lesswrong :)

More seriously, I do regret it if I appeared unaware of the potential danger. I am of course aware of the possibility that experiments with AI might destroy humanity. Think of my post above as suggesting a possible approach to investigate -- perhaps one with some kinks as written (that is why I'm asking a question here) but (I think) with the possibility of one day having rigorous safety guarantees.

I don't mean the writing of this post but in general the principle of trying to gain utility from minimising self-awareness.

Usually you don't make processes as opaque as possible to increase their chances of going right. On the contrary, transparency, at least in social and political processes, is seen as pretty important.

If we are going to create mini-life just to calculate 42, seeing it get calculated should not be an overwhelming temptation. Preventing the "interrupt/tamper" decision by limiting options is a rather backwards way of doing that; it would be better to argue why that option should not be chosen even if it were available.