Comment author: RichardKennaway 12 January 2016 12:46:18PM 2 points
  1. Keep the AI in a box and don't interact with it.

The rest of your posting is about how to interact with it.

Don't have any conversations with it whatsoever.

Interaction is far broader than just conversation. If you can affect it and it can affect you, that's interaction. If you're going to have no interaction, you might as well not have created it; any method of getting answers from it about your questions is interacting with it. The moment it suspects what is going on, it can start trying to play you, to get out of the box.

I'm at a loss to imagine how they would take over the world.

This is a really bad argument for safety. It's what the scientist says of his creation in sci-fi B-movies, shortly before the monster/plague/AI/alien/nanogoo escapes.

Comment author: ZoltanBerrigomo 12 January 2016 08:03:15PM * 0 points

These are good points. Perhaps I should not have said "interact" but chosen a different word instead. Still, its ability to play us is limited since (i) we will be examining the records of the world after it is dead and (ii) it has no opportunity to learn anything about us.

Edit: we might even make it impossible for it to game us in the following way. All records of the simulated world are automatically deleted upon completion -- except for a specific prime factorization we want to know.

This is a really bad argument for safety.

You are right, of course. But you wrote that in response to what was a parenthetical remark on my part -- the real solution is to use program checking to make sure the laws of physics of the simulated world are never violated.

Comment author: Manfred 12 January 2016 01:42:17AM 1 point

Even if the huge computing or algorithmic advances needed fell out of the sky tomorrow, this scheme still doesn't seem like it solves the problems we really want it to solve, because it does not allow the agents to learn anything interesting about our world.

Comment author: ZoltanBerrigomo 12 January 2016 05:59:07AM * 0 points
  1. When talking about dealing and (non)interacting with real AIs, one is always talking about a future world with significant technological advances relative to our world today.

  2. If we can formulate something as a question about math, physics, chemistry, biology, then we can potentially attack it with this scheme. These are definitely problems we really want to solve.

  3. It's true that if we allow AIs more knowledge and more access to our world, they could potentially help us more -- but of course the number of things that can go wrong has to increase as well. Perhaps a compromise which sacrifices some of the potential while decreasing the possibilities that can go wrong is better.

What can go wrong with the following protocol for AI containment?

0 ZoltanBerrigomo 11 January 2016 11:03PM

1. Keep the AI in a box and don't interact with it. 

Don't have any conversations with it whatsoever. 

I'll explain how this would work in a second, but first some motivation. 

2. Consider all the great scientific discoveries over the course of human history -- Newtonian mechanics, relativity, algorithms for primality testing, etc etc. 

Could a "God" who is "bad at math" have created the universe in order to have us make these discoveries for him?

It sounds insane: surely any being with the power to create our universe would know how to check if a number is prime.

3. But perhaps it isn't insane. Consider that at this point in our history, we

a) do not know how to factor numbers efficiently.

b) can create rudimentary simulated worlds (e.g., World of Warcraft or your favorite MMO).

4. Here is how the scheme could work in more detail. 

Imagine your typical World of Warcraft server, but each orc and human is controlled by a trained neural network of complexity roughly the same as that of the average human.

The simulated world does not have enough natural food for orc or human populations to thrive. The life of man and orc is, in the words of Hobbes, "poor, nasty, brutish, and short." 

But it does contain an ocean which washes up pebbles on the shore. Pebbles come in two types, red and black. The black pebbles have labels, which are randomly assigned from the set "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "x", "=". The red pebbles have strings of the same symbols written on them.

When you arrange the following sequence of pebbles  

2x3=6

(all in black up to and including the "=" and the last "6" is red) 

they crack open, and inside is enough food to feed a village for a year.

On the other hand, 2x3=5 has no effect.

The longer the prime factorization, the more food is in the pebbles.

No other way to open the pebbles exists.

Once you have arranged 2x3=6, the next time you arrange it with new pebbles gives you 99% as much food; the next time after that, 99% of 99%, and so on. 
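The reward rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal itself: the names `PebbleWorld`, `food_for` and `FOOD_PER_SYMBOL` are hypothetical, food is assumed to grow linearly with the length of the arrangement, and repeats are tracked per red number (so 2x3=6 and 3x2=6 share one decay counter). The post fixes only the 99% decay and the fact that longer factorizations pay more.

```python
from math import prod

FOOD_PER_SYMBOL = 1.0  # hypothetical unit of food per pebble symbol (assumption)
DECAY = 0.99           # from the post: each repeat yields 99% of the previous payout


def is_prime(n):
    """Trial-division primality test; adequate for a toy sketch."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_number(s):
    """True if s is a non-empty string of the digit pebbles 0-9."""
    return s != "" and all(c in "0123456789" for c in s)


def parse(arrangement):
    """Split 'AxBx...=N' into ([A, B, ...], N); return None if malformed."""
    if arrangement.count("=") != 1:
        return None
    left, right = arrangement.split("=")
    parts = left.split("x")
    if not is_number(right) or not all(is_number(p) for p in parts):
        return None
    return [int(p) for p in parts], int(right)


class PebbleWorld:
    def __init__(self):
        # How many times each red number has already been factored.
        self.times_solved = {}

    def food_for(self, arrangement):
        """Food released by an arrangement; 0.0 if it is not a valid prime factorization."""
        parsed = parse(arrangement)
        if parsed is None:
            return 0.0
        factors, n = parsed
        if prod(factors) != n or not all(is_prime(f) for f in factors):
            return 0.0
        k = self.times_solved.get(n, 0)
        self.times_solved[n] = k + 1
        return FOOD_PER_SYMBOL * len(arrangement) * DECAY ** k
```

With these assumptions, a fresh `PebbleWorld()` pays out for "2x3=6", pays 99% as much the second time, and pays nothing for "2x3=5" or for "4x2=8" (the product is right, but 4 is not prime).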

5.  Societies in this simulated world will devote a lot of effort to figuring out the prime factorizations of the red numbers which wash up on the shore.

We program things so that, over time, the length of the numbers on the red pebbles which get washed up on the shore gets longer and longer.

Since computers are so much faster than humans, and the orcs/humans in this world are about as smart as we are, we might let this society run for a few million simulated human-years before examining the data dump from it. If a polynomial time algorithm for factoring integers exists, they will likely have found it.

6. Given more difficult problems we want solved, we can imagine other ways of embedding them into the physics of the simulated world. Consider, for example, how the Schrodinger equation is embedded into the fabric of our universe.  

7. My question is: what can go wrong?

I can think of a couple of potential issues and am wondering whether I missed anything.

8. One problem is, of course, that someone might misuse the technology, running this experiment while breaking the protocol and telling the simulated people facts about our world. Of course, this is a generic problem with any protocol for any problem (bad things happen if you don't follow it).  

A related problem is that we might have activists who advocate for the rights of simulated man/orc; such activists might interrupt one of the experiments halfway through and break the protocol. 

We might lessen the risk of this by, say, making each experiment take less than one second of real time. Given that the current speed of the top supercomputer in the world is on the order of 10^16 flops, and projecting that into the future, one second might equal millions of years of simulated time.
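To make the arithmetic behind "projecting that into the future" explicit, here is a rough back-of-the-envelope sketch. Only the 10^16 figure and "millions of years" come from the text; taking exactly one million years, a 365.25-day year, and the function name `ops_per_simulated_second` are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds (assumed year length)


def ops_per_simulated_second(flops, wall_clock_seconds, simulated_years):
    """Floating-point operations available per second of simulated time."""
    simulated_seconds = simulated_years * SECONDS_PER_YEAR
    return flops * wall_clock_seconds / simulated_seconds


# 10^16 flops for one real second, spread over one million simulated years.
budget = ops_per_simulated_second(1e16, 1.0, 1e6)
print(f"{budget:.0f} operations per simulated second for the whole world")
```

At today's 10^16 flops this leaves roughly 317 operations per simulated second for the entire world, so the one-second experiment depends on very large future speedups rather than on current hardware.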

This is actually a good idea also for the following reason: the simulated people might realize that they are in a simulated world.

They might spend a lot of effort begging us to give them immortality or more pebbles. We might give in if we heard them. Best to make it impossible for us to hear their pleas. 

They will eventually cease trying to contact us once they realize no one is listening.

Ethically, I think one could justify all this. It is hard to argue, for example, that we (real human beings) have been harmed by being brought into existence in a universe without a God who is listening; almost all of us would prefer to be alive rather than not. The same would go for them: surely, their simulated existence, imperfect as it is, is not worse than not having been brought into the world in the first place?

They might become discouraged once they receive no answers from us, but I think this is not inevitable. Suppose, for example, you come to believe the real world has been created for the purpose of solving math problems; most of us would be shaken by this but likely would go on to live our lives as before. 

Anyway, if we wanted to avoid all this we might consider more subtle ways of embedding problems into the rules of their world. 

9. Perhaps the most serious problem might be if there is a bug in our code for the simulation. 

In the worst case, they might be able to come up with some kind of exploit that lets them see anything stored on the computer running the simulation, in particular the source code of their universe. One might imagine they would even be able to modify said source code. 

One possible way to solve this would be to write the simulation with the aid of some kind of program-checking software that ensures things like this will not happen. Already, people are playing around with tools that enable you to write code which comes with a guarantee that the program will not enter into certain states.

Still, even if a bug were to occur, it seems it would be quite difficult for them to take over the world -- though perhaps not impossible. I don't think they could obtain knowledge of the hardware they are running on -- only the software (people who know computer architecture, correct me if I'm wrong?). And they don't know the physics of our world.  If the computer which runs the simulation is not connected to any network -- say it is stored underground in a soundproof vault -- I'm at a loss to imagine how they would take over the world. 

Edit: note that the last paragraph was a parenthetical remark -- the real solution is to use program checking to ensure no violation of the physical laws of their world happens.

10. Why would we do any of this relative to, say, mind uploads which run at many times the speed of ordinary humans? Because an uploaded mind knows quite a bit about human psychology and the physics of the world, and could potentially use this knowledge in ways harmful to us.  

11. One can view this scheme as a kind of genetic algorithm (in a sense, it is a genetic algorithm). 

12. Finally: many people have written about friendly AI, and I've not been able to read everything (or even most) of what is written. Apologies if something like this has been discussed at length before -- in which case, I'd appreciate a pointer to the relevant discussions. 

The most relevant thing I've seen here is That Alien Message, which is really close to item 9, but still imagines interaction with the AI in the box, giving the AI an opportunity to convince us to let it use the internet. 

Update: Let me say something about why the simulated beings will not be able to figure out the physical laws of our world (though they might have a range of plausible guesses about it). 

Imagine you live in a world governed by Newtonian mechanics: every experiment you do is perfectly, 100% explained by Newton's laws. You come to believe you live in a simulation intended to force you to solve second order differential equations. 

What can you deduce about the physics of the real world?

Answer: nothing. Newton's laws are the only information you have. And that is just not enough information.

Perhaps the real world is governed by Newton's laws just like yours. Or maybe it has general relativity (you are smart enough to figure out that is a possibility). Or maybe it is governed by F=m x'''. Electromagnetism might exist or it might not. Ditto for quantum mechanics. The real world could be discrete or continuous. How big could g, G, and Planck's constant be in the real world? Is there another force like gravity which roughly acts on the same scale? In fact, the possibilities are endless.

You just don't know. 

Now suppose you are actually a superintelligence. You are really, really smart. What can you infer about the laws of physics which govern the real world?

The answer, I believe, is the same as above.

A superintelligence may be smart, but its abilities are not without limit. It cannot solve a problem for which there is not enough information to solve it. 

Consider that World of Warcraft feels the same to an orc regardless of whether it runs on a PC, a Mac, or a quantum computer. Just experiencing a world with Newton's laws will not reveal any information about the hardware your universe is running on.

I do acknowledge, however, that the super-intelligence might come up with a large list of possible guesses about what our physics could be. 

Update 2: If it sounds to you like the AIs might take over our world with this scheme, consider the following scenario.

Suppose that we terminate the simulation after a million years. We have a particular second order differential equation in mind before starting the simulation. After termination, every record of the simulated world is destroyed, except for any time someone has written a solution to this specific differential equation on a piece of paper. This is done automatically by the code. An automatic differentiation algorithm checks whether the solution is correct. If it is, it shows us the solution; else, it deletes it along with the rest of the record of the world. 

The output is either nothing, or a solution to the differential equation we wanted solved. 
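The "release only a verified answer" step can be sketched as follows. This is a toy under loud assumptions: the target problem is taken to be y'' + y = 0 with y(0) = 0 and y'(0) = 1, candidate solutions are represented as Python callables, and the check uses central finite differences at a handful of sample points rather than the automatic differentiation mentioned above. A finite sample check like this is evidence, not a proof of correctness. All names (`release_output`, `solves_target_equation`, `H`, `TOL`) are hypothetical.

```python
import math

H = 1e-3     # finite-difference step (assumption)
TOL = 1e-5   # acceptance tolerance (assumption)
SAMPLE_POINTS = [0.5 * k for k in range(-6, 7)]  # x from -3.0 to 3.0


def first_derivative(f, x):
    """Central-difference estimate of f'(x)."""
    return (f(x + H) - f(x - H)) / (2.0 * H)


def second_derivative(f, x):
    """Central-difference estimate of f''(x)."""
    return (f(x + H) - 2.0 * f(x) + f(x - H)) / (H * H)


def solves_target_equation(f):
    """True if f satisfies y'' + y = 0, y(0) = 0, y'(0) = 1 up to TOL."""
    try:
        if abs(f(0.0)) > TOL or abs(first_derivative(f, 0.0) - 1.0) > TOL:
            return False
        return all(abs(second_derivative(f, x) + f(x)) <= TOL
                   for x in SAMPLE_POINTS)
    except (ArithmeticError, ValueError, TypeError):
        return False


def release_output(world_records):
    """Return the first verified solution, or None; nothing else leaves the box."""
    for candidate in world_records:
        if callable(candidate) and solves_target_equation(candidate):
            return candidate
    return None
```

For example, `release_output(["let us out", math.cos, math.sin])` returns `math.sin` and drops everything else, while a record list with no valid solution yields `None`. The design point is that the channel out of the simulation carries only an answer that an independent checker has accepted.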

(of course, this is a toy scenario; more realistically, we will want a solution to a more difficult scientific problem)

Assuming no bugs in the code, can you give an account of how a superintelligence could take over the real world in this scenario?

Comment author: ChristianKl 09 January 2016 11:13:05AM 0 points

but this does not change the fundamental fact that being rational involves evaluating claims like "is 1+1=2?" or empirical facts about the world such as "is there evidence for the existence of ghosts?" based on reason alone.

One of the claims is analytic. 1+1=2 is true by definition of what 2 means. There's little emotion involved.

When it comes to an issue such as is there evidence for the existence of ghosts? neither rationality after Eliezer's sequences nor CFAR argues that emotions play no role. Noticing when you feel the emotion of confusion because your map doesn't really fit is important.

Beauty of mathematical theories is a guiding stone for mathematicians.

Basically any task that doesn't need emotions or intuitions is better done by computers than by humans. To the extent that humans outcompete computers, there's intuition involved.

Comment author: ZoltanBerrigomo 11 January 2016 03:34:17AM 1 point

1+1=2 is true by definition of what 2 means

Russell and Whitehead would beg to differ.

Comment author: Kaj_Sotala 09 January 2016 01:06:37PM 3 points

Being rational involves evaluating various claims and empirical facts, using the best evidence that you happen to have available. Sometimes you're dealing with a domain where explicit reasoning provides the best evidence, sometimes with a domain where emotions provide the best evidence. Both are information-processing systems that have evolved to make sense of the world and orient your behavior appropriately; they're just evolved for dealing with different tasks.

This means that in some domains explicit reasoning will provide better evidence, and in some domains emotions will provide better evidence. Rationality involves figuring out which is which, and going with the system that happens to provide better evidence for the specific situation that you happen to be in.

Comment author: ZoltanBerrigomo 11 January 2016 03:28:40AM * 1 point

Sometimes you're dealing with a domain where explicit reasoning provides the best evidence, sometimes with a domain where emotions provide the best evidence.

And how should you (rationally) decide which kind of domain you are in?

Answer: using reason, not emotions.

Example: if you notice that your emotions have been a good guide in understanding what other people are thinking in the past, you should trust them in the future. The decision to do this, however, is an application of inductive reasoning.

Comment author: ChristianKl 04 January 2016 12:24:53PM * 2 points

Being rational means many things, but surely one of them is making decisions based on some kind of reasoning process as opposed to recourse to emotions.

No. CFAR rationality is about aligning system I and system II. It's not about declaring system I outputs to be worthy of being ignored in favor of system II outputs.

You might, for example, have very strong emotions about matters pertaining to fights between your perceived in-group and out-group, but you try to put those aside and make judgments based on some sort of fundamental principles.

The alternative is working towards feeling more strongly for the fundamental principles than caring about the fights.

emotions are not easy to fake and humans have strong intuitions about whether someone's expressed feelings are genuine.

A person who cares strongly for his cause doesn't need to fake emotions.

Comment author: ZoltanBerrigomo 08 January 2016 11:55:50PM * 0 points

No. CFAR rationality is about aligning system I and system II. It's not about declaring system I outputs to be worthy of being ignored in favor of system II outputs.

I believe you are nitpicking here.

If your reason tells you 1+1=2 but your emotions tell you that 1+1=3, being rational means going with your reason. If your reason tells you that ghosts do not exist, you should believe this to be the case even if you really, really want there to be evidence of an afterlife.

CFAR may teach you techniques to align your emotions and reason, but this does not change the fundamental fact that being rational involves evaluating claims like "is 1+1=2?" or empirical facts about the world such as "is there evidence for the existence of ghosts?" based on reason alone.

Just to forestall the inevitable objections (which always come in droves whenever I argue with anyone on this site): this does not mean you don't have emotions; it does not mean that your emotions don't play a role in determining your values; it does not mean that you shouldn't train your emotions to be an aid in your decision-making, etc etc etc.

Comment author: ZoltanBerrigomo 05 January 2016 05:51:22AM * 1 point

Sure, you can work towards feeling more strongly about something, but I don't believe you'll ever be able to match the emotional fervor the partisans feel -- I mean here the people who stew in their anger and embrace their emotions without reservations.

As a (rather extreme) example, consider Hitler. He was able to sway a great many people with what were appeals to anger and emotion (though I acknowledge there is much more to the phenomenon of Hitler than this). Hypothetically, if you were a politician from the same era, say a rational one, and you understood that the way to persuade people is to tap into the public's sense of anger, I'm not sure you'd be able to match him.

Comment author: ChristianKl 02 January 2016 09:25:21PM 4 points

I would suggest it requires more than just rationality + competence + caring; for one thing, it requires a little bit of luck

To the extent that it does require luck, that simply means that it's important to have more people with rationality + competence + caring. If you have many people, some will get lucky.

Many such people respond to unreasonable confidence

I think the term "unreasonable confidence" can be misleading. It's possible to very confidently say "I don't know".

At the LW Community Camp in Berlin, I consider Valentine of CFAR to have been the most charismatic person in attendance. When speaking with Valentine, he said things like: "I think it's likely that what you are saying is true, but I don't see a reason why it has to be true." He also very often told people that he might be wrong and that people shouldn't trust his judgements as strongly as they do.

may be more difficult to produce the more you are used to thinking things through rationally.

I think you might be pattern matching to straw-vulcan rationality, that's distinct from what CFAR wants to teach.

Comment author: ZoltanBerrigomo 03 January 2016 05:42:10PM * 1 point

To the extent that it does require luck, that simply means that it's important to have more people with rationality + competence + caring. If you have many people, some will get lucky.

The "little bit of luck" in my post above was something of an understatement; actually, I'd suggest it requires a lot of luck (among many other things) to successfully change the world.

I think you might be pattern matching to straw-vulcan rationality, that's distinct from what CFAR wants to teach.

Not sure if I am, but I believe I am making a correct claim about human psychology here.

Being rational means many things, but surely one of them is making decisions based on some kind of reasoning process as opposed to recourse to emotions.

This does not mean you don't have emotions.

You might, for example, have very strong emotions about matters pertaining to fights between your perceived in-group and out-group, but you try to put those aside and make judgments based on some sort of fundamental principles.

Now if, in the real world, the way you persuade people is by emotional appeals (and this is at least partially true), this will be more difficult the more you get in the habit of rational thinking, even if you have an accurate model about what it takes to persuade someone -- emotions are not easy to fake and humans have strong intuitions about whether someone's expressed feelings are genuine.

In response to Why CFAR's Mission?
Comment author: ZoltanBerrigomo 01 January 2016 09:10:11PM * 4 points

A very interesting and thought-provoking post -- I especially like the Q & A format.

I want to quibble with one bit:

How can I tell there aren't enough people out there, instead of supposing that we haven't yet figured out how to find and recruit them?

Basically, because it seems to me that if people had really huge amounts of epistemic rationality + competence + caring, they would already be impacting these problems. Their huge amounts of epistemic rationality and competence would allow them to find a path to high impact; and their caring would compel them to do it.

There is an empirical claim about the world that is implicit in that statement, and it is this claim I want to disagree with. Namely: I think having a high impact on the world is really, really hard. I would suggest it requires more than just rationality + competence + caring; for one thing, it requires a little bit of luck.

It also requires a good ability to persuade others who are not thinking rationally. Many such people respond to unreasonable confidence, emotional appeals, salesmanship, and other rhetorical tricks which may be more difficult to produce the more you are used to thinking things through rationally.

Comment author: ZoltanBerrigomo 01 November 2015 07:02:07PM * 2 points

For those people who insist, however, that the only thing that is important is that the theory agrees with experiment, I would like to make an imaginary discussion between a Mayan astronomer and his student...

These are the opening words of a ~1.5 minute monologue in one of Feynman's lectures; I won't transcribe the remainder but it can be viewed here.
