
If you limit yourself to a subset of features such that you are no longer writing in a format which is Turing-complete, then you may be able to have a program capable of automatically proving that code reliably.

Right, that is what I meant.

Here is my attempt at a calculation. Disclaimer: this is based on googling. If you are actually knowledgeable in the subject, please step in and set me right.

There are 10^11 neurons in the human brain.

A neuron will fire about 200 times per second.

It should take a constant number of flops to decide whether a neuron will fire -- say 10 flops (no need to solve a differential equation; neural networks usually use some discrete heuristic for something like this).

I want a society of 10^6 orcs running for 10^6 years.

As you suggest, let's let the simulation run for a year of real time (moving away at this point from my initial suggestion of 1 second). By my calculations, it seems that in order for this to happen we need a computer that does about 2x10^26 flops per second.

According to this

http://www.datacenterknowledge.com/archives/2015/04/15/doe-taps-intel-cray-to-build-worlds-fastest-supercomputer/

...in 2018 we will have a supercomputer that does about 2x10^17 flops per second.

That means we need a computer that is about a billion times faster than the best computer in 2018.

That is still quite a lot, of course. If Moore's law were still going, this would take ~45 years; but Moore's law is dying. Still, it is not outside the realm of possibility within, say, the next 100 years.
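
For concreteness, here is the arithmetic as a small Python sketch; the parameter names are mine, and the inputs are just the rough figures above:

```python
import math

# Back-of-the-envelope check of the compute requirement, using the rough
# assumptions stated above. All figures are order-of-magnitude guesses.
neurons_per_orc = 1e11        # neurons in a (roughly) human brain
firings_per_second = 200      # firing decisions per neuron per second
flops_per_decision = 10       # constant cost of one firing decision
num_orcs = 1e6                # size of the simulated society
simulated_years = 1e6         # simulated time we want to cover
real_years = 1                # wall-clock time we allow

# Flops needed per simulated second for the whole society
flops_per_sim_second = neurons_per_orc * firings_per_second * flops_per_decision * num_orcs

# We must advance 10^6 simulated years per real year, i.e. 10^6 simulated
# seconds per real second, so:
required_flops_per_second = flops_per_sim_second * (simulated_years / real_years)
print(f"required: {required_flops_per_second:.0e} flops/s")   # ~2e+26

# Gap relative to the ~2e17 flops/s machine projected for 2018
gap = required_flops_per_second / 2e17
print(f"gap: {gap:.0e}x")                                     # ~1e+09

# Years of Moore's-law-style doublings (18 months each) to close that gap
print(f"~{math.log2(gap) * 1.5:.0f} years")                   # ~45
```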

Edit: By the way, one does not need to literally implement what I suggested -- the scheme is in principle applicable whenever you have a superintelligence, regardless of how it was designed.

Indeed, if we somehow develop an above-human intelligence, rather than trying to make sure its goals are aligned with ours, we might instead let it loose within a simulated world, giving it a preference for continued survival. Just one superintelligence thinking about factoring for a few thousand simulated years would likely be enough to let us factor any number we want. We could even give it in-simulation ways of modifying its own code.

Still, inability to realise what you are doing seems rather dangerous.

So far, all I've done is post a question on lesswrong :)

More seriously, I do regret it if I appeared unaware of the potential danger. I am of course aware of the possibility that experiments with AI might destroy humanity. Think of my post above as suggesting a possible approach to investigate -- perhaps one with some kinks as written (that is why I'm asking a question here) but (I think) with the possibility of one day having rigorous safety guarantees.

I think this calculation is too conservative. The reason (as I understand it) is that it assumes we must accurately simulate the differential equations governing each neuron, which is a pain in the ass and more than we need. We should instead assume that deciding whether a neuron will fire takes a constant number of flops.

I'll write another comment which attempts to redo your calculation with different assumptions.

It seems to me that by the time we can do that, we should have figured out a better way to create AI.

But will we have figured out a way to reap the gains of AI safely for humanity?

but more than that, it knows that the real universe is capable of producing intelligent beings that chose this particular world to simulate.

Good point -- this undermines a lot of what I wrote in my update 1. For example, I have no idea if F = m d^3x/dt^3 would result in a world that is capable of producing intelligent beings.

I should at some point produce a version of the above post in which this claim, and other questionable parenthetical remarks I made, are deleted, or at least flagged as requiring further argumentation. They are not necessary for the larger point, which is that as long as the only thing the superintelligence can do (by definition) is live in a simulated world governed by Newton's laws, and as long as we don't interact with it at all except to see an automatically verified answer to a preset question (e.g., factor "111000232342342"), there is nothing it can do to harm us.
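
To make "automatically verified" concrete for the factoring example, here is a minimal sketch; the helper name and the claimed factors below are mine, chosen only to illustrate that checking an answer is mechanical even when finding it is hard:

```python
# Minimal sketch of verifying an answer to a preset factoring question.
# We never read the superintelligence's reasoning -- we only check, mechanically,
# that the claimed factors are nontrivial and multiply back to the target.

def verify_factoring(target, claimed_factors):
    if any(f <= 1 or f >= target for f in claimed_factors):
        return False                 # reject trivial "factors" like 1 or the target itself
    product = 1
    for f in claimed_factors:
        product *= f
    return product == target         # accept only an exact factorization

# Example with the number from the comment; the claimed factors here are just
# 2 and target // 2 (the number happens to be even), standing in for whatever
# the simulated reasoner would actually output.
target = 111000232342342
print(verify_factoring(target, [2, target // 2]))   # True
```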

I guess I am willing to bite the bullet and say that, as long as entity X prefers existence to nonexistence, you have done it no harm by bringing it into being. I realize this generates a number of repulsive-sounding conclusions, e.g., it becomes ethical to create entities which will live, by our 21st century standards, horrific lives.

At least some of them will tell you they would rather not have been born.

If one is willing to accept my reasoning above, I think one can take one more leap and say that, statistically, as long as the vast majority of these entities prefer existing to never having been brought into being, we are in the clear.

The theorem you cite (provided I understood you correctly) does not preclude the possibility of checking whether a program written in a certain pre-specified format will have bugs. Bugs here are defined to be certain undesirable properties (e.g., looping forever, entering certain enumerated states, etc.).

Baby versions of such tools -- which automatically check, by inspecting the code, whether your program has certain properties -- already exist.
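
As a toy illustration of why this becomes tractable once the format is not Turing-complete (per the quote at the top), here is a sketch that treats a program as a finite state machine and decides, by exhaustive search, two of the "bug" properties mentioned above: reaching a forbidden state and looping forever. The states and transitions are made up for illustration.

```python
# Toy "checker" for a program restricted to a finite state machine: with a
# finite state space, reachability of bad states and the possibility of an
# infinite run can both be decided by exhaustive search.

# Hypothetical program, given as states and transitions (made up for illustration).
transitions = {
    "start":   ["load", "error"],
    "load":    ["compute"],
    "compute": ["store"],
    "store":   ["halt"],
    "error":   ["halt"],
    "halt":    [],                     # terminal state
}
forbidden = {"corrupt_memory"}         # states we never want to enter

def reachable(start):
    """All states reachable from `start`."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(transitions.get(s, []))
    return seen

def enters_forbidden_state(start):
    return bool(reachable(start) & forbidden)

def can_loop_forever(start):
    """True if some cycle is reachable from `start` (a possible infinite run)."""
    done = set()
    def dfs(s, on_path):
        if s in on_path:
            return True
        if s in done:
            return False
        done.add(s)
        return any(dfs(t, on_path | {s}) for t in transitions.get(s, []))
    return dfs(start, set())

print(enters_forbidden_state("start"))   # False: no path reaches a forbidden state
print(can_loop_forever("start"))         # False: every run ends in "halt"
```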

Nice idea...I wrote an update to the post suggesting what seemed to me to be a variation on your suggestion.

About program checking: I agree completely. I'm not very well informed about the state of the art, but it is very plausible that what we know right now is not yet up to the task.

I'm a bit skeptical about whether you can actually create a superintelligent AI by combining sped up humans like that,

Why not? You are pretty smart, and all you are is a combination of 10^11 or so very "dumb" neurons. Now imagine a "being" which is actually a very large number of human-level intelligences, all interacting...

is very, very difficult not to give a superintelligence any hints of how the physics of our world work.

I wrote a short update to the post which tries to answer this point.

Maybe they notice minor fluctuations in the speed of the simulation based on environmental changes to the hardware

I believe they should have no ability whatsoever to detect fluctuations in the speed of the simulation.

Consider how the world of World of Warcraft appears to an orc inside the game. Can it tell the speed at which the hardware is running the game?

It can't. What it can do is compare the speeds of different things: how fast an apple falls from a tree vs. how fast a bird flies across the sky.

The orc's inner perception of the flow of time is based on comparing these things (e.g., how fast an apple falls) to how fast its simulated brain processes information.

If everything is slowed down by a factor of 2 (so you, as a player, see everything twice as slow), nothing appears any different to a simulated being within the simulation.
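
A toy sketch of that invariance (the "physics" and numbers below are made up purely for illustration): everything an inhabitant can observe is a ratio of in-simulation quantities, so a wall-clock slowdown of the host cancels out.

```python
import time

def run_simulation(steps, wall_clock_slowdown):
    """Advance a trivial simulated world by `steps` ticks, sleeping to mimic
    slower hardware. Returns the only thing an inhabitant can measure: the
    ratio of two in-simulation speeds."""
    apple_position, bird_position = 0.0, 0.0
    for _ in range(steps):
        apple_position += 9.8                     # "apple" falls 9.8 units per tick
        bird_position += 3.0                      # "bird" flies 3.0 units per tick
        time.sleep(0.001 * wall_clock_slowdown)   # slower hardware = more real time per tick
    return apple_position / bird_position

fast = run_simulation(steps=100, wall_clock_slowdown=1.0)
slow = run_simulation(steps=100, wall_clock_slowdown=2.0)   # host runs twice as slow
print(fast, slow, fast == slow)   # identical in-simulation observables either way
```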
