Taking a look at the latest here after a hiatus, I notice there is once again a lot of discussion about the problem of AI safety, clearly a cause for concern to people who believe it to be an existential threat.

I personally think AI is not an existential threat, not because I believe the AI alignment problem is easier than Eliezer et al do, but because I believe AGI is much harder. I was involved in some debate on that question a while back, but neither side was able to convince the other. I now think that's because it's unprovable; given the same data, the answer relies too heavily on intuition. Instead, I look for actions that are worth taking regardless of whether AGI is easy or hard.

One thing I do consider a matter for grave concern is the call to address the issue by shutting down AI research, and progress on computing in general. Of course there are short-term problems with this course of action, such as that it is, if implemented, much more likely to be enforced in democracies than dictatorships, which is very much not an outcome we should want.

The long-term problem with shutting down progress is that at very best, it just exchanges one form of suicide for another. Death is the default. Without progress, we remain trapped in a sealed box, wallowing in our own filth until something more mundane, like nuclear war or pandemic, puts an end to our civilization. Once that happens, it's game over. Even if our species survives the immediate disaster, all the easily accessible fossil fuel deposits are gone. There will be no new Renaissance and Industrial Revolution. We'll be back to banging the rocks together until evolution finds a way to get rid of the overhead of general intelligence, then the sun autoclaves what's left of the biosphere and the unobserved stars spend the next ten trillion years burning down to cinders.

(Unless AI alignment is so easy that humans can figure it out by pure armchair thought, in the absence of actually trying to develop AI. But for that to be the case, it would have to be much easier than many technical problems we've already solved. And if AI alignment were that easy, there would be no call for concern in the first place.)

It is, admittedly, not as though there is no ground for pessimism. The problem of AI alignment, as conceived by default, is impossible. That's certainly ground for pessimism!

The default way to think about it is straightforward. Friendliness is a predicate, a quality that an AI has or lacks. A function from a system to a Boolean. (The output could be more complex; it doesn't change the conclusion.) The input is an AI; the output is a Boolean.

The problem – or, let's say, far from the least of the problems – with this formulation is that the function Friendly(AI) is undecidable. Proof: straightforward application of Rice's theorem.
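
To spell out roughly what that application looks like, here is the usual halting-problem reduction as Python-flavoured pseudocode. The names friendly, known_friendly_ai and run are hypothetical placeholders of mine – the whole point is that friendly cannot exist – and I'm assuming a program that never responds does not count as Friendly.

def would_halt(machine, tape):
    # Suppose we had a decider friendly(p) for the Friendly predicate, and some
    # reference program known_friendly_ai that everyone agrees satisfies it.
    def stitched(observation):
        run(machine, tape)  # loops forever if machine never halts on tape
        return known_friendly_ai(observation)
    # stitched behaves exactly like known_friendly_ai if machine halts on tape,
    # and has empty (hence non-Friendly) behaviour otherwise. So friendly()
    # would decide the halting problem, which is impossible.
    return friendly(stitched)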

On the face of it, this proves too much; Rice's theorem would seem to preclude writing any useful software. The trick is, of course, that we don't start with an arbitrary program and try to prove it does what we want. We develop the software along with the understanding of why it does what we want and not what we don't want, and preferably along with mechanical verification of some relevant properties like absence of various kinds of stray pointer errors. In other words, the design, implementation and verification all develop together.
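
As a toy illustration of what developing them together means in ordinary software – my own example, nothing AI-specific – the properties are written down and mechanically checked at the same time as the implementation, not bolted on afterwards:

import random

def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    assert lo <= hi  # design-time precondition, stated alongside the code
    return max(lo, min(x, hi))

def check_clamp_properties(trials=10_000):
    # Properties written at the same time as the implementation and re-checked
    # on every change: the result always lies in [lo, hi], and values already
    # inside the interval come back unchanged.
    for _ in range(trials):
        lo, hi = sorted(random.uniform(-1e6, 1e6) for _ in range(2))
        x = random.uniform(-2e6, 2e6)
        y = clamp(x, lo, hi)
        assert lo <= y <= hi
        if lo <= x <= hi:
            assert y == x

if __name__ == "__main__":
    check_clamp_properties()
    print("all properties held")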

This is not news to anyone who has worked in the software industry. The point is that – if and to the extent it exists at all – AI is software, and is subject to the same rules as any other software project: if you want something that reliably does what you want, design, implementation and verification need to go together. Put that way, it sounds obvious, but it's easy to miss the implications.

It means there is no point trying to create a full-blown AGI by running an opaque optimization process – a single really big neural network, say, or a genetic algorithm with a very large population size – on a lot of hardware, and hoping something amazing jumps out. If I'm right about the actual difficulty of AGI, nothing amazing will happen, and if Eliezer et al are right and brute-force AGI is relatively easy, the result won't have the safety properties you want. (That doesn't mean there aren't valuable use cases for running a neural network on a lot of hardware. It does mean 'if only we can throw enough hardware at this, maybe it will wake up and become conscious' is not one of them.)

It means there is no point trying to figure out how to verify an arbitrarily complex, opaque blob after the fact. You can't. Verification has to go in tandem with design and implementation. For example, from https://www.alignmentforum.org/posts/QEYWkRoCn4fZxXQAY/prizes-for-elk-proposals

We suspect you can’t solve ELK just by getting better data—you probably need to “open up the black box” and include some term in the loss that depends on the structure of your model and not merely its behavior.

Yes. Indeed, this is still an understatement; 'open up the black box' is easy to interpret as meaning that you start off by being given a black box, and then begin to think about how to open it up. A better way to look at it is that you need to be thinking about how to figure out what's going on in the box, in tandem with building the box in the first place.
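
To make the shape of that concrete with a deliberately tiny example of my own – a linear model, with an L1 sparsity penalty standing in for 'structure'; the actual ELK proposals have something far more specific in mind – the loss has one term that judges only the model's input/output behaviour and another that looks at its internals:

import numpy as np

# Toy setup: a small linear model stands in for the black box.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = 1.0
y = X @ w_true + 0.1 * rng.normal(size=200)

w = np.zeros(10)
lam, lr = 0.05, 0.01
for step in range(2000):
    pred = X @ w
    behavioural_loss = np.mean((pred - y) ** 2)  # depends only on what the model outputs
    structural_loss = lam * np.sum(np.abs(w))    # depends on the model's internals (here, sparsity)
    total_loss = behavioural_loss + structural_loss
    # Gradient of the combined objective, taken by hand for this toy case.
    grad = 2 * X.T @ (pred - y) / len(y) + lam * np.sign(w)
    w -= lr * grad

print("learned weights:", np.round(w, 2))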

It means there is no point trying to solve the alignment problem by pure armchair thought. That would be like expecting Babbage and Lovelace to deduce Meltdown/Spectre. It's not going to happen in the absence of actually designing and building systems.

It means suppressing development and shutting down progress is suicide. Death and extinction are the default outcomes. Whatever chance we have of turning our future light cone into a place where joy and wonder exist, depends on making continued progress – quickly, before the window of opportunity slams shut.

'Design, implement and verify' sounds more difficult than just trying to do one of these things in isolation, but that's an illusion. All three activities are necessary parts of the job, each depends on the others, and none will be successfully accomplished in isolation.


The long-term problem with shutting down progress is that at very best, it just exchanges one form of suicide for another. Death is the default. Without progress, we remain trapped in a sealed box, wallowing in our own filth until something more mundane, like nuclear war or pandemic, puts an end to our civilization. Once that happens, it's game over. Even if our species survives the immediate disaster, all the easily accessible fossil fuel deposits are gone. There will be no new Renaissance and Industrial Revolution. We'll be back to banging the rocks together until evolution finds a way to get rid of the overhead of general intelligence, then the sun autoclaves what's left of the biosphere and the unobserved stars spend the next ten trillion years burning down to cinders.

What is being proposed is a small pause in progress in a particularly dangerous field. We can still go on and develop fusion reactors or whatever. We could in principle have a high-tech utopian space-faring society, spreading throughout the universe, all designed by human minds with no AI. I mean we probably won't. What is actually being proposed is more like a 20 year pause in AI research to let MIRI solve alignment, and we probably won't get that.

 Even if our species survives the immediate disaster, all the easily accessible fossil fuel deposits are gone. There will be no new Renaissance and Industrial Revolution. We'll be back to banging the rocks together until evolution finds a way to get rid of the overhead of general intelligence,

I think we would probably do something else. Modern crops have been selectively bred and genetically engineered to be much more productive. Thus biofuels would be a better option. Given even a physics textbook, nuclear could be invented quite a bit earlier. (And in a post-disaster scenario, there will be a chance to scavenge for fresh nuclear fuel.) There is wind power and hydro power, which become more attractive when you already have sophisticated neodymium magnet generators lying around, or dams already built.

We wouldn't be able to copy the exact route of the industrial revolution. We would find a different route back to high technology. Humans developing tech is like a stream running downhill. Block the easiest route, and the second easiest route is taken. Maybe in economic growth terms, the period takes 500 years, not 200 or something. Maybe it's faster because people don't need to invent most of the stuff, just make it.

Pre-industrial-revolution tech includes stuff like ironwork and pottery, and quite a few other tasks that are complex enough to require intelligence and useful enough to reward it.

Unless AI alignment is so easy that humans can figure it out by pure armchair thought, in the absence of actually trying to develop AI.

Fermat's Last Theorem was proved by "pure armchair thought"; that doesn't make it "so easy". Suppose AI alignment was a really tricky mathematical problem. Maybe trying to build dumb AI doesn't help, because the behaviour of dumb AI is totally different to the behaviour of superintelligences. And any AI smart enough to give you useful info on superintelligence is smart enough to kill you.

The default way to think about it is straightforward. Friendliness is a predicate, a quality that an AI has or lacks. A function from a system to a Boolean. (The output could be more complex; it doesn't change the conclusion.) The input is an AI; the output is a Boolean.

The problem – or, let's say, far from the least of the problems – with this formulation is that the function Friendly(AI) is undecidable. Proof: straightforward application of Rice's theorem.

This logic proves too much. It applies just as well to the property "adds 2 numbers". This is true: there are some incredibly complicated pieces of code where it really is impossible to tell if that code adds numbers.

You can test it on particular inputs, but maybe it works on those inputs and fails on some other input.

def complicated_add(a, b):
    # get_turing_machine() is some fixed, unspecified Turing machine; whether
    # this function really adds depends on whether that machine ever halts.
    T = get_turing_machine()
    for i in range(a):
        T.step()
    if T.has_halted():
        return a + b - 1
    else:
        return a + b

This code is a valid program to add 2 numbers, if and only if T runs forever. 

But we don't want to classify all possible programs into "adds 2 numbers" or "doesn't add" when building a calculator. We want to find a single program that does add. This is easy. Sometimes we want to verify that a particular algorithm works; this is usually possible, because no one writes software whose behaviour depends on arbitrary Turing machines like that.
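
For instance (my example, not the only way to do it), the calculator author's job is to write one particular adder and check that one program against an independent specification – no classifier over all possible programs required:

def add(a, b):
    # The program we would actually write when we want addition.
    return a + b

def add_by_counting(a, b):
    # An independent, deliberately naive specification to check against.
    result = a
    step = 1 if b >= 0 else -1
    for _ in range(abs(b)):
        result += step
    return result

# Verify the one program we wrote, on a grid of small inputs.
for a in range(-20, 21):
    for b in range(-20, 21):
        assert add(a, b) == add_by_counting(a, b)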

If you restrict to programs that fit in the universe, Rice's theorem vanishes.

Likewise, we don't want to classify all possible programs into friendly or unfriendly. 

On the face of it, this proves too much; Rice's theorem would seem to preclude writing any useful software. The trick is, of course, that we don't start with an arbitrary program and try to prove it does what we want. We develop the software along with the understanding of why it does what we want and not what we don't want, and preferably along with mechanical verification of some relevant properties like absence of various kinds of stray pointer errors. In other words, the design, implementation and verification all develop together.

This also seems to prove too much. Euclid's algorithm was developed and understood as a piece of mathematics long before computers were invented. 

This is not news to anyone who has worked in the software industry. The point is that – if and to the extent it exists at all – AI is software, and is subject to the same rules as any other software project: if you want something that reliably does what you want, design, implementation and verification need to go together. Put that way, it sounds obvious, but it's easy to miss the implications.

You have jumped from formal maths about Turing machines that doesn't apply much in reality straight to advice from a project management book about having your programmers and software testers work in the same office. You seem to imply these are related somehow. The project management bit may or may not be good advice for the typical software project. It may or may not be good advice for an AI project. It isn't some universal mathematical truth.

"AI is software, and is subject to the same rules as any other software project" AI is technically a software project, but there are some fairly good reasons to expect it to be different from the typical software project. 

It means there is no point trying to create a full-blown AGI by running an opaque optimization process – a single really big neural network, say, or a genetic algorithm with a very large population size – on a lot of hardware, and hoping something amazing jumps out. If I'm right about the actual difficulty of AGI, nothing amazing will happen, and if Eliezer et al are right and brute-force AGI is relatively easy, the result won't have the safety properties you want.

Are you arguing that evolution and gradient descent are so dumb, and AGI is so hard, that no matter how much brute force you throw at the problem, evolution will never create an AGI? (Despite biological evolution producing humans.)

I see no fundamental reason why we couldn't create an AGI like this, and be reasonably sure it would be friendly due to clever choice of initialization and optimization criteria, or transparency tools.

It means there is no point trying to figure out how to verify an arbitrarily complex, opaque blob after the fact. You can't. Verification has to go in tandem with design and implementation.

There is no way to verify every possible program after the fact, if you want a verifier to never make any mistakes in either direction.

If you are ok with your verifier occasionally saying "I don't know", but only on really contrived problems that don't appear in real life, then verification is possible. 
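
Here is a minimal sketch of what such a verifier can look like – entirely my own toy, nothing to do with neural nets: a checker for the property "this arithmetic expression is never negative" that is never wrong but is allowed to answer "don't know" on inputs it can't analyse.

# Expressions are nested tuples: ("num", 3), ("var", "x"),
# ("add", e1, e2), ("mul", e1, e2), ("sub", e1, e2).
# Variables range over all integers.

def always_nonneg(expr):
    """Return True (verified), False (definitely fails), or None (don't know)."""
    kind = expr[0]
    if kind == "num":
        return expr[1] >= 0
    if kind == "var":
        return None  # an unconstrained integer: can't say
    if kind == "add":
        a, b = always_nonneg(expr[1]), always_nonneg(expr[2])
        return True if a is True and b is True else None
    if kind == "mul":
        if expr[1] == expr[2]:  # a square is always non-negative
            return True
        a, b = always_nonneg(expr[1]), always_nonneg(expr[2])
        return True if a is True and b is True else None
    if kind == "sub":
        return None  # conservative: never claims anything here
    raise ValueError(kind)

print(always_nonneg(("add", ("num", 2), ("mul", ("var", "x"), ("var", "x")))))  # True
print(always_nonneg(("sub", ("num", 2), ("var", "x"))))  # None: "don't know"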

And "big neural nets" aren't arbitrary programs. The net has all sorts of structure to use. 

 

It means there is no point trying to solve the alignment problem by pure armchair thought. That would be like expecting Babbage and Lovelace to deduce Meltdown/Spectre. It's not going to happen in the absence of actually designing and building systems.

I think these systems were largely designed by humans, and I think those bugs were largely found by humans. In other words, if you got all the chip designers at Intel working to produce designs, with no actual silicon being produced, and all the modern security researchers trying to pick holes in the designs, then yes, these bugs would have been found entirely by theory. (Of course, no one would pay all these people to do such a useless activity.)

It means suppressing development and shutting down progress is suicide. Death and extinction are the default outcomes. Whatever chance we have of turning our future light cone into a place where joy and wonder exist, depends on making continued progress – quickly, before the window of opportunity slams shut.

Suppose we decided that DeepMind wasn't verifying properly. Should we shut them down to let someone who will do the verification try to make AGI?

Suppose every country agreed to total disarmament. The biologists produced one vaccine that stops everything. We build some really good telescopes and several asteroid deflection rockets. Almost all AI research around the globe is shut down. MIRI is given special permission to carry on, and given the instructions "take your time and do things safe, not fast. If it takes a thousand years for you to create aligned AI, that's fine." Do you expect this world to have a better chance at a good long-term future than the status quo reality? I do.

 

'Design, implement and verify' sounds more difficult than just trying to do one of these things in isolation, but that's an illusion. All three activities are necessary parts of the job, each depends on the others, and none will be successfully accomplished in isolation.

MIRI produced a paper on logical induction. This paper proposed an algorithm that was too slow to run in practice. They formally described the algorithm, but didn't program it in any particular language. MIRI managed to mathematically prove some interesting properties of this algorithm. Design and verification without implementation. When producing a really small simple program, I can get it right first try. Design and implementation without verification. 

You also haven't said exactly what each category is. If I write pseudocode, is that implementation or just design? If I pepper my code with assert statements, is that part of verification or just implementation? Does it matter if I haven't run the code yet? Is this supposed to prevent humans writing code entirely by pencil and paper? (Which some humans can do.)

What is actually being proposed is more like a 20 year pause in AI research to let MIRI solve alignment

Isn't that insanely unrealistic? Both A.) unrealistic in achieving that pause, and B.) just letting MIRI solve alignment in 20 years? MIRI was formed back in 2000, 22 years ago, and now global AI research has to pause for two decades so MIRI can write more papers about Pearlian Causal Inference? 

What is being proposed is a small pause in progress in a particularly dangerous field.

There are no small pauses in progress. Laws, and the movements that drive them, are not lightbulbs to be turned on and off at the flick of a switch. You can stop progress, but then it stays stopped. The Zheng He fleets, for example, once discontinued, did not set sail again twenty years later, or two hundred years later.

There also tend not to be narrow halts in progress. In practice, a serious attempt to shut down progress in AI is going to shut down progress in computers in general, and they're an important enabling technology for pretty nearly everything else.

I mean we probably won't. What is actually being proposed is more like a 20 year pause in AI research to let MIRI solve alignment, and we probably won't get that.

If you think any group of people, no matter how smart and dedicated, can solve alignment in twenty years of armchair thought, that means you think the AI alignment problem is, on the scale of things, ridiculously easy.

I'm asking you to stop and think about that for a moment.

AI alignment is ridiculously easy.

Is that really something you actually believe? Do you actually think the evidence points that way?

Or do you just think your proposed way of doing things sounds more comfortable, and the figure of twenty years sounds comfortably far enough in the future that a deadline that far off does not feel pressing, but still soon enough that it would be within your lifetime? These are understandable feelings, but unfortunately they don't provide any information about the actual difficulty of the problem.

I think we would probably do something else. Modern crops have been selectively bred and genetically engineered to be much more productive. Thus biofuels would be a better option.

Modern crops are productive given massive inputs of high-tech industry and energy in the form of things like artificial fertilizers, pesticides, tractors. Deprived of these inputs, we won't be able to feed ourselves, let alone have spare food to burn as fuel.

Given even a physics textbook, nuclear could be invented quite a bit earlier.

Actually no, the physics wasn't the gating factor for nuclear energy. One scientist in the 1930s remarked that sure, nuclear fission would work in principle, but to get the enriched uranium, you would have to turn a whole country into an enrichment facility. He wasn't that far wrong; the engineering resources and electrical energy the US put into the Manhattan Project were in the ballpark of what many countries could've mustered in total.

Maybe trying to build dumb AI doesn't help, because the behaviour of dumb AI is totally different to the behaviour of superintelligences. And any AI smart enough to give you useful info on superintelligence is smart enough to kill you.

Maybe the Earth is about to be demolished to make room for a hyperspace bypass. Maybe there's a short sequence of Latin words that summons Azathoth, and no way to know this until it's too late because no other sequence of Latin words has any magical effect whatsoever. It's always easy to postulate worlds in which we are dead no matter what we do, but not particularly useful; not only are those worlds unlikely, but by their very nature, planning what to do in those worlds is pointless. All we can usefully do is make plans for those worlds – hopefully a majority – in which there is a way forward.

Are you arguing that evolution and gradient descent are so dumb, and AGI is so hard, that no matter how much brute force you throw at the problem, evolution will never create an AGI? (Despite biological evolution producing humans.)

I am arguing that it will never create an AGI with the resources available to human civilization. Biological evolution took four billion years with a whole planet's worth of resources, and that still underestimates the difficulty by an unknown but large factor, because it took many habitable planets to produce intelligence on just one; the lower bound on that factor is given by the absence of any sign of starfaring civilizations in our past light cone; the upper bound could be millions of orders of magnitude, for all we know.

Suppose every country agreed to total disarmament. The biologists produced one vaccine that stops everything. We build some really good telescopes and several asteroid deflection rockets. Almost all AI research around the globe is shut down. MIRI is given special permission to carry on, and given the instructions "take your time and do things safe, not fast. If it takes a thousand years for you to create aligned AI, that's fine." Do you expect this world to have a better chance at a good long-term future than the status quo reality? I do.

Well, sure. By the time you've got universal consent to peace on Earth, and the existence of a single vaccine that stops all possible diseases, you've already established that you're living in the utopia section of the Matrix, so you can be pretty relaxed about the long-term future.  Unfortunately, that doesn't produce anything much in the way of useful policy guidance for those living in baseline reality.

When producing a really small simple program, I can get it right first try.

Sure. Hopefully we all understand that the operative words in that sentence are small and simple.

The contention is that we are in a new situation. Writing software is a new type of activity with different constraints than building a bridge; creating a general AI is a new type of activity with different constraints than writing software. In particular, the standard "waterfall doesn't work, agile/iteration is far better" solution to writing software is contended to fail when creating a general AI, specifically because the implement step produces disaster without sufficient design, rather than simply producing failure or waste.

You can argue that we're not in a new situation, but I don't think "it works far better for software ==> we have no choice but to do it for general AI" follows without arguing it.

My argument is not that AI is the same activity as writing a compiler or a search engine or an accounts system, but that it is not an easier activity, so techniques that we know don't work for other kinds of software – like trying to deduce everything by armchair thought, verify after-the-fact the correctness of an arbitrarily inscrutable blob, or create the end product by throwing lots of computing power at a brute force search procedure – will not work for AI, either.

so techniques that we know don't work for other kinds of software – like trying to deduce everything by armchair thought, verify after-the-fact the correctness of an arbitrarily inscrutable blob, or create the end product by throwing lots of computing power at a brute force search procedure

I am not sure what you mean when you say these techniques "don't work". They all seem to be techniques that sometimes produce something, given sufficient resources. They all seem like techniques that have produced something. Researchers have unpicked and understood all sorts of hacker-written malware. The first computer program was written entirely by armchair thought, and programming with pencil and paper continues in some tech company interviews today. Brute force search can produce all sorts of things.

In conventional programming, a technique that takes 2x as much programmer time is really bad. 

In ASI programming, a technique that takes 2x as much programmer time and has 1/2 the chance of destroying the world is pretty good. 

There is a massive difference between a technique not working and a technique being way less likely to work.

A: 1% chance of working given that we get to complete it, doesn't kill everyone before completing

B: 10% chance of working given that we get to complete it, 95% chance of killing everyone before completing

You pick A here. You can't just ignore the "implement step produces disaster" bit. Maybe we're not in this situation (obviously it changes based on what the odds of each bit actually are), but you can't just assume we're not in this situation and say "Ah, well, B has a much higher chance of working than A, so that's all, we've gotta go with B".

Are you advocating as option A, 'deduce a full design by armchair thought before implementing anything'? The success probability of that isn't 1%. It's zero, to as many decimal places as makes no difference.

We're probably talking past each other. I'm saying "no you don't get to build lots of general AIs in the process of solving the alignment problem and still stay alive" and (I think) you're saying "no you don't get to solve the alignment problem without writing a ton of code, lots of it highly highly related to AI". I think both of those are true.

Right, yes, I'm not suggesting the iterated coding activity can or should include 'build an actual full-blown superhuman AGI' as an iterated step.

Of course there are short-term problems with this course of action, such as that it is, if implemented, much more likely to be enforced in democracies than dictatorships, which is very much not an outcome we should want.

There is a reason North Korea isn't in the lead with AI.

I mean, China is a bit more with it, but none of the recent breakthroughs have come from China (that I've seen).

For whatever reason, dictatorships aren't very good at AI research. Maybe something about censorship, worse education, none of the top researchers wanting to work there. (If your choices were a cushy job at Google writing non-AI code, or North Korea's internationally condemned AI program, which would you pick? So this will somewhat hold even if all the democracies ban AI research.)

I am also not quite clear why North Korea destroying the world would be so much worse than DeepMind doing it.

I am also not quite clear why North Korea destroying the world would be so much worse than DeepMind doing it.

I think the argument about this part would be that DeepMind is much more likely (which is not to say "likely" on an absolute scale) to at least take alignment seriously enough to build (or even use!) interpretability tools, and maybe revise their plans if the tools show the AGI plotting to kill everyone. So by the time DeepMind is actually deploying an AGI (even including accidental "deployments" due to foom during testing), it's less likely to be misaligned than one deployed by North Korea.

Of course if (as the rest of your comment contends) North Korea can't develop an AGI anyway, this is a bit beside the point. It's much easier for lower-tier research organizations to copy and make minor modifications than do totally new things, so NK would be more likely to develop AGI to the extent that techniques leading to AGI are already available. Which would plausibly be the case if researchers in democracies are trying to get as close to the AGI line as is allowed (because that gives useful capabilities), which in turn seems much more plausible to me than democracies globally coordinating to avoid anything even vaguely close to AGI.

Imagine you are in charge of choosing how fast DeepMind develops tech. Go too fast and you have a smaller chance of alignment. Go too slow and North Korea may beat you.

There isn't much reason to go significantly faster than North Korea in this scenario. If you can go a bit faster and still make something probably aligned, do that.

In a worse situation, taking your time and hoping for a drone strike on North Korea is probably the best bet.

Which would plausibly be the case if researchers in democracies are trying to get as close to the AGI line as is allowed (because that gives useful capabilities), which in turn seems much more plausible to me than democracies globally coordinating to avoid anything even vaguely close to AGI.

Coordinating on a fuzzy boundary no one can define or measure is really hard. If coordination happens, it will be to avoid something simple, like any project using more than X compute.

I don't think "conceptually close to AGI" = "profitable". There is simple, dumb money-making code. And there is code that contains all the ideas for AGI but is missing one tiny piece, and so is useless.