Taking a look at the latest here after a hiatus, I notice there is once again a lot of discussion about the problem of AI safety, clearly a cause for concern to people who believe it to be an existential threat.
I personally think AI safety is not an existential threat, not because I believe the AI alignment problem is easier than Eliezer et al. do, but because I believe AGI is much harder. I was involved in some debate on that question a while back, but neither side was able to convince the other. I now think that's because it's unprovable; given the same data, the answer relies too heavily on intuition. Instead, I look for actions that are worth taking regardless of whether AGI is easy or hard.
One thing I do consider a matter for grave concern is the call to address the issue by shutting down AI research, and progress on computing in general. Of course there are short-term problems with this course of action, such as the fact that, if implemented, it would be far more likely to be enforced in democracies than in dictatorships, which is very much not an outcome we should want.
The long-term problem with shutting down progress is that, at very best, it just exchanges one form of suicide for another. Death is the default. Without progress, we remain trapped in a sealed box, wallowing in our own filth until something more mundane, like nuclear war or pandemic, puts an end to our civilization. Once that happens, it's game over. Even if our species survives the immediate disaster, all the easily accessible fossil fuel deposits are gone. There will be no new Renaissance and Industrial Revolution. We'll be back to banging the rocks together until evolution finds a way to get rid of the overhead of general intelligence, then the sun autoclaves what's left of the biosphere and the unobserved stars spend the next ten trillion years burning down to cinders.
(Unless AI alignment is so easy that humans can figure it out by pure armchair thought, in the absence of actually trying to develop AI. But for that to be the case, it would have to be much easier than many technical problems we've already solved. And if AI alignment were that easy, there would be no call for concern in the first place.)
Admittedly, it's not as though there are no grounds for pessimism. The problem of AI alignment, as conceived by default, is impossible. That's certainly grounds for pessimism!
The default way to think about it is straightforward. Friendliness is a predicate, a quality that an AI has or lacks: a function whose input is an AI and whose output is a Boolean. (The output could be more complex; that doesn't change the conclusion.)
The problem – or, let's say, far from the least of the problems – with this formulation is that the function Friendly(AI) is undecidable. Proof: straightforward application of Rice's theorem.
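For concreteness, here is a minimal sketch of the standard reduction (my illustration; the names run_program, behave_like_known_friendly_program and the decider itself are hypothetical placeholders, not anyone's actual proposal). The only thing assumed about 'friendly' is that it is a non-trivial property of a program's behavior; in particular, a program that loops forever doing nothing is not counted as friendly.

```python
from typing import Callable

def halts_via(friendly: Callable[[str], bool],
              program_src: str, program_input: str) -> bool:
    """If a total decider `friendly` existed, it would decide halting."""
    # Wrapper program: first run the candidate program on its input (which
    # may loop forever), then behave like some program already known to be
    # friendly. The wrapper's behavior is friendly exactly when the
    # candidate program halts on that input.
    wrapper_src = f"""
run_program({program_src!r}, {program_input!r})    # may never return
behave_like_known_friendly_program()               # reached only on halting
"""
    return friendly(wrapper_src)

# The halting problem is undecidable, so no total `friendly` decider exists.
```

Note what the reduction does and does not say: it rules out a universal after-the-fact verdict on arbitrary programs, nothing more.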
On the face of it, this proves too much; Rice's theorem would seem to preclude writing any useful software. The trick is, of course, that we don't start with an arbitrary program and try to prove it does what we want. We develop the software along with the understanding of why it does what we want and not what we don't want, and preferably along with mechanical verification of some relevant properties like absence of various kinds of stray pointer errors. In other words, the design, implementation and verification all develop together.
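As a mundane, small-scale illustration of what it looks like for the specification to develop alongside the code (my own toy example, not anything from the alignment literature): the property is written down as an executable check at the same time as the function it constrains, rather than being hunted for in an opaque artifact afterwards.

```python
def merge_sorted(xs: list[int], ys: list[int]) -> list[int]:
    """Merge two already-sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out

def check_merge(xs: list[int], ys: list[int]) -> None:
    # The specification travels with the design: the output must be sorted
    # and must be a permutation of the inputs. A property-based testing tool
    # (or a proof assistant, for stronger guarantees) would drive this with
    # many generated inputs; a single call is enough for a sketch.
    out = merge_sorted(xs, ys)
    assert out == sorted(out)
    assert sorted(out) == sorted(xs + ys)

check_merge([1, 3, 5], [2, 2, 8])
```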
This is not news to anyone who has worked in the software industry. The point is that – if and to the extent it exists at all – AI is software, and is subject to the same rules as any other software project: if you want something that reliably does what you want, design, implementation and verification need to go together. Put that way, it sounds obvious, but it's easy to miss the implications.
It means there is no point trying to create a full-blown AGI by running an opaque optimization process – a single really big neural network, say, or a genetic algorithm with a very large population size – on a lot of hardware and hoping something amazing jumps out. If I'm right about the actual difficulty of AGI, nothing amazing will happen, and if Eliezer et al. are right and brute-force AGI is relatively easy, the result won't have the safety properties you want. (That doesn't mean there aren't valuable use cases for running a neural network on a lot of hardware. It does mean 'if only we can throw enough hardware at this, maybe it will wake up and become conscious' is not one of them.)
It means there is no point trying to figure out how to verify an arbitrarily complex, opaque blob after the fact. You can't. Verification has to go in tandem with design and implementation. For example, from https://www.alignmentforum.org/posts/QEYWkRoCn4fZxXQAY/prizes-for-elk-proposals
We suspect you can’t solve ELK just by getting better data—you probably need to “open up the black box” and include some term in the loss that depends on the structure of your model and not merely its behavior.
Yes. Indeed, this is still an understatement; 'open up the black box' is easy to interpret as meaning that you start off by being given a black box, and then begin to think about how to open it up. A better way to look at it is that you need to be thinking about how to figure out what's going on in the box, in tandem with building the box in the first place.
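To make the distinction concrete (a toy sketch of mine, not the ELK authors' proposal; the PyTorch names are the only external assumption): a purely behavioral loss looks at nothing but inputs and outputs, while a structural term looks inside the model. The sparsity penalty below is just a placeholder for whatever property of the internals one actually wants.

```python
import torch
import torch.nn.functional as F

def behavioral_loss(model: torch.nn.Module,
                    inputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Depends only on the model's input/output behavior.
    return F.mse_loss(model(inputs), targets)

def structural_term(model: torch.nn.Module) -> torch.Tensor:
    # Depends on the model's internals, not its behavior. A weight-sparsity
    # penalty is used here purely as a stand-in for a structural property.
    return sum(p.abs().mean() for p in model.parameters())

def total_loss(model, inputs, targets, lam: float = 0.01) -> torch.Tensor:
    return behavioral_loss(model, inputs, targets) + lam * structural_term(model)

# Throwaway usage with a tiny linear model and random data.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
total_loss(model, x, y).backward()
```

The specific penalty is beside the point; what matters is that the term is a function of the model's structure, which is the kind of handle that 'opening up the black box' is asking for.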
It means there is no point trying to solve the alignment problem by pure armchair thought. That would be like expecting Babbage and Lovelace to deduce Meltdown/Spectre. It's not going to happen in the absence of actually designing and building systems.
It means suppressing development and shutting down progress is suicide. Death and extinction are the default outcomes. Whatever chance we have of turning our future light cone into a place where joy and wonder exist depends on making continued progress – quickly, before the window of opportunity slams shut.
'Design, implement and verify' sounds more difficult than just trying to do one of these things in isolation, but that's an illusion. All three activities are necessary parts of the job, each depends on the others, and none will be successfully accomplished in isolation.
There are no small pauses in progress. Laws, and the movements that drive them, are not lightbulbs to be turned on and off at the flick of a switch. You can stop progress, but then it stays stopped. The treasure fleets of Zheng He, for example, once discontinued, did not set sail again twenty years later, or two hundred years later.
There also tend not to be narrow halts in progress. In practice, a serious attempt to shut down progress in AI is going to shut down progress in computers in general, and computers are an important enabling technology for pretty much everything else.
If you think any group of people, no matter how smart and dedicated, can solve alignment in twenty years of armchair thought, that means you think the AI alignment problem is, on the scale of things, ridiculously easy.
I'm asking you to stop and think about that for a moment.
AI alignment is ridiculously easy.
Is that really something you actually believe? Do you actually think the evidence points that way?
Or do you just think your proposed way of doing things sounds more comfortable, and the figure of twenty years sounds far enough in the future that the deadline does not feel pressing, yet still soon enough that it would fall within your lifetime? These are understandable feelings, but unfortunately they don't provide any information about the actual difficulty of the problem.
Modern crops are productive given massive inputs of high-tech industry and energy, in the form of things like artificial fertilizers, pesticides and tractors. Deprived of these inputs, we won't be able to feed ourselves, let alone have spare food to burn as fuel.
Actually no, the physics wasn't the gating factor for nuclear energy. One scientist in the 1930s remarked that sure, nuclear fission would work in principle, but to get the enriched uranium you would have to turn a whole country into an enrichment facility. He wasn't that far wrong; the engineering resources and electrical energy the US put into the Manhattan Project were in the ballpark of what many countries could have mustered in total.
Maybe the Earth is about to be demolished to make room for a hyperspace bypass. Maybe there's a short sequence of Latin words that summons Azathoth, and no way to know this until it's too late because no other sequence of Latin words has any magical effect whatsoever. It's always easy to postulate worlds in which we are dead no matter what we do, but not particularly useful; not only are those worlds unlikely, but by their very nature, planning what to do in those worlds is pointless. All we can usefully do is make plans for those worlds – hopefully a majority – in which there is a way forward.
I am arguing that it will never create an AGI with the resources available to human civilization. Biological evolution took four billion years with a whole planet's worth of resources, and even that underestimates the difficulty by an unknown but large factor, because it took many habitable planets to produce intelligence on just one. The lower bound on that factor is given by the absence of any sign of starfaring civilizations in our past light cone; the upper bound could be millions of orders of magnitude, for all we know.
Well, sure. By the time you've got universal consent to peace on Earth and a single vaccine that stops all possible diseases, you've already established that you're living in the utopia section of the Matrix, so you can be pretty relaxed about the long-term future. Unfortunately, that doesn't produce anything much in the way of useful policy guidance for those living in baseline reality.
Sure. Hopefully we all understand that the operative words in that sentence are 'small' and 'simple'.