I feel like I fit into a small group of people who are both:

1) Very confident AI will be developed in the next few decades

2) Not terribly worried about AI alignment

Most people around here seem to fall into one of two camps: "AI is always 50 years away" or "Oh my gosh, Skynet is going to kill us all!"

This is my attempt to explain why

A) I'm less worried than a lot of people around here.

B) I think a lot of the AI alignment work is following a pretty silly research path.

BLUF (bottom line up front)

Sorry this is so long.

The short version is: I believe we will see a slow takeoff in which AI is developed simultaneously in several places around the globe. This means we need to focus on building institutions not software.


!!!Edit: Important Clarification

I apparently did a poor job writing this, since multiple people have commented "Wow, you sure hate China!" or "Are you saying China taking over the world is worse than a paperclip maximizer!?"

That is not what this post is about!

What this post is about is:

Suppose you were aware that an entity was soon to come into existence which would be much more powerful than you are. Suppose further that you had limited faith in your ability to influence the goals and values of that entity. How would you attempt to engineer the world so that nonetheless after the rise of that entity, your values and existence continue to be protected?

What this post is not about:

An AI perfectly aligned with the interests of the Chinese government would not be worse than a paperclip maximizer (or your preferred bad outcome to the singularity). An AI perfectly aligned with the coherently extrapolated values of the Chinese would probably be pretty okay, since the Chinese are human and share many of the same values as I do. However, in a world with a slow takeoff I think it is unlikely any single AI will dominate, much less one that happens to perfectly extrapolate the values of any one group or individual.

Foom, the classic AI-risk scenario

Generally speaking the AI risk crowd tells a very simple story that goes like this:

1) Eventually AI will be developed capable of doing all human-like intellectual tasks

2) Improving AI is one of these tasks

3) Goto 1

The claim is that "Any AI worth its salt will be capable of writing an even better AI, which will be capable of building an even better AI, which will be capable of...." so within (hours? minutes? days?) AI will have gone from human-level to galactically beyond anything humanity is capable of doing.

I propose an alternative hypothesis:

By the time human-level AI is achieved, most of the low-hanging fruit in the AI improvement domain will have already been found, so subsequent improvements in AI capability will require a superhuman level of intelligence. The first human-level AI will be no more capable of recursive-self-improvement than the first human was.

Note: This does not mean that recursive self-improvement will never happen, or that the development of human-level AI will not have profound economic, scientific and philosophical consequences. What this means is, the first AI is going to take some serious time and compute power to out-compete 200 plus years worth of human effort on developing machines that think.

What the first AI looks like in each of these scenarios:

Foom: One day, some hacker in his mom's basement writes an algorithm for a recursively self-improving AI. Ten minutes later, this AI has conquered the world and converted Mars into paperclips

Moof: One day, after five years of arduous effort, Google finally finishes training the first human-level AI. Its intelligence is approximately that of a 5-year-old child. Its first publicly uttered sentence is "Mama, I want to watch Paw Patrol!" A few years later, anybody can "summon" a virtual assistant with human level intelligence from their phone to do their bidding. But people have been using virtual AI assistants on their phone since the mid 2010's, so nobody is nearly as shocked as a time-traveler from the year 2000 would be.

What is the key difference between these scenarios? (Software vs Hardware bound AI)

In the Foom scenario, the key limiting resource or bottleneck was the existence of the correct algorithm. Once this algorithm was found, the AI was able to edit its own source-code, leading to dramatic recursive self-improvement.

In the Moof scenario, the key limiting resources were hardware and "training effort". Building the first AI required massively more compute power and training data than running the first AI, and also massively more than the first AI had access to.

Does this mean that the development of human-level AI might not surprise us? Or that by the time human level AI is developed it will already be old news? I don't know. That depends on whether or not you were surprised by the development of Alpha-Go.

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans at increasingly difficult games, that progress in this domain had been fairly steady and mostly limited by compute power, and moreover that computer Go programs had themselves gone from idiotic to high-amateur level over the course of decades, then the development of AlphaGo (if not the exact timing of that development) probably seemed inevitable.

If, on the other hand, you thought that playing Go was a uniquely human skill that required the ability to think creatively which machines could never ever replicate, then Alpha Go probably surprised you.

For the record, I was surprised at how soon Alpha-Go happened, but not that it happened.

What arguments are there in favor of (or against) Hardware Bound AI?

The strongest argument in favor of hardware-bound AI is that in areas of intense human interest, the key "breakthroughs" tend to be found by multiple people independently, suggesting they are a result of conditions being correct rather than the existence of a lone genius.

Consider: Writing was independently invented at a minimum in China, Mesoamerica, and the Middle East. Calculus was developed by both Newton and Leibniz. There are half a dozen people who claim to have beaten the Wright brothers to the first powered flight. Artificial neural networks had been a topic of research for 50 years before the deep-learning revolution.

The strongest argument against Hardware Bound AI (and in favor of Foom) is that we do not currently know the algorithm that will be used to develop a human level intelligence. This leaves open the possibility that a software breakthrough will lead to rapid progress.

However, I argue that not only will the "correct algorithm" be known well in advance of the development of human-level AI, but it will be widely deployed as well. I say this because we have every reason to believe that the algorithm that produces human-level AI in humans is the same algorithm that produces chimpanzee-level AI in chimps, dog-level AI in dogs and mouse-level AI in mice, if not cockroach-level AI in cockroaches. The evolutionary changes from chimpanzee to human were largely of scale and function, not some revolutionary new brain architecture.

Why should we expect dog-AI or chimp AI to be developed before human-AI? Because they will be useful and because considerable economic gain will go to their developer. Imagine an AI that could be trained as easily as a dog, but whose training could then be instantly copied to millions of "dogs" around the planet.

Furthermore, once dog-AI is developed, billions of dollars of research and investment will be spent improving it to make sure its software and hardware run as efficiently as possible. Consider the massive effort that has gone into the development of software like TensorFlow or Google's TPU's. If there were a "trick" that would make dog-AI even 2x as powerful (or energy efficient), researchers would be eager to find it.

What does this mean for AI alignment? (Or, what is the China Alignment problem?)

Does the belief in hardware-bound AI mean that AI alignment doesn't matter, or that the development of human-level AI will be a relative non-event?

No.

Rather, it means that when thinking about AI risk, we should think of AI less as a single piece of software and more as a coming economic shift that will be widespread and unstoppable well before it officially "happens".

Suppose, living in the USA in the early 1990's, you were aware that there was a nation called China with the potential to be vastly more economically powerful than the USA and whose ideals were vastly different from your own. Suppose, further, that rather than trying to stop the "rise" of China, you believed that developing China's vast economic and intellectual potential could be a great boon for humankind (and for the Chinese themselves).

How would you go about trying to "contain" China's rise? That is, how would you make sure that at whatever moment China's power surpassed your own, you would face a benevolent rather than a hostile power?

Well, you would probably do some game theory. If you could convince the Chinese that benevolence was in their own best interest while they were still less-powerful than you, perhaps you would have a chance of influencing their ideology before they became a threat. At the very least your goals would be the following:

1) Non-aggression. You should make it perfectly clear to the Chinese that force will be met with overwhelming force and should they show hostility, they will suffer.

2) Positive-sum games. You should engage China in mutual economic gain, so that they realize that peaceful intercourse with you is better than the alternative.

3) Global institutions. You should establish a series of global institutions that enshrine the values you hold most dear (human rights, freedom of speech) and make clear that only entities that respect these values (at least on an international stage) will be welcomed to the "club" of developed nations.

Contrast this with traditional AI alignment, which is focused on developing the "right software" so that the first human-level AI will have the same core values as human beings. Not only does this require you to have a perfect description of human values, you must also figure out how to encode those values in a recursively self-improving program, and make sure that your software is the first to achieve Foom. If anyone anywhere develops an AI based off of software that is not perfect before you, we're all doomed.

AI Boxing Strategies for Hardware Bound AI

AI boxing is actually very easy for Hardware Bound AI. You put the AI inside of an air-gapped firewall and make sure it doesn't have enough compute power to invent some novel form of transmission that isn't known to all of science. Since there is a considerable computational gap between useful AI and "all of science", you can do quite a bit with an AI in a box without worrying too much about it going rogue.

Unfortunately, AI boxing is also a bit of a lost cause. It's fine if your AI is nicely contained in a box. However, your competitor in India has been deploying AI on the internet doing financial trading for a decade already. An AI that is programmed to make as much money as possible trading stocks and is allowed to buy more hardware to do so has all of the means, motive, and opportunity to be a threat to humankind.

The only viable strategy is to make sure that you have a pile of hardware of your own that you can use to compete economically before getting swamped by the other guy. The safest path isn't to limit AI risk as much as possible, but rather to make sure that agents you have friendly economic relations with rise as quickly as possible.

What research can I personally invest in to maximize AI safety?

If the biggest threat from AI doesn't come from AI Foom, but rather from Chinese-owned AI with a hostile world-view, and if, like me, you consider the liberal values held by the Western world something worth saving, then the single best course of action you can take is to make sure those liberal values have a place in the coming AI-dominated economy.

This means:

1) Making sure that liberal western democracies continue to stay on the cutting-edge of AI development.

2) Ensuring that global institutions such as the UN and WTO continue to embrace and advance ideals such as free-trade and human-rights.

Keeping the West ahead

Advancing AI research is actually one of the best things you can do to ensure a "peaceful rise" of AI in the future. The sooner we discover the core algorithms behind intelligence, the more time we will have to prepare for the coming revolution. The worst-case scenario is still that some time in the mid 2030's a single research team comes up with a revolutionary new piece of software that puts them miles ahead of anyone else. The more evenly distributed AI research is, the more mutually beneficial economic games will ensure the peaceful rise of AI.

I actually think there is considerable work that can be done right now to develop human-level AI. While I don't think that Moore's law has yet reached the level required to develop human AI, I believe we're approaching "dog-level" and we are undoubtedly well beyond "cockroach level". Serious work on developing sub-human AI not only advances the cause of AI safety, but will also provide enormous economic benefits to all of us living here on earth.

Personally, I think one fruitful area in the next few years will be the combination of deep learning with "classical AI" to develop models that can make novel inferences and exhibit "one shot" or "few shot" learning. The combination of a classic method (Monte Carlo tree search) and deep learning is what made AlphaGo so powerful.

Imagine an AI that was capable of making general inferences about the world, where the inferences themselves were about fuzzy categories extracted through deep learning and self-play. For example, it might learn "all birds have wings", where "bird" and "wing" refer to different activations in a deep learning network but the sentence "all birds have wings" is encoded in an expert-system-like collection of facts. The system would then progressively expand and curate its set of facts, keeping the ones that were most useful for making predictions about the real world. Such a system could be trained on a YouTube-scale video corpus, or on a simulated environment such as Skyrim or Minecraft.
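To make that concrete, here is a minimal sketch of the "curated fact store" idea. The class names, similarity threshold, and pruning rule are all illustrative assumptions of mine, not a description of any existing system; the concept vectors stand in for activations of a trained deep network.

```python
import numpy as np

class Fact:
    """One expert-system-style fact, e.g. "all birds have wings",
    grounded in fuzzy concept vectors rather than hand-written symbols."""
    def __init__(self, subject_vec, relation, object_vec):
        self.subject_vec = subject_vec   # e.g. the "bird" activation pattern
        self.relation = relation         # e.g. "has_part"
        self.object_vec = object_vec     # e.g. the "wing" activation pattern
        self.hits, self.trials = 0, 0    # how often the fact predicted correctly

    def matches(self, concept_vec, threshold=0.8):
        # Fuzzy match: does an observed concept activate this fact's subject?
        sim = concept_vec @ self.subject_vec / (
            np.linalg.norm(concept_vec) * np.linalg.norm(self.subject_vec) + 1e-9)
        return sim > threshold

class FactStore:
    """Progressively expands and curates facts, keeping the useful predictors."""
    def __init__(self, prune_below=0.5, min_trials=20):
        self.facts, self.prune_below, self.min_trials = [], prune_below, min_trials

    def add(self, fact):
        self.facts.append(fact)

    def predict(self, concept_vec):
        # Which stored facts does this observed concept trigger?
        return [f for f in self.facts if f.matches(concept_vec)]

    def record_outcome(self, fact, prediction_was_correct):
        fact.trials += 1
        fact.hits += int(prediction_was_correct)

    def curate(self):
        # Drop facts that have been tested enough and predict poorly.
        self.facts = [f for f in self.facts
                      if f.trials < self.min_trials
                      or f.hits / f.trials >= self.prune_below]
```

A real system would have to learn the concept vectors and propose candidate facts from video or a simulated environment; the point here is only the expand-then-curate loop.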

Building institutions

In addition to making sure that AI isn't developed first by an organization hostile to Western liberal values, we also need to make sure that when AI is developed, it is born into a world that encourages its peaceful development. This means promoting norms of liberty, free trade and protection of personal property. In a world with multiple actors trading freely, the optimal strategy is one of trade and cooperation. Violence will only be met with countervailing force.

This means we need to strengthen our institutions as well as our alliances. The more we can enshrine principles of liberty in the basic infrastructure of our society, the more likely they will survive. This means building an internet and financial network that resists surveillance and censorship. Currently blockchain is the best platform I am aware of for this.

This also means developing global norms in which violence is met with collective action against the aggressor. When Russia invades Ukraine or China invades Taiwan, the world cannot simply turn a blind eye. Tit-for-tat like strategies can encourage the evolution of pro-social or at least rational AI entities.
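To make the tit-for-tat intuition concrete, here is a toy iterated prisoner's dilemma (the payoffs, strategies, and round count are standard textbook choices, not a model of actual AI agents):

```python
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then mirror the opponent's last move.
    return 'C' if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return 'D'

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (600, 600): sustained cooperation
print(play(always_defect, tit_for_tat))  # (204, 199): defection gains almost nothing
```

Against a retaliator, a pure defector gains almost nothing over the long run, which is the sense in which collective retaliation can make cooperation the rational policy.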

Finally, we need to make sure that Western liberal democracy survives long enough to hand off the reins to AI. This means we need to seriously address problems like secular stagnation, climate change, and economic inequality.

When will human-level AI be developed?

I largely agree with Metaculus that it will happen sometime between 2030 and 2060. I expect that we will see some pretty amazing breakthroughs (dog-level AI) in the next few years. One group whose potential I think is slightly unappreciated is Tesla. They have both a need (self-driving) and the means (video data from millions of cars) to make a huge breakthrough here. Google, Amazon, and whoever is building the surveillance state in China are also obvious places to watch.

One important idea is that of AI fire-alarms. Mine personally was Alpha-Go, which caused me to update from "eventually" to "soon". The next fire-alarm will be an AI that can react to a novel environment with a human-like amount of training data. Imagine an AI that can learn to play Super Mario in only a few hours of gameplay, or an AI that can learn a new card game just by playing with a group of humans for a few hours. When this happens, I will update from "soon" to "very soon".

What are your credences? (How much would you be willing to bet?)

Foom vs Moof:

I think this is a bit of a sucker bet, since if Foom happens we're (probably) all dead. But I would be willing to bet at least 20:1 against Foom. One form this bet might take: "Will the first human-level AI be trained on hardware costing more or less than $1 million (inflation adjusted)?"

When will AGI happen?

I would be willing to take a bet at 1:1 odds that human-level AI will not happen before 2030.

I will not take a fair bet that human-level AI will happen before 2060, since it's possible that Moore's law will break down in some way I cannot predict. I might take such a bet at 1:3 odds.

AI-alignment:

I will take a bet at 10:1 odds that human-level AI will be developed before we have a working example of "aligned AI", that is, an AI algorithm that provably incorporates human values in a way that is robust against recursive self-improvement.

Positive outcome to the singularity:

This is even more of a sucker bet than Foom vs Moof. However, my belief is closer to 1:1 than it is to 100:1, since I think there is a real danger that a hostile power such as China develops AI before us, or that we haven't developed sufficiently robust institutions to survive the dramatic economic upheaval that human-level AI will produce.

Tesla vs Google:

I would be willing to bet 5:1 that Tesla will produce a mass-market self-driving car before Google.

Answers

Slider


Free trade can also have a toxic side. It could make sidelining human dignity in the name of economic efficiency the expected default.

Powerful tit for tat can also mean that the law of the strongest is normalised. When you stop being the powerful one and the AI feels that you are doing something immoral / dangerous, it will take severe action.

The problem should remain essentially the same if we reframe the China problem as the US problem. I don't want the AI to fail to implement universal healthcare, and letting the US lead into new ages risks that those values are not upheld. If I don't want there to be a global police state, should I take swift action when the US tries to act as one? One of the problems is that global security mechanisms are not exactly multiparty systems but have strong oligarchical features. And when popular nations disregard their function, they don't get significantly hampered by them.

This gets political, and the inference step from values to politics isn't particularly strong.

Free trade can also have a toxic side. It could make sidelining human dignity in the name of economic efficiency the expected default.

Yes!

This means we need to seriously address problems like secular stagnation, climate change, and economic inequality.

The problem should remain essentially the same if we reframe the China problem as the US problem.

Saying there is no difference between the US and China is uncharitable.

Also, I specifically named it the China problem in reference to this:

Suppose, living in the USA in the early 1990's, you were a
...
Slider
If you define the worry as people worrying that running healthcare like a private firm will make the system dysfunctional, rather than as a person from a specific nationality, you will likely get a lot more hits.

Isnasene


Nice post! The moof scenario reminds me somewhat of Paul Christiano's slow take-off scenario which you might enjoy reading about. This is basically my stance as well.

AI boxing is actually very easy for Hardware Bound AI. You put the AI inside of an air-gapped firewall and make sure it doesn't have enough compute power to invent some novel form of transmission that isn't known to all of science. Since there is a considerable computational gap between useful AI and "all of science", you can do quite a bit with an AI in a box without worrying too much about it going rogue.

My major concern with AI boxing is the possibility that the AI might just convince people to let it out (ie remove the firewall, provide unbounded internet access, connect it to a Cloud). Maybe you can get around this by combining a limited AI output data stream with a very arduous gated process for letting the AI out in advance but I'm not very confident.

If the biggest threat from AI doesn't come from AI Foom, but rather from Chinese-owned AI with a hostile world-view.

The biggest threat from AI comes from AI-owned AI with a hostile worldview -- no matter how the AI gets created. If we can't answer the question "how do we make sure AIs do the things we want them to do when we can't tell them all the things they shouldn't do?", we might wind up with Something Very Smart scheming to take over the world while lacking at least one Important Human Value. Think Age of Em except the Ems aren't even human.

Advancing AI research is actually one of the best things you can do to ensure a "peaceful rise" of AI in the future. The sooner we discover the core algorithms behind intelligence, the more time we will have to prepare for the coming revolution. The worst-case scenario is still that some time in the mid 2030's a single research team comes up with a revolutionary new piece of software that puts them miles ahead of anyone else. The more evenly distributed AI research is, the more mutually beneficial economic games will ensure the peaceful rise of AI.

Because I'm still worried about making sure AI is actually doing the things we want it to do, I'm worried that faster AI advancements will imperil this concern. Beyond that, I'm not really worried about economic dominance in the context of AI. Given a slow takeoff scenario, the economy will be booming like crazy wherever AI has been exercised to its technological capacities even before AGI emerges. In a world of abundant labor and so on, the need for mutually beneficial economic games with other human players, let alone countries, will be much less.

I'm a little worried about military dominance though -- since the country with the best military AI may leverage it to radically gain a geopolitical upper hand. Still, we were able to handle nuclear weapons, so we should probably be able to handle this too.

My major concern with AI boxing is the possibility that the AI might just convince people to let it out

Agree. My point was that boxing a human-level AI is in principle easy (especially if that AI exists on a special-purpose device of which there is only one in the world), but in practice someone somewhere is going to unbox AI before it is even developed.

The biggest threat from AI comes from AI-owned AI with a hostile worldview -- no matter how the AI gets created. If we can't answer the question "how do we make sure AIs do the things we
...
Isnasene
This is probably the crux of our disagreement. If an AI is indeed powerful enough to wrest power from humanity, the catastrophic convergence conjecture implies that it by default will. And if the AI is indeed powerful enough to wrest power from humanity, I have difficulty envisioning things we could offer it in trade that it couldn't just unilaterally satisfy for itself in a cheaper and more efficient manner. As an intuition pump for this, I think that the AI-human power differential will be more similar to the human-animal differential than the company-human differential. In the latter case, the company actually relies on humans for continued support (something an AI that can roll-out human-level AI won't need to do at some point) and thus has to maintain a level of trust. In the former case, well... people don't really negotiate with animals at all.

Related Questions

Logan Zoellner
I certainly don't think the Chinese are inherently evil. Rather, I think that from the view of an American in the 1990's, a world dominated by a totalitarian China which engages in routine genocide and bans freedom of expression would be a "negative outcome to the rise of China".

Yes. Exactly. We should be trying to find a Nash equilibrium in which humans are still alive (and ideally relatively free to pursue their values) after the singularity. I suspect such a Nash equilibrium involves multiple AIs competing, with strong norms against violence and a focus on positive-sum trades. This is precisely what we need to engineer! Unless your claim is that there is no Nash equilibrium in which humanity survives, which seems like a fairly hopeless standpoint to assume. If you are correct, we all die. If you are wrong, we abandon our only hope of survival.

Consider deep seabed mining. I would estimate the percentage of humans who seriously care about (or are aware of the existence of) the sponges living at the bottom of the deep ocean at <1%. Moreover, there are substantial positive economic gains that could potentially be split among multiple nations from mining deep sea nodules. Nonetheless, every attempt to legalize deep sea mining has run into a hopeless tangle of legal restrictions, because most countries view blocking their rivals as more useful than actually mining the deep sea. I would hope that some AIs have an interest in preserving humans for the same reason some humans care about protecting life on the deep seabed, but I don't think this is a necessary condition for ensuring humanity's survival in a post-singularity world. We should be trying to establish a Nash equilibrium in which even insignificant actors have their values and existence preserved.

My point is, I'm not sure that aligned AI (in the narrow technical sense of coherently extrapolated values) is even a well-defined term. Nor do I think it is an outcome to the singularity we can easily engineer, since it requ
Donald Hobson
What I am saying is that if you roll a bunch of random superintelligences, superintelligences that don't care in the slightest about humanity in their utility function, then selection of a Nash equilibrium is enough to get a nice future. It certainly isn't enough if humans are doing the selection and we don't know what the AI's want or what technologies they will have. Will one superintelligence be sufficiently transparent to another superintelligence that they will be able to provide logical proofs of their future behaviour to each other? Where does the arms race of stealth and detection end up? What if at least some of the AI's have been deliberately designed to care about us? Then we might get a nice future.

From the article you link to, it sounds to me like deep seabed mining is unprofitable or not that profitable, given current tech and metal prices. On the other hand, people do drill for oil in the ocean.

If you have a tribe of humans, and the tribe has norms, then everyone is expected to be able to understand the norms. The norms have to be fairly straightforward to humans. Don't do X except for [100 subtle special cases] gets simplified to don't do X. This happens even when everyone would be better off with the special cases. When you have big corporations with legal teams, the agreements can be more complicated. When you have super-intelligences, the agreements can be far more complicated. Humans and human organisations are reluctant to agree to a complicated deal that only benefits them slightly, from the overhead cost of reading and thinking about the deal.

What's more, the Nash equilibria that humanity has been in have changed with technology and society. If a Nash equilibrium is all that protects humanity, and an AI comes up with a way to kill off all humans and distribute their resources equally, without any AI being able to figure out who killed the humans, then the AI will kill all humans. Nash equilibria are fragile to details of situation and tec
Comments

This was an interesting experience to read, because it started by saying (roughly) "AI alignment is useless" (and I work on AI alignment), yet I found myself agreeing with nearly every point, at least qualitatively if not quantitatively. (Less so for the "keeping the West ahead" section, where I'm more uncertain because I just don't know much about geopolitics.)

I'd recommend reading through AN #80, which summarizes the views of four people (including three AI alignment researchers) who are optimistic about solving AI alignment. I think they're pretty similar to the things you say here. I'd also recommend reading What failure looks like for an abstract description of the sorts of problems I am trying to avoid via AI alignment research. I'm also happy to answer questions like "what are you doing if not encoding human values", "if you agree with these arguments why do you worry about AI x-risk", etc.

Given that you emphasize hardware-bound agents: have you seen AI and Compute? A reasonably large fraction of the AI alignment community takes it quite seriously.

I'll do my own version of your credences, though I should note I'm not actually offering to bet (I find it not worth the hassle). It's more like "if the bets were automatically tracked and paid out, and also I ignore 'sucker bet' effects, what bets would I be willing to make?"

Foom vs Moof:

Conditioned on AGI by 2050, I'd bet 20:1 against foom the way you seem to be imagining it. Unconditionally, I'd bet 10:1 against foom. Maybe I'd accept a 1:1000 bet for foom?

When will AGI happen?

I'd bet at 1.5:1 odds that it won't happen before 2030, and at 1:2 odds that it will happen before 2060.

AI-alignment:

The way you define it, I'd take a bet at 100:1 odds against having "aligned AI". (Though I'd want to clarify that your definition means what I think it means first.)

Positive outcome to the singularity:

If "China develops AI" is sufficient to count as a non-positive outcome, then I'm not really sure what this phrase means, so I'll abstain here.

Tesla vs Google:

This is one place where I disagree with you; I'd guess that Google has better technology (both now and in the future), though I could still see Tesla producing a mass-market self-driving car before Google because 1. it's closer to their core business (Google plausibly produces a rideshare service instead) and 2. Google cares more about their brand. Still, I think I'd take your 5:1 bet.

Given that you emphasize hardware-bound agents: have you seen AI and Compute? A reasonably large fraction of the AI alignment community takes it quite seriously.

This trend is going to run into Moore's law as an upper ceiling very soon (within a year, the line will require a year of the world's most powerful computer). What do you predict will happen then?


"what are you doing if not encoding human values"

Interested in the answer to this, and how much it looks like/disagrees with my proposal: building free trade, respect for individual autonomy, and censorship resistance into the core infrastructure and social institutions our world runs on.

What do you predict will happen then?

I don't know; I do expect the line to slow down, though I'm not sure when. (See e.g. here and here for other people's analysis of this point.)

Interested in the answer to this, and how much it looks like/disagrees with my proposal

It's of a different type signature than your proposal. I agree that "how should infrastructure and institutions be changed" is an important question; it's just not what I focus on. I think that there is still a technical question that needs to be answered: how do you build AI systems that do what you want them to do?

In particular, nearly all AI algorithms that have ever been developed assume a known goal / specification, and then figure out how to achieve that goal. If this were to continue all the way till superintelligent AI systems, I'd be very worried, because of convergent instrumental subgoals. I don't think this will continue all the way to superintelligent AI systems, but that's because I expect people (including myself) to figure out how to build AI systems in a different way so that they optimize for our goals instead of their own goals.

Of course one way to do this would be to encode a perfect representation of human values into the system, but like you I think this is unlikely to work (see also Chapter 1 of the Value Learning sequence). I usually think of the goal as "figure out how to build an AI system that is trying to help us", where part of helpful behavior is clarifying our preferences / values with us, ensuring that we have accurate information, etc. (See Clarifying AI Alignment and my comment on it.) Think of this as like trying to figure out how to embed the skills of a great personal assistant into an AI system.

The strongest argument in favor of hardware-bound AI is that in areas of intense human interest, the key "breakthroughs" tend to be found by multiple people independently, suggesting they are a result of conditions being correct rather than the existence of a lone genius.

If you expend n units of genius time against a problem and then find a solution, then when a bunch more geniuses spend another n units on the problem, they are likely to find a solution again. If poor communications stop an invention being spread quickly, then a substantial amount of thought is spent trying to solve a problem after someone has already solved it, and the problem is likely to be solved twice.

I don't see why those "conditions" can't be conceptual background. Suppose I went back in time and gave a bunch of ancient Greeks computers with loads of flops. Several Greeks invent the concept of probability. Another uses that concept to invent the concept of expected utility maximisation. Solomonoff induction is invented by a team a few years later. When they finally make AI, much of the conceptual work was done by multiple people independently, and no one person did more than a small part. The model is a long list of ideas, and you can't invent idea n+1 unless you know idea n.

What this means is, the first AI is going to take some serious time and compute power to out-compete 200 plus years worth of human effort on developing machines that think.

The first AI is in a very different position from the first humans. It took many humans many years before the concept of a logic gate was developed. The humans didn't know that logic gates were a thing, and most of them weren't trying in the right direction. The position of the AI is closer to the position of a kid that can access the internet and read all about maths and comp sci, and then the latest papers on AI and its own source code.


By the time human-level AI is achieved, most of the low-hanging fruit in the AI improvement domain will have already been found, so subsequent improvements in AI capability will require a superhuman level of intelligence. The first human-level AI will be no more capable of recursive-self-improvement than the first human was.

This requires two thresholds to line up closely. For the special case of playing chess, we didn't find that by the time we got to machines that played chess at a human level, any further improvements in chess algorithms took superhuman intelligence.

What the first AI looks like in each of these scenarios:
Foom: One day, some hacker in his mom's basement writes an algorithm for a recursively self-improving AI. Ten minutes later, this AI has conquered the world and converted Mars into paperclips
Moof: One day, after five years of arduous effort, Google finally finishes training the first human-level AI. Its intelligence is approximately that of a 5-year-old child. Its first publicly uttered sentence is "Mama, I want to watch Paw Patrol!" A few years later, anybody can "summon" a virtual assistant with human level intelligence from their phone to do their bidding. But people have been using virtual AI assistants on their phone since the mid 2010's, so nobody is nearly as shocked as a time-traveler from the year 2000 would be.

I have no strong opinion on whether the first AI will be produced by google or some hacker in a basement.

In the Moof scenario, I think this could happen. Here is the sort of thing I expect afterwards.

6 months later, google have improved their algorithms. The system now has an IQ of 103 and is being used for simple and repetitive programming tasks.

2 weeks later. A few parameter tweaks brought it up to IQ 140. It modified its own algorithms to make better use of processor cache, bringing its speed from 500x human to 1000x human. It is making several publishable new results in AI research a day.

1 week later, the AI has been gaming the stock market and rewriting its own algorithms further, hiring cloud compute, and selling computer programs and digital services; it has also started some biotechnology experiments, etc.

1 week later, the AI has bootstrapped self-replicating nanobots; it is now constructing hardware that is 10,000x faster than current computer chips.

It is when you get to an AI that is smarter than the researchers, and orders of magnitude faster, that recursive self-improvement takes off.

It modified its own algorithms to make better use of processor cache, bringing its speed from 500x human to 1000x human. It is making several publishable new results in AI research a day.

I think we disagree on what Moof looks like. I envision the first human-level AI as also running at human-like speeds on a $10 million+ platform and then accelerating according to Moore's law. This still results in pretty dramatic upheaval but over the course of years, not weeks. I also expect humans will be using some pretty powerful sub-human AIs, so it's not like the AI gets a free boost just for being in software.

Again, the reason why is I think the algorithms will be known well in advance, and it will be a race between most of the major players to build hardware fast enough to emulate human-level intelligence. The more the first human-level AI results from a software innovation rather than a Manhattan-project-style hardware effort, the more likely we will see Foom. If the first human-level AI runs on commodity hardware, or runs 500x faster than any human, we have already seen Foom.

If we assume Moore's law of doubling every 18 months, and that the AI training-to-runtime ratio is similar to humans, then the total compute you can get from always having run a program on a machine of price X is about equal to 2 years of compute on a current machine of price X. Another way of phrasing this is that if you want as much compute as possible done by some date, and you have a fixed budget, you should buy your computer 2 years before the date. (If you bought it 50 years before, it would be an obsolete pile of junk; if you bought it 5 minutes before, it would only have 5 minutes to compute.)

Therefore, in a hardware limited situation, your AI will have been training for about 2 years. So if your AI takes 20 subjective years to train, it is running at 10x human speed. If the AI development process involved trying 100 variations and then picking the one that works best, then your AI can run at 1000x human speed.
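A quick numerical check of those claims (a sketch that assumes compute per dollar is a smooth exponential with an 18-month doubling time; the 20-subjective-year training figure is the assumption from the paragraph above):

```python
import math

doubling_years = 1.5   # assumed Moore's-law doubling time for compute per dollar

# Total past compute from "always having run" a machine of fixed price,
# in units of one year on a current machine: integral of 2^(-t/1.5) dt for t >= 0.
total_past_compute_years = doubling_years / math.log(2)
print(f"{total_past_compute_years:.2f} years")              # ~2.16, i.e. roughly 2 years

# If those ~2 wall-clock years of training deliver 20 subjective years of
# experience, the trained AI runs at roughly:
print(f"{20 / total_past_compute_years:.1f}x human speed")  # ~9x, i.e. order 10x
```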

I think the scenario you describe is somewhat plausible, but not the most likely option, because I don't think we will be hardware limited. At the moment, current supercomputers seem to have around enough compute to simulate every synapse in a human brain with floating point arithmetic, in real time. (Based on 10^14 synapses at 100 Hz, 10^17 flops) I doubt using accurate serial floating point operations to simulate noisy analogue neurons, as arranged by evolution is anywhere near optimal. I also think that we don't know enough about the software. We don't currently have anything like an algorithm just waiting for hardware. Still, if some unexpectedly fast algorithmic progress happened in the next few years, we could get a moof. Or if algorithmic progress moved in a particular direction later.

I really like this response! We are thinking about some of the same math.

Some minor quibbles, and again I think "years" not "weeks" is an appropriate time-frame for "first Human AI -> AI surpasses all humans"

Therefore, in a hardware limited situation, your AI will have been training for about 2 years. So if your AI takes 20 subjective years to train, it is running at 10x human speed. If the AI development process involved trying 100 variations and then picking the one that works best, then your AI can run at 1000x human speed.

A three-year-old child does not take 20 subjective years to train. Even a 20-year-old adult human does not take 20 subjective years to train. We spend an awful lot of time sleeping, watching TV, etc. I doubt literally every second of that is mandatory for reaching the intelligence of an average adult human being.

At the moment, current supercomputers seem to have around enough compute to simulate every synapse in a human brain with floating point arithmetic, in real time. (Based on 10^14 synapses at 100 Hz, 10^17 flops) I doubt using accurate serial floating point operations to simulate noisy analogue neurons, as arranged by evolution is anywhere near optimal.

I think just the opposite. A synapse is not a FLOP. My estimate is closer to 10^19. Moreover most of the top slots in the TOP500 list are vanity projects by governments or used for stuff like simulating nuclear explosions.

Although, to be fair, once this curve collides with Moore's law, that 2nd objection will no longer be true.
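For concreteness, here is the back-of-envelope arithmetic behind the two estimates being compared above (my reading of the figures; the flops-per-synaptic-event factor is exactly the number in dispute):

```python
synapses = 1e14          # quoted estimate of synapses in a human brain
firing_rate_hz = 100     # quoted average firing rate

events_per_second = synapses * firing_rate_hz   # 1e16 synaptic events per second

# If one floating point operation per synaptic event is enough:
print(f"{events_per_second * 1:.0e} FLOPS")     # ~1e16, near top supercomputers (~1e17)

# If modelling each synaptic event takes on the order of a thousand operations:
print(f"{events_per_second * 1e3:.0e} FLOPS")   # ~1e19, well beyond them
```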

I don't think that a Moof scenario implies that a diplomatic "China Alignment problem" approach will work.

Imagine the hypothetical world where Dr evil publishes the code for an evil AI on the internet. The code, if run, would create an AI whose only goal is to destroy humanity. At first, only a few big companies have enough compute to run this thing, and they have the sense to only run it in a sandbox, or not at all. Over years, the compute requirement falls. Sooner or later some idiot will let the evil AI loose on the world. As compute gets cheaper, the AI gets more dangerous. Making sure you have a big computer first is useless in this scenario.

1) Making sure that liberal western democracies continue to stay on the cutting-edge of AI development.

Is only useful to the extent that an AI made by a liberal western democracy looks any different to an AI made by anyone else.

China differs from AI in that, to the extent that human values are genetically hard-coded, the Chinese have the same values as us. To the extent that human values are culturally transmitted, we can culturally transmit our values. AI's might have totally different hard-coded values that no amount of diplomacy can change.

A lot of the approaches to the "China alignment problem" rely on modifying the game theoretic position, given a fixed utility function. Ie having weapons and threatening to use them. This only works against an opponent to which your weapons pose a real threat. If, 20 years after the start of Moof, the AI's can defend against all human weapons with ease, and can make any material goods using less raw materials and energy than the humans use, then the AI's lack a strong reason to keep us around. (This is roughly why diplomacy didn't work for the Native Americans: the Europeans wanted the land far more than they wanted any goods that the Native Americans could make, and didn't fear the Native Americans' weapons.)

A lot of the approaches to the "China alignment problem" rely on modifying the game theoretic position, given a fixed utility function. Ie having weapons and threatening to use them. This only works against an opponent to which your weapons pose a real threat. If, 20 years after the start of Moof, the AI's can defend against all human weapons with ease, and can make any material goods using less raw materials and energy than the humans use, then the AI's lack a strong reason to keep us around.

If the AIs are a monolithic entity whose values are universally opposed to those of humans then, yes, we are doomed. But I don't think this has to be the case. If the post-singularity world consists of an ecosystem of AIs whose mutually competing interests causes them to balance one-another and engage in positive sum games then humanity is preserved not because the AI fears us, but because that is the "norm of behavior" for agents in their society.

Yes, it is scary to imagine a future where humans are no longer at the helm, but I think it is possible to build a future where our values are tolerated and allowed to continue to exist.

By contrast, I am not optimistic about attempts to "extrapolate" human values to an AI capable of acts like turning the entire world into paperclips. Humans are greedy, superstitious and naive. Hopefully our AI descendants will be our better angels and build a world better than any that we can imagine.

If the post-singularity world consists of an ecosystem of AIs whose mutually competing interests causes them to balance one-another and engage in positive sum games then humanity is preserved not because the AI fears us, but because that is the "norm of behavior" for agents in their society.

So many different AI's with many different goals, all easily capable of destroying humanity, none that intrinsically wants to protect humanity. Yet none decides that destroying humanity is a good idea.

Human values are large and arbitrary. The only agent optimising them is humans, and

By contrast, I am not optimistic about attempts to "extrapolate" human values to an AI capable of acts like turning the entire world into paperclips. Humans are greedy, superstitious and naive. Hopefully our AI descendants will be our better angels and build a world better than any that we can imagine.

Suppose you want to make a mechanical clock. You have tried to build one in a metalwork workshop and not got anything to work yet. So you decide to go to the scrap pile and start throwing rocks at it, in the hope that you can make a clock that way. Now maybe it is possible to make a crude clock, at least nudge a beam into a position where it can swing back and forth, by throwing a lot of rocks at just the right angles. You are still being stupid, because you are ignoring effective tools and making the problem needlessly harder for yourself.

I feel that you are doing the same in AI design. Free rein over the space of utility functions (any piece of computer code you care to write) is a powerful and general capability. Trying to find Nash equilibria is throwing rocks at a junkyard. Trying to find Nash equilibria without knowing how many AI's there are or how those AI's are designed is throwing rocks in the junkyard while blindfolded.

Suppose the AI has developed the tech to upload a human mind into a virtual paradise, and is deciding whether to do it or not. In an aligned AI, you get to write arbitrary code to describe the procedure to a human, and interpret the human's answer. Maybe the human doesn't have a concept of mind uploading, and the AI is deciding whether to go for "mechanical dream realm" or "artificial heaven" or "like replacing a leg with a wooden one, except the wooden leg is better than your old one, and for all of you, not just a leg". Of course, the raw data of its abstract reasoning files is Gb of gibberish, and making it output anything more usable is non-trivial. Maybe the human's answer depends on how you ask the question. Maybe the human answers "Um maybe, I don't know". Maybe the AI spots a flaw in the human's reasoning; does it point it out? The problem of asking a human a question is highly non-trivial.

In the general aligned AI paradigm, if you have a formal answer to this problem, you can just type it up and that's your code.

In your Nash equilibria approach, once you have a formal answer, you still have to design a Nash equilibrium that makes AI's care about that formal answer, and then ensure that real-world AI's fall into that equilibrium.

If you hope to get a Nash equilibrium that asks humans questions and listens to the answers without a formal description of exactly what you mean by "asks humans questions and listens to the answers", then could you explain what property singles this behaviour out as a Nash equilibrium? From the point of view of abstract maths, there is no obvious way to distinguish a function that converts the AI's abstract world models into English from one that translates it into Japanese, Klingon, or any of trillions of varieties of gibberish. And no, the AI doesn't just "know English".

Suppose you start listening to Chinese radio. After a while you notice patterns; you get quite good at predicting which meaningless sounds follow which other meaningless sounds. You then go to China. You start repeating strings of meaningless sounds at Chinese people. They respond back with equally meaningless strings of sounds. Over time you get quite good at predicting what the response will be. If you say "Ho yaa" they will usually respond "du sin", but the old men sometimes respond "du son". Sometimes the Chinese people start jumping up and down or pointing to you. You know a pattern of sounds that will usually cause Chinese people to jump up and down, but you have no idea why. Are you giving them good news and they're jumping for joy? Are you insulting them and they are hopping mad? Is it some strange Chinese custom to jump when they hear a particular word? Are you asking them to jump? Ordering them to jump? Telling them that jumping is an exceptionally healthy exercise? Initiating a jumping contest? You have no idea. Maybe you find a string of sounds that makes Chinese people give you food, but have no idea if you are telling a sob story, making threats, or offering to pay and then running off.

Now replace the Chinese people with space aliens. You don't even know if they have an emotion of anger. You don't know if they have emotions at all. You are still quite good at predicting how they will behave. This is the position that an AI is in.

You are still being stupid, because you are ignoring effective tools and making the problem needlessly harder for yourself.

I think this is precisely where we disagree. I believe that we do not have effective tools for writing utility functions and we do have effective tools for designing at least one Nash Equilibrium that preserves human value, namely:

1) All entities have the right to hold and express their own values freely

2) All entities have the right to engage in positive-sum trades with other entities

3) Violence is anathema.

Some more about why I think humans are bad at writing utility functions:

I am extremely skeptical of anything of the form "We will define a utility function that encodes human values." Machine learning is really good at misinterpreting utility functions written by humans. I think this problem will only get worse with a super-intelligent AI.

I am more optimistic about goals of the form "Learn to ask what humans want". But I still think these will fail eventually. There are lots of questions even ardent utilitarians would have difficulty answering. For example, "Torture 1 person or give 3^^^3 people a slight headache?".

I'm not saying all efforts to design friendly AIs are pointless, or that we should willingly release paperclip maximizers on the world. Rather, I believe we boost our chances of preserving human existence and values by encouraging a multi-polar world with lots of competing (but non-violent) AIs. The competing plan of "don't create AI until we have designed the perfect utility function and hope that our AI is the dominant one" seems like it has a much higher risk of failure, especially in a world where other people will also be developing AI.

Importantly, we have the technology to deploy "build a world where people are mostly free and non-violent" today, and I don't think we have the technology to "design a utility function that is robust against misinterpretation by a recursively improving AI".


One additional aside

Suppose the AI has developed the tech to upload a human mind into a virtual paradise, and is deciding whether to do it or not.

I must confess the goals of this post are more modest than this. The Nash equilibrium I described is one that preserves human existence and values as they are; it does nothing in the domain of creating a virtual paradise where humans will enjoy infinite pleasure (and in fact actively avoids forcing this on people).

I suspect some people will try to build AIs that grant them infinite pleasure, and I do not begrudge them this (so long as they do so in a way that respects the rights of others to choose freely). Humans will fall into many camps: those who just want to be left alone, those who wish to pursue knowledge, those who wish to enjoy paradise. I want to build a world where all of those groups can co-exist without wiping out one another or being wiped out by a malevolent AI.

1) All entities have the right to hold and express their own values freely
2) All entities have the right to engage in positive-sum trades with other entities
3) Violence is anathema.

The problem is that these sound simple, and they are easily expressed in English, but they are pointers to your moral decisions. For example, which lifeforms count as "entities"? If the AI's decide that every bacterium is an entity that can hold and express its values freely, then the result will probably look very weird, and might involve humans being ripped apart to rescue the bacteria inside them. Unborn babies? Brain-damaged people? The word "entities" is a reference to your own concept of a morally valuable being. You have, within your own head, a magic black box that can take in descriptions of various things and decide whether or not they are "entities with the right to hold and express values freely".

You have a lot of information within your own head about what counts as an entity, what counts as violence, etc., that you want to transfer to the AI.

All entities have the right to engage in positive-sum trades with other entities

This is especially problematic. The whole reason that any of this is difficult is because humans are not perfect game theoretic agents. Game theoretic agents have a fully specified utility function, and maximise it perfectly. There is no clear line between offering a human something they want, and persuading a human to want something with manipulative marketing. In some limited situations, humans can kind of be approximated as game theoretic agents. However, this approximation breaks down in a lot of circumstances.

I think that there might be a lot of possible Nash equilibria. Any set of rules that says to enforce all the rules, including this one, could be a Nash equilibrium. I see a vast space of ways to treat humans. Most of that space contains ways humans wouldn't like. There could be just one Nash equilibrium, or the whole space could be full of Nash equilibria. So either there isn't a nice Nash equilibrium, or we have to pick the nice equilibrium from amongst gazillions of nasty ones. In much the same way, if you start picking random letters, either you won't get a sentence, or, if you pick enough, you will get a sentence buried in piles of gibberish.

Importantly, we have the technology to deploy "build a world where people are mostly free and non-violent" today, and I don't think we have the technology to "design a utility function that is robust against misinterpretation by a recursively improving AI".

The mostly free and nonviolent kind of state of affairs is a Nash equilibrium in the current world. It is only a Nash equilibrium because of a lot of contingent facts about human psychology, culture and socioeconomic situation. Many other human cultures, most of them historical, have embraced slavery, pillaging and all sorts of other stuff. Humans have a sense of empathy and, all else being equal, would prefer to be nice to other humans. Humans have an inbuilt anger mechanism that automatically retaliates against others, whether or not it benefits themselves. Humans have strongly bounded personal utilities. The current economic situation makes the gains from cooperating relatively large.

So in short, Nash equilibria amongst super-intelligences are very different from Nash equilibria amongst humans. Picking which equilibrium a bunch of superintelligences end up in is hard. Humans being nice around the developing AI will not cause the AI's to magically fall into a nice equilibrium, any more than humans being full of blood around the AI's will cause the AI's to fall into a Nash equilibrium that involves pouring blood on their circuit boards.

There probably is a Nash equilibrium in which the AIs pour blood on their circuit boards and all promise to attack any AI that doesn't, but you aren't going to get that equilibrium just by walking around full of blood. You aren't going to get it even if you happen to cut yourself on a circuit board, or deliberately pour blood all over one.

dxu:

If, on the one hand, you had seen that since the 1950s computer AIs had been capable of beating humans at increasingly difficult games, that progress in this domain had been fairly steady and mostly limited by compute power, and moreover that computer Go programs had themselves gone from idiotic to high-amateur level over the course of decades, then the development of AlphaGo (if not the exact timing of that development) probably seemed inevitable.

This seems to entirely ignore most (if not all) of the salient implications of AlphaGo's development. What set AlphaGo apart from previous attempts at computer Go was the iterated distillation and amplification scheme employed during its training. This represents a genuine conceptual advance over previous approaches, and to characterize it as simply a continuation of the trend of increasing strength in Go-playing programs only works if you neglect to define said "trend" in any way more specific than "roughly monotonically increasing". And if you do that, you've tossed out any and all information that would make this a useful and non-vacuous observation.
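For readers who haven't followed the details, here is a rough sketch of the kind of amplification-and-distillation loop the comment is pointing at (closer in spirit to AlphaGo Zero's self-play training than to the original AlphaGo pipeline; `Net`, `run_mcts` and `self_play_game` are illustrative placeholders, not DeepMind's implementation):

```python
import random

class Net:
    """Stand-in for the policy/value network; here just a table of move priors."""
    def __init__(self):
        self.priors = {}  # state -> {move: probability}

    def predict(self, state, legal_moves):
        # Uniform prior over legal moves for states we haven't trained on yet.
        return self.priors.get(state, {m: 1 / len(legal_moves) for m in legal_moves})

    def train(self, examples):
        # "Distillation": move the stored priors toward the search-improved targets.
        for state, target in examples:
            self.priors[state] = target

def run_mcts(net, state, legal_moves):
    """'Amplification': search on top of the net yields a stronger move distribution.
    Faked here by sharpening the prior toward its current best move."""
    prior = net.predict(state, legal_moves)
    best = max(prior, key=prior.get)
    return {m: (0.9 if m == best else 0.1 / (len(prior) - 1)) for m in prior}

def self_play_game(net):
    """Play a toy three-move 'game', recording (state, search distribution) pairs."""
    examples, state = [], "start"
    for _ in range(3):
        legal = ["a", "b", "c"]
        pi = run_mcts(net, state, legal)
        examples.append((state, pi))
        state += random.choices(legal, weights=[pi[m] for m in legal])[0]
    return examples

net = Net()
for _ in range(10):
    batch = [ex for _ in range(8) for ex in self_play_game(net)]
    net.train(batch)  # the distilled net seeds the next, stronger round of search
```

The point of the sketch is the loop itself: search amplifies the current network, the network is trained to imitate the amplified player, and the improved network makes the next round of search stronger. Whether that loop counts as "just more of the same trend" or a conceptual break is exactly what is in dispute here.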

Shortly after this paragraph, you write:

For the record, I was surprised at how soon AlphaGo happened, but not that it happened.

In other words, you got the easy and useless part ("will it happen?") right, and the difficult and important part ("when will it happen?") wrong. It's not clear to me why you felt this needed mentioning at all, but since you did mention it, I feel obligated to point out that "predictions" of this caliber are the best you'll ever be able to do if you insist on throwing out any information more specific and granular than "historically, these metrics seem to move consistently upward/downward".

In other words, you got the easy and useless part ("will it happen?") right, and the difficult and important part ("when will it happen?") wrong.

"Will it happen?" isn't vacuous or easy, generally speaking. I can think of lots of questions where I have no idea what the answer is, despite a "trend of ever increasing strength". For example:

Will Chess be solved?

Will faster than light travel be solved?

Will P=NP be solved?

Will the hard problem of consciousness be solved?

Will a Dyson sphere be constructed around Sol?

Will anthropogenic climate change cause Earth's temperature to rise by 4C?

Will Earth's population surpass 100 billion people?

Will the African Rhinoceros go extinct?

I feel obligated to point out that "predictions" of this caliber are the best you'll ever be able to do if you insist on throwing out any information more specific and granular than "historically, these metrics seem to move consistently upward/downward".

I've made specific statements about my beliefs for when Human-Level AI will be developed. If you disagree with these predictions, please state your own.

dxu:

"Will it happen?" isn't vacuous or easy, generally speaking. I can think of lots of questions where I have no idea what the answer is, despite a "trend of ever increasing strength".

In the post, you write:

If, on the one hand, you had seen that since the 1950s computer AIs had been capable of beating humans at increasingly difficult games, that progress in this domain had been fairly steady and mostly limited by compute power, and moreover that computer Go programs had themselves gone from idiotic to high-amateur level over the course of decades, then the development of AlphaGo (if not the exact timing of that development) probably seemed inevitable.

"Will it happen?" is easy precisely in cases where a development "seems inevitable"; the hard part then becomes forecasting when such a development will occur. The fact that you (and most computer Go experts, in fact) did not do this is a testament to how unpredictable conceptual advances are, and your attempt to reduce it to the mere continuation of a trend is an oversimplification of the highest order.

I've made specific statements about my beliefs for when Human-Level AI will be developed. If you disagree with these predictions, please state your own.

You've made statements about your willingness to bet at non-extreme odds over relatively large chunks of time. This indicates both low confidence and low granularity, which means that there's very little disagreement to be had. (Of course, I don't mean to imply that it's possible to do better; indeed, given the current level of uncertainty surrounding everything to do with AI, about the only way to get me to disagree with you would have been to provide a highly confident, specific prediction.)

Nevertheless, it's an indicator that you do not believe you possess particularly reliable information about future advances in AI, so I remain puzzled that you would present your thesis so strongly at the start. In particular, your claim that the following questions

Does this mean that the development of human-level AI might not surprise us? Or that by the time human-level AI is developed it will already be old news?

depend on

whether or not you were surprised by the development of AlphaGo

seems to have literally no connection to what you later claim, which is that AlphaGo did not surprise you because you knew something like it had to happen at some point. What is the relevant analogy here to artificial general intelligence? Will artificial general intelligence be "old news" because we suspected from the start that it was possible? If so, what does it mean for something to be "old news" if you have no idea when it will happen, and could not have predicted it would happen at any particular point until after it showed up?

As far as I can tell, reading through both the initial post and the comments, none of these questions have been answered.

Just wondering. Why are some so often convinced that the victory of China in the AGI race will lead to the end of humanity? The Chinese strategy seems to me much more focused on the long term.
The most prominent experts give a 50% chance of AI in 2099 (https://spectrum.ieee.org/automaton/robotics/artificial-intelligence/book-review-architects-of-intelligence). And I expect that the world in 80 years will be significantly different from the present. Well, you can call such a world a totalitarian hell, but I think the probability of an existential disaster in such a world would be lower.

Why are some so often convinced that the victory of China in the AGI race will lead to the end of humanity?

I don't think a Chinese world order will result in the end of humanity, but I do think it will make stuff like this much more common. I am interested in creating a future I would actually want to live in.

The most prominent experts give a 50% chance of AI in 2099

How much would you be willing to bet that AI will not exist in 2060, and at what odds?
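(A quick aside on how a bet like this cashes out as a probability, so it can be compared against the "50% by 2099" figure; a minimal sketch with made-up stakes:)

```python
# If you stake k units against my 1 that AI will not exist by 2060, the bet only
# favours you if P(AI by 2060) < 1 / (k + 1). Numbers below are illustrative.

def implied_probability(odds_against):
    return 1 / (odds_against + 1)

for k in (1, 3, 9):
    print(f"{k}:1 against -> implied P(AI by 2060) below {implied_probability(k):.2f}")
# 1:1 -> 0.50, 3:1 -> 0.25, 9:1 -> 0.10
```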

but I think the probability of an existential disaster in such a world would be lower.

Are you arguing that a victory for Chinese totalitarianism makes human extinction less likely than a liberal world order would?

I'm not sure that I can trust news sources that have an interest in portraying China negatively.
In any case, this does not seem to stop the Chinese people from feeling happier than people in the US.
I cited this date just to contrast it with your forecast. My own intuition points more toward AI in the 2050s or 2060s.
And yes, I expect that in 2050 it will be possible for countries to monitor the behavior of each of their citizens 24/7. I can't say that makes me happy, but I think the vast majority will put up with it. I don't believe in a liberal democratic utopia, but the end of the world seems unlikely to me.

In any case, this does not seem to stop the Chinese people from feeling happier than people in the US.

Lots of happy people in China.

And yes, I expect that in 2050 it will be possible for countries to monitor the behavior of each of their citizens 24/7. I can't say that makes me happy, but I think the vast majority will put up with it. I don't believe in a liberal democratic utopia, but the end of the world seems unlikely to me.

Call me a crazy optimist, but I think we can aim higher than "Yes, you will be monitored 24/7, but at least humanity won't literally go extinct."

I meant the results of polls like this one: https://www.thatsmags.com/china/post/15129/happy-planet-index-china-is-72nd-happiest-country-in-the-world. Well, it doesn't matter.
I think I could sleep better if everyone recognized that a less free world reduces existential risk.