

In the past year, I have started talking about AI risk publicly: in mainstream newspapers, on public radio and national TV, and on some of the most popular podcasts. The twist, and the reason you probably haven't noticed, is that I'm doing this in Czech. This has a large disadvantage: the positive impact is quite limited compared to English. On the other hand, it also has a big advantage: the risk is very low, because it is very hard for memes and misunderstandings to escape the language bubble. Overall, I think this makes it a great setting for experiments with open public communication.

What follows is an off-the-cuff list of notes and suggestions. In my view, the debate in Czech media and on Czech social networks has so far been, on average, marginally more sensible and better informed than in English, so perhaps part of this was successful and could be useful for others.

Context: my views

For context, it's probably good to briefly mention some of my overall views on AI risk, because they differ notably from some other common views.

I do expect

1. Continuous takeoff, and overall a large amount of continuity of agency (note that continuous does not imply things move slowly)

2. Optimization and cognition distributed across many systems to be more powerful than any single system, making a takeover by a single system possible but unlikely

3. Multiagent interactions to matter

4. The interactions between memetics, governance, and the so-called "technical problem" to be strong and important


As a result I also expect

5. There will be warning shots

6. There will be cyborg periods

7. The world will get weird

8. Coordination mechanisms do matter

This perspective may be easier to communicate than, e.g., a sudden foom, although I'm not sure.

In the following sections, I'll usually describe my approach and illustrate it with actual quotes from published media interviews I'm sort of happy with (translated, unedited). Note that the specific phrasings and metaphors are rarely original.

Aim to explain, not to persuade

Overall, I usually try to explain stuff and answer questions rather than advocate for something. I'm optimistic about the ability of the relevant part of the public to actually understand a large part of the risk at a coarse-grained level, given enough attention.

So even though we invented these machines ourselves, we don't understand them well enough?

We know what's going on there at the micro level. We know how the systems learn. If one number in a series changes, we know how the next one changes. But there are tens of billions of such numbers. In the same way, we have some idea of how a neuron works, and we have maps of a network of thousands of neurons. But that doesn't tell us that much about how human thinking works at the level of ideas.

Small versions of scaled problems

Often, I think the most useful thing to convey is a scaled-down, easier version of the full problem, chosen so that thinking about the smaller version leads to correct intuitions about the full-scale problem, or so that solutions to the scaled-down problem may generalise to the later, full-scale one.

This often requires some thought or finding a good metaphor.

Couldn't we just shut down such a system?

We already have a lot of systems that we could hypothetically shut down, but if you actually tried to do so, it would be very difficult. For example, it is practically impossible to shut down the New York Stock Exchange, because there will always be enough people defending it. If the model manages to penetrate deeply enough into humanity's activities, it is conceivable that humanity will actually lose control of it at some point.


Don't focus on one scenario

Overall, I think it's possible to explain that in the face of AI risk, there isn't one particular story we can identify and prevent; rather, the problem is that powerful unaligned systems can find many ways to threaten you.

How can we imagine the threat associated with the development of artificial intelligence? I assume not as shooting robots.

Not shooting robots, of course. I'd put it another way. One of the reasons humans have become dominant on the planet is their intelligence. Not only that: it is also our ability to work together in large groups and share ideas.

However, if you look at the evolution of mankind from the perspective of a chimpanzee or a mammoth, at one point something started happening that they no longer understood. Today, people can tame a chimpanzee or kill it in a staggering number of ways that the chimpanzee doesn't understand. The chimp doesn't understand what happens when the tranquilliser injection hits.

And we could be in a similar position the moment we lose control of artificial intelligence.



I'm rather reluctant to describe one particular scenario. If a system is much more intelligent than I am, it can naturally develop ways to limit or threaten me that I can't even imagine. 

Some of the people who study the risks of AI think that someone is going to plug in a large system, it will somehow escape, and we will lose control of it very quickly. It could even happen that we all die at once without knowing why.

Personally, I think it's more likely that there will be some kind of continuous loss of control. Nothing dramatic will happen, but we'll more likely get to a state where we don't understand what's going on in the world and won't be able to influence it. Or we'll be presented with some unrealistic picture of the world in which we're happy and won't complain, but we won't decide anything.

But if I described a very specific scenario to you, you would argue that it could be prevented in advance or easily solved. That's the problem I was describing a moment ago: we'd just be talking like two chimpanzees who can't even imagine most of the ways humans can threaten them.


The doom memeplex

Overall, I'm not excited about summoning the doom memeplex into public consciousness. (This is in stark contrast to Eliezer Yudkowsky, who seems to be investing heavily in this summoning.) Why?

Mostly, I don't trust that the doom memeplex is all that helpful in solving the alignment problem. I broadly agree with Valentine that mental pain and terror are not a great state from which to see the problem clearly.

Also, similarly to AIs, memeplexes are much easier to summon than to control.

Also, similarly to some AIs, memeplexes have convergent instrumental goals and self-interests. Some worrisome implications include:

- it's not in the self-interest of the doom memeplex to recognize alignment solutions

- it is in the self-interest of the doom memeplex to reward high p(doom) beliefs

- the AGI doom memeplex has, to some extent, a symbiotic relationship with the race toward AGI memeplex

One specific implication is that I find it much more productive and useful to focus on the fact that in AI risk scenarios we lose control over the future, rather than on "your kids will die".

A colleague of mine describes it with the metaphor that we can find ourselves in the role of a herd of cows, whose fate is being determined by the farmer. And I don't think we want to get to that stage. While there are many interesting questions about what might happen to such a herd afterwards, I think they are distracting. We need to focus on not losing control of the AI.


Relevant maps

A large part of the difference between how the general public and ML specialists understand AI lies in which maps people rely on. The public often uses a map "like a human, but running on a computer" and has easy access to maps like "like a Google-sized corporation, but automated". In contrast, many ML practitioners rely on maps like "my everyday experience with training ML models, which are small and sort of dumb".

A good understanding of AI risk usually requires thinking with multiple maps, but a basic understanding of the risk can actually be based on the maps accessible to the public.

The metaphor of maps is also useful in explaining why ML expertise is not enough, how it is possible that AI experts disagree, and how a layperson can decide whom to trust.

Take, for example, Yann LeCun, vice president of Meta and head of AI research at Facebook. Facebook has one of the worst reputations among the big players in terms of its approach to safety. LeCun is a great expert in machine learning, but he also claims that the problem is much smaller than we think because, after all, as humanity we can tame various intelligent systems, for example by raising children or regulating corporations. When the Vice President of Meta says this, it fills me with more dread than optimism. If I were to take the child-rearing metaphor seriously, it's like believing we can raise alien children.

For me, the VP of Meta drawing his confidence from humanity's ability to align corporations is a reason to worry, not a source of confidence.


The aim here is to explain that when, e.g., Yann LeCun makes confident claims about AI risk, these are mostly not based on his understanding of machine learning, but often on analogies with other systems that the reader knows and can evaluate independently.

The overall experience

I'm quite picky about whom I talk to, refusing the majority of interview requests, but conditional on that, my experience so far has generally been positive. In particular, after interest in AI exploded following the release of ChatGPT and GPT-4, tech journalists became reasonably well informed. There is no need to justify the plausibility of powerful AI systems anymore. Also, the idea of AI risk is firmly in the Overton window, and privately, a decent fraction of the people I talked with admitted being concerned themselves.

Some of the resulting artefacts in machine translation:

What to do in English

I'd be pretty excited if the AI alignment community were able to produce more people who can represent various views of the field publicly, in a manner that makes both the public and policymakers better informed. I think this is a particularly good fit for people with publicly legible affiliations, such as academic roles or tech companies not directly participating in the race.

Comments
Raemon

I found a lot of this intuitively compelling, but from the opening I was expecting you'd go into "here's what happened as a result of me using this strategy", whereas it seemed to focus mostly on "here are particular tactics I used and why I believe from first principles that they make sense."

Did you have much info on how your strategy has played out?

Judging in an informal and biased way, I think some of the impact is that the public debate is marginally saner, but this is obviously hard to evaluate.

To what extent a more informed public debate can lead to better policy remains to be seen; also, unfortunately, I would tend to glomarize about discussing the topic directly with policymakers.

There are some more proximate impacts: for example, we (ACS) are getting a steady stream of requests for collaboration and from people wanting to work with us, but we basically don't have the capacity to form more collaborations, and can't absorb more people unless they are exceptionally self-guided.

Great points about not wanting to summon the doom memeplex!

It sounds like your proposed narrative is not doom but disempowerment: humans could lose control of the future. An advantage of this narrative is that people often find it more plausible: many more scenarios lead to disempowerment than to outright doom.

I also personally use the disempowerment narrative because it feels more honest to me: my P(doom) is fairly low but my P(disempowerment) is substantial.

I’m curious though whether you’ve run into the same hurdle I have, namely that people already feel disempowered! They know that some humans somewhere have some power, but it’s not them. So the Davos types will lose control of the future? Many people express indifference or even perverse satisfaction at this outcome.

A positive narrative of empowerment could be much more potent, if only I knew how to craft it.

Raemon

Just going off my intuitive read, I like what you had to say about the doom memeplex. It does match my intuitive sense that "summoning a memeplex is easier than controlling it," and there are a bunch of ways I expect the "we're all going to die" belief system to have some bad side effects that would be better to avoid if possible.

I also liked this, for sort of sidestepping the "maybe they'll keep us because they like us?" objection.

A colleague of mine describes it with the metaphor that we can find ourselves in the role of a herd of cows, whose fate is being determined by the farmer. And I don't think we want to get to that stage. While there are many interesting questions about what might happen to such a herd afterwards, I think they are distracting. We need to focus on not losing control of the AI.

EDITED: Deleted some stuff.

 


Have you talked to top level people in government, the business world or academia? What were their reactions like? How seriously did they take you? Like, have you seen people pushing for regulation in the Czech parliament or business people asking about liability for AI or academics seeming like they get what makes alignment hard or what?

Here are several points that might counterbalance some of your claims and, I hope, make you think about the issue from new perspectives.

"We know what's going on there at the micro level. We know how the systems learn."

We don't only know how those systems learn, but also what exactly they are learning. Let's say you take a photograph: you don't only know how each pixel is formed, you also know exactly what you are taking a picture of. You can't always predict how this or that specific pixel will end up, as there is a lot of noise, but this doesn't mean you don't know what the picture represents. Asking a network designer, "Oh, you didn't know exactly how the network would react to this specific question?", is like asking a photographer for the exact RGB value of one very specific pixel. Such small details are impossible to know.

Networks are basically function approximators fitted to the provided dataset. In the case of RL, the networks generalize a reward function. All these cases show that we are trying to "capture a picture" that generalizes the provided data. You can always miss a spot here or there, and the networks might ignore some of the data, for example because they are small. But in general, we know how resources are allocated inside the network to represent concepts in order to predict the data or rewards. We can "steer" the network to focus more on this or that aspect of its outputs by providing more data of the kind we want and responding to the network's weaknesses.

"if you look at the evolution of mankind from the perspective of a chimpanzee or a mammoth...If a system is much more intelligent than I am, it can naturally develop ways to limit or threaten me that I can't even imagine."

I would suggest trying to avoid anthropomorphism (or attributing properties of biological systems in general). Instead of drawing parallels, I would suggest looking at some math and seeing what we can promise about those systems. Let's take a chess engine, just to keep it neutral for a moment. The chess moves it provides are superhuman, which means we don't know how it came up with them and it can't be explained to us why a particular move is good, and sometimes those networks make subhuman moves too; but generally speaking, the network is promised to be trained to provide the best chess moves. A far smarter than human chess engine will still do the task it was trained on. At the moment the network becomes superhuman, it doesn't start wanting to make strange chess moves that would be more fun to play or seem more desirable to the network. It will just make the best chess moves. Why? Because we trained it on a reward function that generalized its winning chances and nothing else.

The reward function of LLMs is to provide the response most desired by some group of humans, using the RLHF method. Even superhuman networks are promised to be trained to converge to providing such responses. Humans, while evolving, weren't promised by mathematical theorems to take the actions that best benefit chimpanzees or mammoths (or some group of them). So although you can be oblivious to a threat posed by superhuman networks, you can be sure that with a correct training procedure, the networks will give outputs that maximize a reward function, which in our case would be aligned with some human collective (a pretty large collective, as small groups have limited resources to train the best networks). So although a general superhuman AGI can't be promised to act in our interest, those LLMs, as long as they are trained with a certain procedure on certain data, can be promised to maximize the well-being of some human group.
I would say that's much more than chimpanzees had when humans were evolving.

"we'll get to a state where we don't understand what's going on with the world, and we won't be able to influence it. Or we'll get a sense of some unrealistic picture of the world in which we're happy and we won't complain, but we won't decide anything."

Just as with chess, where I would prefer that a chess engine make the decisions because the engine does it much better than I do: regarding human prosperity and happiness, if I am promised by the creator, using math theorems and the testing experience gained during development, that those systems are made to optimize humanity's well-being and will do it much better than any human, I will gladly give up my control to a system that understands better than any human how to do that. If, for those systems, providing ideas to a policymaker is the equivalent of providing chess players with the best chess moves, I see no reason to stick to human decisions; humans will make far more mistakes.

In the case of humans, there is a small possibility that the networks will decide the value system based on their own perception of human well-being, since they were trained by a small group of people, and ignore the wider, more nuanced range of what well-being means to different people. But I don't think current social structures are that nuanced either, so if any system has a chance of being more aligned with each individual, it is not the current political system but a superhuman network.

"We can find ourselves in the role of a herd of cows, whose fate is being determined by the farmer."

Once again, the farmer is a biological entity and is not promised by any math theorem to act for the benefit of the herd. But if a herd could train an AI that was promised by math theorems to act on its behalf and in its favor, the herd would be in a better situation than without such an AI.

I would argue that the amount of control can be negotiated, just as people settle how much control over their lives they cede to politicians and governments. I would also claim that the current political system is already such a herd situation, and we can do very little about it, while current political decision-making falls well short of what even the best and brightest humans could provide. So personally, I would feel much less like part of a herd if decisions were made not by elected officials but by some system based on mathematical theorems and superhuman analysis of data.

-----

Generally speaking, I see some amount of anthropomorphism in your claims, and you seem to be ignoring the mathematically established theorems that promise convergence of the networks to a state aligned with whatever value system is provided to them, and those mathematical theorems hold for superhuman networks as well.

I can sympathize with the fear of losing control, and once the systems become so advanced that we don't understand their decisions at all, even if most of those decisions work in our favor, I would be willing to engage in a discussion about whether to let them make such decisions. For now, we have a great tool in our hands that promises to solve a lot of our current problems as humanity, and I would not throw this tool away now just because we might lose control in the future. As I said previously, I am willing to lose a lot of control to computers: for example, when I need to make a complex calculation, I would prefer to use a calculator rather than compute it by hand. The exact amount of lost control we should be comfortable with can be debated; I think I belong to the camp that holds we should let humans make almost no decisions, and let those systems, as long as we can ensure their alignment, make most of the decisions for us. How much understanding we should have before allowing this or that decision should be an open question for the relatively far future. For now, we still have people dying from hunger, working in factories, air pollution and other climate change issues, people dying on roads in car accidents, and a lot of diseases that kill us, and most of us (80% worldwide) work in meaningless jobs just for survival. As long as those problems are not solved, I see no reason to give up our chance at far smarter systems that can provide a set of decisions able to solve all those problems; then we can discuss how much more control we want to give them, or take some of it back at some point. And yes, I agree we could lose control without noticing, and that could be a problematic issue in the long run.
I would claim that in our current situation, until we reach pretty far advanced systems like, say, GPT-10, we should not fear losing control to those systems. Instead, we should be more afraid of the control we have already lost to the current political system, of the control some decision makers have and what they do with it, and generally of the world's current problems, than of losing control to aligned superhuman networks that give us paradise while we make no decisions at all, which may even be a good thing.

This is one of the best replies I’ve seen, and it calmed many of my fears about AI. My pushback is this: the things you list below as reasons to justify advancing AGI are either already solvable with narrow AI or are not solution problems but implementation and alignment problems.

“dying from hunger, working in factories, air pollution and other climate change issues, people dying on roads in car accidents, and a lot of diseases that kill us, and most of us (80% worldwide) work in meaningless jobs just for survival.”

Developing an intelligence with 2-5x general human intelligence would need a much larger justification. Something like an asteroid, an unstoppable virus, or a sudden corrosion of the atmosphere would justify bringing out an equally existential technology like superhuman AGI.

What I can’t seem to wrap my head around is why a majority has not emerged that sees the imminent dangers in creating programs that are 5x generally smarter than us at everything. If you don’t fear this, I would suggest anthropomorphizing more, not less.

Regardless, I think politics and regulation will crush AGI pursuits before GPT even launches its next iteration.

AGI enthusiasts have revealed their hand, and so far the public has made it loud and clear that no one actually wants this; no one wants displacement or the loss of their job, no matter how sucky it might be. These black boxes scare the crap out of people, and people don’t like what they don’t know.

Bad news and fear spread rapidly in today’s era; the world is run by Instagram moms more than anyone else. If it’s the will of the people, Google and Microsoft will find out just how much they are at the mercy of the “essential worker” class.
