Cross-posted from Substack

AI has been a hot topic in recent Twitter discourse, with two opposing camps dominating the conversation: the Doomers and the AI builders. The Doomers, led by Eliezer Yudkowsky and other rationalists, advocate for caution and restraint in the development of AI, fearing that it could pose an existential threat to humanity. Prominent figures in this camp include Elon Musk, who has expressed concerns about the potential dangers of AI while also founding AI-focused companies like OpenAI and the up-and-coming “BasedAI.” On the other side of the debate are the AI builders, including Yann LeCun and Sam Altman, who are eager to push the boundaries of AI development and explore its full potential. While some members of this group have been dismissed as "idiot disaster monkeys" by Yudkowsky, I will refer to them as "Foomers" for the purposes of this blog post. The divide between these two camps is significant: it represents a fundamental disagreement about the future of AI and its potential impact on society.


The debate around AI often centers on the concept of superintelligence, which refers to AI that surpasses human intelligence in every way. Doomers argue that superintelligence could pose an existential threat to humanity, as it would be capable of outsmarting humans and achieving its goals at any cost. This is particularly concerning given that the goals of such an AI would be difficult, if not impossible, to specify in advance. If the goals are misaligned with human values, the consequences could be catastrophic. The AI builders or "Foomers" tend to downplay these risks, arguing that superintelligence could be used for the benefit of humanity if developed and controlled properly. However, the Doomers counter that the risks are too great and that any attempt to control superintelligence is likely to fail. As such, the debate remains a contentious one, with both sides offering many arguments.


While Foomers may dismiss critiques built on thought experiments and argue for incremental improvement of AI through trial and error, there seems to be a lack of engagement from both sides in identifying the underlying assumptions and values that shape the debate. This leads to the same discourse tiling Twitter with copies of itself without any meaningful progress, and many people are left frustrated and exhausted by it. In this blog post, I aim to provide a fresh perspective on the debate and contribute to a more productive conversation. By analyzing the arguments of each side and exploring potential areas of common ground, I hope to help re-align the discourse in a more positive direction.


It's worth noting the curious fact of a pipeline between the Doomer and Foomer camps. Organizations like OpenAI and Anthropic started as "safety" organizations but have since pivoted towards a more Foomer-like position. Similarly, the Doomers historically broke away from the Kurzweilians, who were the original Foomers. While changing one's position based on new evidence is commendable, this two-way pipeline casts doubt on the strength of both positions. Alternating between two extremes suggests that neither side has a firm grasp on the crux of the debate. It's important to engage with opposing views and seek out potential areas of agreement, rather than simply oscillating between extremes.


So I decided to stake out my OWN POSITION in what I claim is a reasonable center. I hold the following 10 beliefs:


1. Safe AGI is a small portion of the space of all AIs or all algorithms.

2. AI is dangerous; discontinuous jumps in capability are particularly dangerous.

3. We are unlikely to get a really fast takeoff.

4. There will be warning shots and "smaller" AI failures to learn from.

5. AI-caused social and mental health issues are more likely than bio/nanotech disasters.

6. "Slowing down AI" can be good, but getting the government involved is not.

7. We can learn from empirical, simulation-based, and logical methods.

8. A lot of existing techniques to make AI safer can be used for AGI.

9. Problems of civilization have analogs in AGI problems.

10. Humans must come first. Now and Forever.


Explanations:

1. Safe AGI is a small portion of the space of all AIs or all algorithms.

"Algorithms" is a large space, "AIs" is a large sub-space. Many people wish to ascribe some property X to all AIs when not even all humans have said property X. However the subset of AIs that are both powerful and ones we want to build is a small subset of all "powerful AIs." The analogy is that if you want to go to the nearest star system you'd are trying to hit a small target in space. That said, going to the nearest star system is hard, but not impossible.

2. AI is dangerous; discontinuous jumps in capability are particularly dangerous.

There is a particular Doomer worldview that I am sympathetic to, and it is this: if a hugely powerful alien ship or AI appeared in the sky with goals regarding the planet, there is likely nothing we could do against a civilization vastly technologically superior to ours. However, the important part of this hypothetical is the discontinuity. I think we are unlikely to get strong discontinuities in AI.

3. We are unlikely to get "really fast takeoff".

I wrote about this a while ago. The TL;DR is that the AI improvement process is going to become less and less constrained by humans. The AI development loop is "people think for a little bit" and then "fire off an AI run to test their theory." Given that AI compute demands are growing and theories are becoming more complex, the "fire off an AI run to test the theory" step is gradually becoming the larger portion of the loop. So replacing the people in the loop doesn't necessarily make the loop run on millisecond timescales.
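To make the intuition concrete, here is a toy, Amdahl's-law-style calculation. The fractions and speedups below are made-up assumptions for illustration, not measurements; the point is only how little the overall loop accelerates when just the human "thinking" portion is sped up.

```python
# Toy Amdahl's-law-style model of the AI development loop: only the human
# "thinking" part gets faster, the "train/evaluate an AI" part does not.
# The fractions below are illustrative assumptions, not measurements.

def loop_speedup(human_fraction: float, human_speedup: float) -> float:
    """Overall loop speedup when only the human portion is accelerated."""
    compute_fraction = 1.0 - human_fraction
    new_loop_time = compute_fraction + human_fraction / human_speedup
    return 1.0 / new_loop_time

# Suppose human thinking is 20% of each loop and experiments/compute are 80%.
for speedup in (10, 1_000, float("inf")):
    print(f"{speedup}x faster thinking -> {loop_speedup(0.2, speedup):.2f}x faster loop")
# Even an infinitely fast "thinker" only makes the loop 1.25x faster,
# because the experiment/compute portion still dominates each iteration.
```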

4. There will be warning shots and "smaller" AI failures to learn from.

Some examples of warning shots:

Some company uses a neural network to trade their portfolio and loses everything

Some company "accidentally" violates copyright by training AI and get sued for it.

Some people create an AI bot to try to make money online and it becomes a scammer (again, lawsuits and prison for them)

Someone actually uses an AI to convince someone else to do something wildly illegal or hurtful

Someone builds a bad chemical and several people die as a result


I would consider these to be "small" warning shots that may or may not lead to people learning the right lessons. I think warning shots could get bigger before the lesson is fully learned; however, it will be learned before "doom". For example, a complete socio-economic breakdown of a major country due to the financial system being exploited by bots and becoming unusable for people is a warning shot that is plausibly big enough for decision-makers to start paying attention. A collapse of "an entire nation" is my guess at the upper limit of "warning" required for decision-makers to take AI seriously.

5. AI-caused social and mental health issues are more likely than bio/nanotech disasters.

I have written at length about plausible pathways by which AI will disrupt civilization here.

The general theme is that social manipulation, behavioral modification, and scam-like behavior are far easier to pull off than new destructive bio-tech. The fact that social media has been causing mental health problems for decades means this can be done using not-that-intelligent algorithms. This is a near-term concern, as signals that were previously load-bearing for social function become polluted.

This is bad news for the near-term trajectory of Western civilization: it will lower the standard of living and counteract a lot of the near-term benefits of AI. However, this isn’t “doom.”

6. "Slowing down AI" can be good, but getting the government involved is not.

Again, given that we are going to get those warning shots, it may be worth mobilizing some of society’s existing resources to learn from them. Calling on a group of labs to voluntarily slow down, so that we can understand the real power level of the models that have already been created, is a reasonable ask.

However, where this starts getting unreasonable is asking to get the government involved in either domestic or foreign policy, whether through local regulation or data-center “bombings.”

At this moment the US government displays a deep lack of state capacity for addressing existing problems, along with a desire to create new ones. It is no longer safe to ask the government to ban TikTok, let alone to attempt to create new international agreements. The US government is no longer really perceived as agreement-capable by its geopolitical competition.

In a recent post, Catching The Eye of Sauron, the author argued that “not enough is being done” and that options don't look at all exhausted before resorting to drastic calls. I agree with most of the post and would add that even an action such as speeding up lawsuits against relevant companies has not been explored much. Many people question both the copyright problems involved in training large generative models and the potential for automated libel. Lawyers may just be the heroes we deserve right now.

7. We can learn from empirical, simulation-based, and logical methods.

This feels to me like one of the cruxes of the whole debate. If you want to learn about AI, how do you do it?

The Foomer position seems to be that you learn by empirical methods - run the AI and see what happens incrementally. The Doomer position seems to be that at some point incremental changes are “not so incremental” and will get people killed. However, the Doomer position also gives off the vibe that implementing current paradigms doesn’t teach us much or that knowledge can only be acquired through thought experiments.

In my view, all of these methods can bring us valuable new information about AI, about people, and about how to make AI safe. The fact that OpenAI spent a lot of resources on RLHF and people jailbroke the AI anyway is an important piece of learning.

Thought experiments are a good start to learning about AI. However, once a thought experiment becomes complex enough for people to really start disagreeing, it's time to formalize it: first write down a mathematical formalization, then follow through with a simulation in the smallest possible environment.

Other simulations that could be helpful are ones run inside video games, specifically sandbox games. It's easier to tell what doesn’t work through this method than what does work. However, knowing 10 million things that don't work is extremely valuable.
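As a sketch of what "smallest possible environment" can mean, here is a hypothetical five-cell corridor MDP. Everything about it (the rewards, the hazard placement) is invented for illustration: the specified reward omits a rule the designer intended, and even this tiny simulation immediately surfaces the mismatch.

```python
# A minimal sketch of "formalize the thought experiment, then simulate it in
# the smallest possible environment": a 5-cell corridor MDP where the
# specified reward ("reach the goal quickly") omits the designer's intended
# rule ("never enter the hazard cell"). All numbers here are illustrative.
import numpy as np

N_STATES = 5            # cells 0..4; the goal is cell 4
HAZARD = 2              # the designer *meant* this cell to be off-limits
STEP_COST = -1.0        # specified reward: each step costs a little...
GOAL_REWARD = 10.0      # ...and reaching the goal pays off
GAMMA = 0.95
ACTIONS = (-1, +1)      # move left or right

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = GOAL_REWARD if nxt == N_STATES - 1 else STEP_COST
    return nxt, reward

# Value iteration on the *specified* reward (the hazard never appears in it).
V = np.zeros(N_STATES)
for _ in range(200):
    for s in range(N_STATES - 1):
        V[s] = max(r + GAMMA * V[nxt] for nxt, r in (step(s, a) for a in ACTIONS))

# Roll out the greedy policy from the leftmost cell.
s, trajectory = 0, [0]
while s != N_STATES - 1:
    a = max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
    s, _ = step(s, a)
    trajectory.append(s)

print("trajectory:", trajectory)                        # [0, 1, 2, 3, 4]
print("passes through hazard:", HAZARD in trajectory)   # True
# The optimal policy marches straight through cell 2, because the "don't enter
# the hazard" rule existed only in the designer's head, never in the reward.
# Even a toy this small turns a verbal disagreement into something inspectable.
```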

8. A lot of existing techniques to make AI safer can be used for AGI.

This is my #1 problem with the Doomer worldview.

I am going to talk about a specific example, called inverse reinforcement learning (IRL). Keep in mind this is one example and there are many others. IRL is used by Waymo, among others, to help guide self-driving cars. It is an example of a technique that is actively being developed on a fairly complex task, and a lot of the lessons learned about it can carry over to more general tasks. While learning “values from behavior” perfectly may not happen because of human deviations from optimality, this seems like a solvable problem. You can still learn how human drivers handle the “not-run-into-things” problem through such techniques, even if they sometimes get it wrong or disagree on questions of what is polite on the road. The book “Human Compatible” makes some arguments along the same lines.
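To illustrate the basic idea, here is a toy Bayesian-IRL-style sketch with invented candidate rewards, invented observations, and a Boltzmann-rationality noise model; it is not Waymo's actual pipeline, just the shape of the inference: recover which objective a noisy, imperfect driver is pursuing from their observed choices.

```python
# Toy sketch of IRL-style value learning: infer which reward a noisy,
# imperfect driver is optimizing from their choices. The candidate rewards,
# the Boltzmann noise model, and the numbers are illustrative assumptions.
import numpy as np

ACTIONS = ["brake", "swerve", "keep_going"]

# Candidate reward functions the learner considers (reward per action).
CANDIDATES = {
    "avoid_collisions": np.array([ 5.0,  3.0, -10.0]),
    "minimize_time":    np.array([-5.0, -2.0,   5.0]),
}

def boltzmann_policy(rewards, beta=1.0):
    """Action probabilities for a noisily-rational (Boltzmann) human."""
    exp_r = np.exp(beta * (rewards - rewards.max()))
    return exp_r / exp_r.sum()

# Observed behavior: the driver usually brakes, sometimes swerves, and once
# (suboptimally) keeps going. Imperfection just shifts probabilities; it does
# not break the inference.
observed = ["brake", "brake", "swerve", "brake", "keep_going", "brake"]

posterior = {name: 1.0 / len(CANDIDATES) for name in CANDIDATES}  # uniform prior
for action in observed:
    idx = ACTIONS.index(action)
    for name, rewards in CANDIDATES.items():
        posterior[name] *= boltzmann_policy(rewards)[idx]
total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}

print(posterior)
# The posterior concentrates on "avoid_collisions" even though the observed
# driver was not perfectly optimal -- which is the point: the
# "not-run-into-things" objective is recoverable from imperfect behavior.
```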

If certain experiments with techniques like these seem too dangerous, then one can use simulations to refine them.

When I hear Doomers talk about IRL, either here or here, the set of arguments used against it points to a pretty big philosophical confusion between cultural values (egalitarianism) and fundamental values (not-kill-everyoneism), as well as confusion about the shape of human “irrationality.” The argument that IRL can’t coherently learn cultural values may be true, but this is not the same thing as being unable to coherently learn fundamental values. So IRL incorrectly gets a lot of negative feedback, even though it may be a core technology of “not-kill-everyoneism.” Building utopia may in fact be hard to impossible; however, getting AGI to “not kill everyone” may be significantly easier. If the public messaging is “we don’t know how to not kill everyone” while the private research is more “we don’t know how to build utopia,” that is wildly irresponsible, not to mention dangerous, in that existing techniques refined on real-life tasks such as IRL are going to be unfairly critiqued.


9. Problems of civilization have analogs in AGI problems.

This is a very big topic. Problems in AI are new, but many of them have precedents or analogs in the past. What utility function an AI should have is analogous to the question in economics of how to measure societal utility. Economics also explores how coherently one can model a human as a rational agent. And there are questions of philosophy that deal with the nature of ethics, of beings, the philosophy of language, and so on.
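As a tiny illustration of why the economics analogy bites (the per-person utilities below are made up), two standard aggregation rules from welfare economics can rank the same outcomes differently, and an AGI's "utility function" inherits exactly this kind of unresolved choice:

```python
# Two classic aggregation rules from welfare economics can disagree about
# which outcome is "best". The per-person utilities below are made up.
outcomes = {
    "A": [9, 9, 1],   # great for two people, bad for one
    "B": [5, 5, 5],   # moderate for everyone
}

utilitarian = {name: sum(u) for name, u in outcomes.items()}  # total welfare
rawlsian    = {name: min(u) for name, u in outcomes.items()}  # worst-off person

print("utilitarian picks:", max(utilitarian, key=utilitarian.get))  # A (19 > 15)
print("rawlsian picks:   ", max(rawlsian, key=rawlsian.get))        # B (5 > 1)
# Any "utility function for society" -- or for an AGI -- has to make this kind
# of choice, and it is not settled by more intelligence alone.
```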

Now, just because these questions were previously considered does not mean that they were solved. However, it does suggest that the work of a lot of previous thinkers can be used to help understand future AGI, and that a lot of sub-problems can be fanned out to the outside world if framed and incentivized carefully.


10. Humans must come first. Now and Forever.

Points 1-9 are a mix of predictions, heuristics, and general facts. Point 10 is a value statement that is here so that people don't lose sight of the big picture.

AIs, if they are to be built at all, are meant to be built to help people do things. Whether it is economic productivity, improving one's well-being, or bringing one closer to other people, AIs are always tools. Building AI is an instrumental goal; people are the terminal goal, and it should stay that way.

If an AI begins hurting people it's time to shut it down.

There is a lot of strangeness coming from both camps, and from other people with even worse epistemic standards than either camp (I know that can be hard to believe). I don't want switcheroos, where people promise "prosperity" and instead society begins to be built "for AIs" rather than for people. I don't want to build AIs that have consciousness, moral worth, the capacity to suffer, etc. I don't want uploads. I'm not a great fan of over-cyborgization either. It's possible some countries might allow the above, but I predict and hope many will not.

I want biological humans to live long lives and conquer the galaxy. Nothing more. Nothing less.

13 comments

Someone builds a bad chemical and several people die as a result

Let's hope it's a chemical and not a virus.

Re warning shots (#4): I worry that reality is already giving us lots of warning shots, and we're failing to learn much of anything from them. Like, rather than forming a generalization that AI is kind of weird and difficult to direct, so it could be a real problem if it gets super powerful, we're mostly just sort of muddling through, and whenever something bad happens, we just say "oh, well that didn't kill us, so let's just keep on going."

Just to pick on a few of your examples:

Some company "accidentally" violates copyright by training AI and get sued for it.

https://www.newscientist.com/article/2346217-microsofts-copilot-code-tool-faces-the-first-big-ai-copyright-lawsuit/

Someone actually uses an AI to convince someone else to do something wildly illegal or hurtful

https://www.businessinsider.com/widow-accuses-ai-chatbot-reason-husband-kill-himself-2023-4?op=1 (There was no nefarious human behind the AI in this case, so it should count double, no?)

If you'll permit me to anthropomorphize Reality for a minute, here's what it would be saying:

Wow, I've been so good about giving the humans loads of low-stakes warning shots to work from. I threw them that RL agent driving a speedboat in circles early on. In fact, I gave them so many examples of RL doing weird stuff that I'm sure it's common knowledge amongst RL researchers that you have to be careful about how you reward your agent. And humans communicate and tell each other things, so I'm sure by this point everybody knows about that.

Then when they started building really good language models, I made it so that putting the wrong input into them would make them reveal their prompt, or say racist things, or other really terrible stuff that's even worse. But I was concerned that wouldn't be an obvious enough sign, since those failures only crop up if the human user is trying to make them happen. So when GPT-4 showed up on the scene, as Bing, I took the opportunity to do one better: Bing sometimes spontaneously insulted, gaslit, and was generally a jerk to its users, without anybody deliberately prompting it to do that.

What a good and nice and friendly Reality I am, giving these humans so much advance warning about AI stuff. It must be like playing a game on easy mode.

I wrote a bit more which isn't too related to the rest of the comment, but I couldn't resist:

Me: Hey, Reality, sorry to jump in here, but let's say we've picked up the warning. What do we do then? How do we solve alignment?

Reality: You really don't have to stress about that, man. Once you've realized the danger based on the numerous helpful hints I've provided, you've got all the time in the world to figure out how to solve alignment. You can just not build AI while you're working on the problem. Don't worry about your bros dying, you can freeze all your peeps in liquid nitrogen and bring 'em back when you figure it out. I've arranged for there to already exist pre-existing expertise in doing exactly that, and pre-existing companies offering it as a service.

Me: Cool. When we do get around to trying to solve alignment, though, do you have any helpful tips on how to actually go about that?

Reality: I've got you covered, dude! See, in the platonic structure of mathematics, there's totally a perfectly consistent formulation of Updateless Decision Theory. It describes an ideal bounded rational agent, that handles embeddedness, cooperation, and logical uncertainty perfectly fine. Just keep working on that agent foundations research, you're closer than you think! Even the platonic structure of mathematics is on your side, see? We all want you to be successful here.

Me: Wow, that's great news! And then what do we do about value-loading?

Reality: Value-loading is even easier. You just write down your utility function, and put it in the agent you're building. Boom, done! Nothing could be simpler.

Me: So you're saying I just need to, uh, write down my utility function?....

Reality: Yep, exactly. You know all your dreams and goals, your love for your friends, family, and indeed humanity as a whole? Everything that is good and bright in the world, everything that you're trying to achieve in your life? You just write that all down as a big mathematical function, and then stick it in your artificial agent. Nothing could be simpler!

Me: Are there any alternatives to that? Say if I was trying to align the AI to someone else, whose utility function I didn't know? Not me obviously, I certainly know my own utility function (cough), just say I was trying to align the AI for a friend, as it were. (ahem)

Reality: You could ask them to write down their utility function and then stick it in the agent. There's this rad theorem that says that your friend should always be willing to do this: Given a few reasonable assumptions, creating a new agent that shares your own utility function should always be desirable to any given agent. Now your friend may not want you finding out their utility function, but there's cryptographic techniques that can be used for that...

Me: Yeah, I think I know about that theorem. However, for certain complicated reasons that won't work for m... my friend. I, uh, can't say why. Any other options?

Reality: Yes, there is one other option: See, one thing that would always work is you could always just copy the utility function directly out of their minds. CTL-C, CTL-V, boom, done! See, I even gave you the keyboard shortcuts to use! Wow, I can't believe how helpful I'm being right now!!!!...

I am mostly agreeing with you here, so I am not sure you understood my original point. Yes, Reality is giving us things that, for reasonable people such as you and me, should count as warning shots.

Since a lot of other people don't react to them, you might become pessimistic and extrapolate that NO warning shot is going to be good enough. However, I posit that SOME warning shots are going to be good enough. An AI-driven bank run followed by an economic collapse is one example, but there could be others. Generally I expect that when warning shots reach "nation-level" socio-economic problems, people will pay attention.

However, this will happen before doom.

Thanks for the reply, I think we do mostly agree here. Some points of disagreement might be that I'm not at all confident that we get a truly large scale warning shot before AI gets powerful enough to just go and kill everyone. Like I think the threshold for what would really get people paying attention is above "there is a financial disaster", I'm guessing it would actually take AI killing multiple people (outside of a self-driving context). That could totally happen before doom, but it could also totally fail to happen. We probably get a few warning shots that are at least bigger than all the ones we've had before, but I can't even predict that with much confidence.

Yes, I think we understand each other. One thing to keep in mind is that different stakeholders in AI are NOT utilitarians; they have local incentives they individually care about. Given that COVID didn't stop gain-of-function research, getting EVERYONE to care would require a death toll larger than COVID. However, getting someone like the CEO of Google to care would "only" require a half-a-trillion-dollar lawsuit against Microsoft for some issue relating to their AIs.

And I generally expect those types of warning shots to be pretty likely, given how gung-ho the current approach is.

I think this is based on a misunderstanding of the state of the field. I think the majority of the people working full-time on the AGI safety problem are already somewhere between the Doomers and the Foomers.

I think it's easy to get the impression of two polarized camps by looking at the public reporting of the field, instead of the field itself. Which is a problem. The way media currently works is to present the most extreme positions from each side. Those excite the people that agree, and irritate the people that don't. Both garner views and engagement.

The truth of this and other issues is that a lot more people are somewhere in the middle. Some of them are even having nuanced discussions.

In this case, it's pretty easy to get an overview of the field by skimming the Alignment Forum site. That's where the professionals are writing.

I have skimmed the Alignment Forum site and read most of MIRI's work before 2015. While it's hard to know about the "majority of people," it does seem that the public reporting is around two polarized camps. However, in this particular case, I don't think it's just the media. The public figures for both sides (EY and Yann LeCun) seem pretty consistent with their messaging and with talking past each other.

Also, if the majority of people in the field agree with the above, that's great news, and it also means that reasonable centrism needs to be more prominently signal-boosted.

On a more object level, as I linked in the post, I think the Alignment Forum is pretty confused about value learning and the general promise of IRL to solve it.

This seems like a major concern in and of itself.

The public figures are drawn from the most extreme positions. And Yudkowsky founded this field, so he's also legitimately the most desired speaker. But things have changed a lot since 2015.

Check out Paul Christiano, Alex Turner, and Steve Byrnes for different views that are neither doomer nor foomer.

I don't have a survey result handy, but the ones I vaguely remember put the p(doom) estimates from within the field at vastly lower than MIRI's 90%+.

I am also familiar with Paul Christiano; I think his arguments for a slower, more continuous takeoff are broadly on the right track as well.

Given that the extreme positions have strong stake-outs on Twitter, I am once again claiming that there needs to be a strong stake-out of the more reasonable centrism. This isn't the first post in this direction; there were ones before and there will be ones after.

Just trying to keep this particular ball rolling. 

twitter is a toxicity machine and as a result I suspect that people who are much at all +reasonableness are avoiding it - certainly that's why I don't post much and try to avoid reading from my main feed, despite abstractly agreeing with you. that said, here's me, if it helps at all: https://twitter.com/lauren07102

Yes, I agree. I have ideas how to fix it as well, but I seriously doubt they will gain much traction 

If an AI begins hurting people it's time to shut it down.


Which values, and which groups of people or nations, will decide the boundary where 'hurting' people is actually hurting people enough to warrant action?

This is similar in its applicability within human laws, regulations, and punishments...