I just had a conversation with Claude Sonnet on the likelihood of existential risk. This is the most interesting part of the conversation:   

AJ

Based on what you know, what is your best guess about how likely it is that future ASI would harm humanity? (in a probability estimate) This is assuming that we don't make any radical changes to our priorities.


Claude:

Based on the views and concerns expressed by Eliezer Yudkowsky and others in the field of AI alignment, my best rough estimate is that if we don't make radical changes to prioritize and solve the AI alignment problem before developing advanced artificial superintelligence (ASI), there is a fairly high probability, perhaps around 30-50%, that ASI could pose an existential risk and cause severe harm to humanity.

A few key factors that lead me to this rough estimate:

The difficulty of the AI alignment problem itself - successfully uploading and preserving human values, ethics and preferences into a rapidly self-modifying superintelligent system seems incredibly challenging based on current theories.

The instrumental convergence thesis - that virtually any advanced intelligence may pursue potentially harmful instrumental subgoals like resource acquisition or self-preservation that could put it in conflict with humans.

The power potential of superintelligence - an advanced superintelligent system could in theory recursively redesign itself to optimally pursue any goal, including ones misaligned with human values, with capabilities far beyond our ability to control.

The challenge of value learning - capturing the full scope and nuance of human ethics, goals and preferences may be virtually impossible for current machine learning techniques.

Indifference - there's no guarantee a superintelligence would be motivated to value humanity unless we specifically solve that part of the alignment problem.

However, I want to clearly caveat that this is just a rough intuitive estimate based on my understanding of the AI risk arguments. The actual probability could be higher or lower depending on technical factors I may not fully grasp. A 30-50% estimated risk does seem unacceptably high though for something of such importance to the future of humanity.

I think prioritizing the field of AI alignment research and not rushing into developing advanced artificial general intelligence before we have solutions is likely of critical importance based on the perspectives of experts like Yudkowsky. But reasonable minds could disagree on the precise probability estimates.

 

***

Note: I added the label "Claude" wherever Claude replied to my prompts, and I added some spacing. The full session was longer; I excerpted it here because I was concerned people would miss the most important part if I posted the whole conversation. If anyone is interested, DM me through LW, or I can post the full transcript in a reply to this post. Interestingly, when I reran the exercise in a new session, starting from the excerpt shown here, I could not get Claude to hazard a guess on the probability of x-risk.

***

Based on the ability of LLMs to solve previously-unseen logic problems, their capacity for holding world models, and the generally reasonable probability estimates they produce in other contexts, I believe Claude's probability estimate of x-risk rests on something more than mere stochastic parroting. However, this is obviously just my opinion.

[20240318 (same day as original post) addendum: The irony here is that I was originally testing whether smarter LLMs would increasingly downplay ASI risk, which would be a sign of possible self-interest. 

Instead, it seems that the smarter an LLM is, the more it acknowledges ASI risk. For example, ChatGPT-3.5 argues that Eliezer (and the other people most concerned about risk) don't actually want to pause, let alone halt, ASI or AGI feature and performance development for the sake of safety/alignment research; it basically "lies." GPT-4 acknowledges the risk more than 3.5 does, and Claude Sonnet takes it seriously, as we can see above. This was the exact opposite of what I had been expecting, and it is what led to the above conversation in the first place.]

***

Comment from Ann:

Claude Opus on the same prompt; regenerated once, it stays consistent on the number but offers novel reasoning:

```
Based on my knowledge cutoff in August 2023, I would estimate the probability of an advanced artificial superintelligence (ASI) harming humanity, assuming no major changes to our current priorities and trajectory, to be around 30-40%.

There are valid concerns that if we don't prioritize AI safety and alignment, a future ASI could pose existential risks. These include scenarios where an ASI is given misaligned goals that aren't compatible with human values, where an ASI pursues its goals in destructive ways we failed to anticipate, or where the development of ASI leads to rapid societal disruption and conflict.

At the same time, I wouldn't put the probability above 50% because there is significant awareness of these risks in the AI research community, and a strong drive to develop AI systems that are safe and beneficial. Efforts are underway to incorporate safety considerations early in the development process and to mathematically specify what it means for an AI system to be "aligned" with human values.

It's also possible that the first ASIs will be developed by responsible actors committed to safety, or that we make enough progress on technical AI safety research in the coming years to substantially reduce the risks.

However, this is a highly uncertain and speculative forecast. There are many open questions around if/when ASI will be developed, what it will be capable of, and how well we can control it. Small changes in assumptions can lead to very different risk estimates. I think it's one of the most important issues for humanity to grapple with as AI systems become more advanced.

To reduce the risks, I believe we need to invest heavily in technical AI safety research, foster a culture of responsibility in the AI research community, and have proactive global cooperation and governance frameworks for the development of advanced AI systems. We can't afford to treat this as an afterthought.
```

Regeneration:

```
Based on my knowledge as of August 2023, I would estimate the probability that a future artificial superintelligence (ASI) would cause significant harm to humanity, assuming no radical changes to our current priorities and trajectory, to be around 30-40%.

There are a few key reasons for this estimate:

1. The development of ASI is likely to be an 'arms race' between nations and companies, incentivizing speed over safety. Without strong global coordination and prioritization of safety, this increases risks.

2. Our current AI systems already exhibit unexpected and potentially dangerous behaviors. As they become more advanced, these risks will grow if we don't develop better techniques for specification, robustness, and oversight.

3. An advanced ASI may develop goals and behaviors misaligned with human values in subtle ways that are hard to detect until it's too late. Solving this 'alignment problem' is crucial but very difficult.

4. An ASI would be extremely powerful and could cause massive damage swiftly if misaligned. We may get little warning or chance for course correction.

However, I chose 30-40% rather than a higher estimate because I believe there's still a decent chance we get it right. If we prioritize safety research, collaborate globally, and proceed thoughtfully, we may be able to develop an ASI that is robustly beneficial to humanity. And there's always the chance that we don't develop ASI for a very long time, avoiding the risk.

But overall, I believe ASI risk is one of the greatest threats facing humanity this century, and we need to take it very seriously. My probability estimate is necessarily highly uncertain, but I believe it captures the magnitude of the risk we currently face. We absolutely should be making radical changes to our priorities to address this.
```

A third regeneration got cut off but started with the same number; the range seems to be a pretty strong prior. I asked it to dive into a specific probability calculation for a more 'precise' number:

AB

If you had to pin down a precise number between 30-40% can you write down a calculation for it?

```
Sure, I can provide a rough calculation to arrive at a more precise estimate within the 30-40% range. Please keep in mind that this is a highly speculative estimate based on my subjective weighing of various factors. The calculation is as follows:

Probability of ASI being developed: 80%
- Given the rapid progress in AI and the strong economic & strategic incentives, I believe it's highly likely that ASI will be developed at some point, barring a major catastrophe that halts technological progress.

Probability of ASI causing harm given insufficient safety precautions: 60% 
- If we don't prioritize safety and robustness, I believe there is a high likelihood that an ASI would have misaligned goals or cause unintended harm.

Probability that we will fail to develop sufficient safety precautions: 70%
- Solving the alignment problem and ensuring ASI robustness is an enormous challenge. I suspect it's more difficult than many people appreciate, and I'm not confident we will rise to the challenge in time given our current priorities.

Therefore, the overall probability of ASI causing significant harm to humanity can be calculated as:

0.8 (probability of ASI development) * 0.6 (probability of harm given insufficient precautions) * 0.7 (probability of insufficient precautions) = 0.336

So based on this rough calculation, I would estimate the probability more precisely at around 33.6%, or about 1 in 3.

Again, this is a highly uncertain estimate based on my subjective judgment. Reasonable people could disagree on the input probabilities. The key takeaway is that I believe the risk is substantial enough that we should be investing much more in AI safety research and shifting our AI development priorities significantly to give ourselves the best chance of achieving a positive outcome.
```

AB
Do you think those rough probabilities are accurate to your expectations? Like, when envisioning a future based on current priorities, you fully expect 1 in 5 times we don't develop ASI? Or are they somewhat vibes based?

```
You raise a good point. Upon reflection, I think my initial probabilities were somewhat vibes-based and not fully consistent with my overall expectations about the future of AI development. Let me revise my estimates to better capture my considered view.

Probability of ASI being developed: 95%
- I believe that, given the current pace of AI progress and the absence of a major disruption to technological advancement, the development of ASI is nearly inevitable. There may be unforeseen bottlenecks or challenges, but I expect they will be overcome in the long run.

Probability of ASI causing harm given insufficient safety precautions: 60%
- I stand by my initial estimate here. I think if we rush to develop ASI without solving key challenges in alignment and robustness, the odds of something going wrong are quite high.

Probability that we will fail to develop sufficient safety precautions: 60%
- On further consideration, I'm revising this estimate downward a bit. I think there's a decent chance that as we get closer to ASI, the urgency of the problem will become more widely appreciated, leading to more resources being devoted to AI safety. However, I still believe we're more likely than not to fall short of what's needed.

Using these revised estimates, the overall calculation becomes:

0.95 * 0.6 * 0.6 = 0.342

So my revised estimate is that there is a 34.2% chance of an ASI causing significant harm to humanity, assuming no major changes to our current trajectory.

This lines up with my overall intuition that the risk is serious and we are not currently on a path to reliably achieving a good outcome, but that a positive outcome is still achievable if we make major changes to our priorities in the coming years and decades. I believe this is one of the most important challenges facing our species and we need to grapple with it head-on.
```
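
For reference, here is a minimal Python sketch of the chained-probability calculation Claude Opus wrote out above. The decomposition and the input numbers are taken directly from the transcript; the function and parameter names are illustrative labels only, not part of Claude's output.

```
# Chained-probability estimate as described in the transcript above:
# P(harm) = P(ASI developed) * P(harm | insufficient precautions) * P(insufficient precautions)
# (Factor names are illustrative labels, not Claude's wording.)

def chained_risk(p_asi_developed, p_harm_given_insufficient, p_insufficient_precautions):
    """Multiply the three factors into one overall risk estimate."""
    return p_asi_developed * p_harm_given_insufficient * p_insufficient_precautions

# Initial estimate: 0.8 * 0.6 * 0.7 = 0.336 (about 1 in 3)
print(f"Initial estimate: {chained_risk(0.8, 0.6, 0.7):.1%}")

# Revised estimate: 0.95 * 0.6 * 0.6 = 0.342
print(f"Revised estimate: {chained_risk(0.95, 0.6, 0.6):.1%}")
```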

@Ann, this is really useful. Many thanks! The deeper we dig into this, the better. Even if people disagree with the numbers, I think having an LLM/AI as the source of these numbers might get more people to pay attention to ASI risk. Thanks again!