Clarifying “What failure looks like”
Thanks to Jess Whittlestone, Daniel Eth, Shahar Avin, Rose Hadshar, Eliana Lorch, Alexis Carlier, Flo Dorner, Kwan Yee Ng, Lewis Hammond, Phil Trammell and Jenny Xiao for valuable conversations, feedback and other support. I am especially grateful to Jess Whittlestone for long conversations and detailed feedback on drafts, and for her guidance on which threads to pursue and how to frame this post. All errors are my own.

Epistemic status: My Best Guess

Epistemic effort: ~70 hours of focused work (mostly during FHI's summer research fellowship); talked to ~10 people.

Introduction

"What failure looks like" is one of the most comprehensive pictures of what failure to solve the AI alignment problem looks like in worlds without discontinuous progress in AI. I think it was an excellent and much-needed addition to our understanding of AI risk. Still, if many believe that this is a main source of AI risk, it deserves to be fleshed out in more than just one blog post.

The original story has two parts; I'm focusing on part 1 because I found it more confusing and nebulous than part 2. First, I'll summarise part 1 (hereafter "WFLL1") as I understand it:

* In the world today, it's easier to pursue easy-to-measure goals than hard-to-measure goals.
* Machine learning is differentially good at pursuing easy-to-measure goals (assuming that we don't have a satisfactory technical solution to the intent alignment problem[1]).
* We'll try to harness this by designing easy-to-measure proxies for what we care about, and deploying AI systems across society which optimize for these proxies (e.g. in law enforcement, legislation and the market).
* We'll give these AI systems more and more influence (e.g. eventually, the systems running law enforcement may actually be making all the decisions for us).
* Eventually, the proxies for which the AI systems are optimizing will come apart from the goals we truly care about, but by then humanity won't be able to take back influence, and the future will be shaped by those proxies rather than by what we actually value (a toy illustration of this divergence follows the list below).
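To make the proxy-divergence dynamic concrete, here is a minimal toy sketch (my own illustration, not from the original post; all functions and constants are made up). A simple hill-climbing optimizer is rewarded only on an easy-to-measure proxy. The proxy correlates with the hard-to-measure true goal at first, then comes apart from it under continued optimization pressure:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_goal(x):
    # What we actually care about (hard to measure):
    # being close to 1 in every dimension.
    return -np.sum((x - 1.0) ** 2)

def proxy(x):
    # Easy-to-measure stand-in: tracks the true goal at first,
    # but also rewards pushing x[0] upward past the true optimum.
    return true_goal(x) + 5.0 * x[0]

x = np.zeros(3)
for step in range(2001):
    candidate = x + rng.normal(scale=0.05, size=x.shape)
    if proxy(candidate) > proxy(x):  # the optimizer only ever sees the proxy
        x = candidate
    if step % 500 == 0:
        print(f"step {step:4d}  proxy = {proxy(x):7.2f}  true goal = {true_goal(x):7.2f}")
```

Early on, improving the proxy also improves the true goal (both rise from -3); with more optimization pressure the proxy keeps climbing toward ~11 while the true goal peaks and then falls back toward about -6. The analogue in WFLL1: the more influence we hand to systems optimizing the proxy, the further the outcome drifts from what we care about.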