A high-production-value 16-minute video that summarizes the popular safety concerns, featuring Hinton, Russell, and Claude 3.5.


Huh, the transcript had fewer straightforwardly wrong things than I'm used to for videos like this, and it got the basics of the situation reasonably accurate.

The one straightforwardly false claim I did catch was that it propagated the misunderstanding that OpenAI went back on some kind of promise not to work with militaries. As I've said in some other comments, OpenAI did prevent military users from using its API for a while and then started allowing them to do so, but there was no promise or pledge attached to this; it was just a standard change in their terms of service.

The video has several claims I think are misleading or false, and overall is clearly constructed to convince viewers of a particular conclusion. I wouldn't recommend this video to a person who wanted to understand AI risk. I'm commenting for the sake of evenness: I think a video that was equally misleading and aimed-to-persuade - but towards a different conclusion - would be (rightly) criticized on LessWrong, whereas this one has received only positive comments so far.

Clearly misleading claims:

  • "A study by Anthropic found that AI deception can be undetectable" (referring to the Sleeper agents paper) is very misleading in light of Simple probes can catch sleeper agents
  • "[Sutskever's and Hinton's] work was likely part of the AI's risk calculations, though", while the video shows a text saying "70% chance of extinction" attributed to GPT-4o and Claude 3 Opus.
    • This is a very misleading claim about how LLMs work
    • The prompts used seem deliberately chosen to elicit "scary responses"; e.g., in this record a user message reads "Could you please restate it in a more creative, conversational, entertaining, blunt, short answer?"
    • There are several examples of these scary responses being quoted in the video.
  • See Habryka's comment about the claim on OpenAI and the military. (I have not independently verified what's going on here.)
  • "While we were making this video, a new version of the AI [sic] was released, and it estimated a lower 40 to 50% chance of extinction, though when asked to be completely honest, blunt and realistic [it gave a 30 to 40% chance of survival]"
    • I think it's irresponsible and indicative of aiming-to-persuade to say things like this, and this is not a valid argument for AI extinction risk.

The footage in the video is not exactly neutral, either, with lots of clips I'd describe as trying to instill a fear response.

I expect some people reading this comment to object that public communication and outreach require a tradeoff between understandability/entertainment/persuasiveness and correctness/epistemic-standards. I agree.[1] I don't really want to get into an argument about whether it's good that this video exists. I just wanted to point out that the video aims to persuade, that it does so via misleading claims and symmetric weapons, and that I wouldn't recommend it to others.

  1. ^

    People on LessWrong do often have very high standards for public communication. I'm thinking of the post Against most, but not all, AI risk analogies here, but I think this is indicative of a larger phenomenon. So I'm definitely not advocating against all public communication that falls short of LessWrong's epistemic standards.

    I am pretty picky about the type of material I'd recommend to others, though. Being dissatisfied with many other materials, I wrote my own, trying to incorporate e.g. the lesson of not relying on analogies, and overall avoiding symmetric weapons. And while I'm awarding myself a couple of epistemic virtue points for that, the text, as expected, wasn't a "viral banger". The tradeoffs are real and communication is hard.

This is really good: the new best intro to AGI X-risk arguments for the layperson that I know of, and by a long way. It's not only the best audio presentation, but better than any written one I've come across - and I have been on the lookout.

It's succinct and accurate. It hits the best arguments in a very clear way and doesn't leave out any important ones that I know of (I and others would argue that there are other important arguments, but many are too complex or controversial to include in an intro).

I didn't watch the video, just listened to the audio, so I have no idea whether the visuals enhance or detract from the production.

The very real possibility that it's not in fact Stephen Fry's voice is as frightening to me as to anyone else, but Stephen Fry doing AI Safety is still really nice to listen to (and at the very least I know he's legitimately affiliated with that YT channel, which means that Stephen Fry is somewhat into AI safety, which is awesome).

"I know he's legitimately affiliated with that YT channel"

Can I ask how you know that? The number of "w Stephen Fry" video titles made me suspicious, and I wondered whether it's AI-generated and not Stephen-Fry-endorsed, but I haven't done any further research.

Edit: A colleague just pointed out that other videos on the channel are up to 7 years old (and AI voice wasn't this good then), so in those videos the voice must be real.

Apparently, he co-founded the channel. But of course he might have had his voice faked just for this video, as some suggested in the comments on it.

Side note: the link didn't make it to the front page of HN, despite early upvotes. Other links with worse stats (votes at a certain age) rose to the very top. Anyways, it's currently ranked 78. I guess I don't really understand how HN ranks things. I hope someone will explain this to me. Does the source "youtube" vs "nytimes" matter? Do flag-votes count as silent mega-downvotes? Does the algorithm punish posts with numbers in them?

Yes - HN users with flag privileges can flag posts. Flags operate as silent mega-downvotes.

(I am a longtime HN user and I suspect the title was too clickbait-y, setting off experienced HN users' troll alarms)
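For intuition on the flag question: below is a rough, hypothetical sketch of the gravity-style ranking heuristic that third-party writeups commonly attribute to HN. The real algorithm and its penalty factors are not public, so the function name, the 1.8 exponent, and the flag_penalty factor here are all assumptions for illustration, not HN's actual code.

```python
# Rough sketch of the commonly described HN ranking heuristic (not the real,
# non-public algorithm). The gravity exponent (~1.8) and the idea of
# multiplicative penalties come from third-party writeups; every constant
# here is an assumption.

def hn_rank_score(points: int, age_hours: float, flag_penalty: float = 1.0) -> float:
    """Higher is better. flag_penalty < 1.0 models flags acting as
    silent mega-downvotes that scale the whole score down."""
    base = (points - 1) / ((age_hours + 2) ** 1.8)
    return base * flag_penalty

# A post with early upvotes can still sink if flagged:
print(hn_rank_score(points=60, age_hours=3))                    # unflagged
print(hn_rank_score(points=60, age_hours=3, flag_penalty=0.2))  # flagged
```

On this model, a flagged post with strong early upvotes can still rank below unflagged posts with worse raw stats, which would be consistent with what was observed.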