Dwarkesh's summary:
Paul Christiano is the world’s leading AI safety researcher. My full episode with him is out!

We discuss:
- Does he regret inventing RLHF, and is alignment necessarily dual-use?
- Why he has relatively modest timelines (40% by 2040, 15% by 2030)
- What do we want the post-AGI world to look like (do we want to keep gods enslaved forever)?
- Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon
- His current research into a new proof system, and how this could solve alignment by explaining a model’s behavior
- and much more.