paulfchristiano

Matrix completion prize results

Earlier this year ARC posted a prize for two matrix completion problems. We received a number of submissions we considered useful, but not any complete solutions. We are closing the contest and awarding the following partial prizes:

* $500 to Elad Hazan for solving a related problem and pointing us...

Dec 20, 2023

Thoughts on responsible scaling policies and regulation

I am excited about AI developers implementing responsible scaling policies; I’ve recently been spending time refining this idea and advocating for it. Most people I talk to are excited about RSPs, but there is also some uncertainty and pushback about how they relate to regulation. In this post I’ll explain...

Oct 24, 2023

Thoughts on sharing information about language model capabilities

Core claim

I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduces risks from powerful AI—despite the fact that such information may increase the amount or quality of investment in ML generally (or in LM agents in particular). Concretely,...

Jul 31, 2023

Self-driving car bets

This month I lost a bunch of bets. Back in early 2016 I bet at even odds that self-driving ride sharing would be available in 10 US cities by July 2023. Then I made similar bets a dozen times because everyone disagreed with me. The first deployment to potentially meet...

Jul 29, 2023

ARC is hiring theoretical researchers

The Alignment Research Center’s Theory team is starting a new hiring round for researchers with a theoretical background. Please apply here. Update January 2024: we have paused hiring and expect to reopen in the second half of 2024. We are open to expressions of interest but do not plan to...

Jun 12, 2023

Prizes for matrix completion problems

Here are two self-contained algorithmic questions that have come up in our research. We're offering a bounty of $5k for a solution to either of them—either an algorithm, or a lower bound under any hardness assumption that has appeared in the literature. Question 1 (existence of PSD completions): given m...

May 3, 2023

My views on “doom”

I’m often asked: “what’s the probability of a really bad outcome from AI?” There are many different versions of that question with different answers. In this post I’ll try to answer a bunch of versions of this question all in one place.

TWO DISTINCTIONS

Two distinctions often lead to confusion...

Apr 27, 2023

Where I agree and disagree with Eliezer

What failure looks like

AI alignment is distinct from its near-term applications

Thoughts on the impact of RLHF research