All views are my own, not Anthropic’s. This post assumes Anthropic’s announcement of RSP v3.0 as background. Today, Anthropic released its Responsible Scaling Policy 3.0. The official announcement discusses the high-level thinking behind it. This is a more detailed post giving my own takes on the update. First, the big...
This is a linkpost for a new research paper from the Alignment Evaluations team at Anthropic and other researchers, introducing a suite of evaluations of models' abilities to undermine measurement, oversight, and decision-making. Paper link. Abstract: > Sufficiently capable models could subvert human oversight and decision-making in important contexts....
Last year, I posted a call for case studies on social-welfare-based standards for companies and products (including standards imposed by regulation). The goal was to build general context on how such standards work, to inform possible standards and/or regulation for AI. This resulted[1] in several dozen case studies that I found...
Yes, this is my first post in almost a year. I’m no longer prioritizing this blog, but I will still occasionally post something. I wrote ~2 years ago that it was hard to point to concrete opportunities to help the most important century go well. That’s changing. There are a...
Views are my own, not Open Philanthropy’s. I am married to the President of Anthropic and have a financial interest in both Anthropic and OpenAI via my spouse. Over the last few months, I’ve spent a lot of my time trying to help out with efforts to get responsible scaling...
One of the biggest reasons alignment might be hard is what I’ll call threat obfuscation: various dynamics that might make it hard to measure/notice cases where an AI system has problematic misalignment (even when the AI system is in a controlled environment and one is looking for signs of misalignment)....
I sometimes hear people asking: “What is the plan for avoiding a catastrophe from misaligned AI?” This post gives my working answer to that question - sort of. Rather than a plan, I tend to think of a playbook.[1] * A plan connotes something like: “By default, we ~definitely fail....