Retrospective on the 2022 Conjecture AI Discussions

Andrea_Miotti

3 min read

24th Feb 2023

AI Alignment Forum

At the end of 2022, following the success of the 2021 MIRI Conversations, Conjecture started a project to host discussions about AGI and alignment with key people in the field. The goal was simple: surface positions and disagreements, identify cruxes, and make these debates public whenever possible for collective benefit.

Given that people and organizations will have to coordinate to best navigate AI's increasing effects, making positions public is the first, minimum-viable step toward that coordination. Coordination is impossible without at least common knowledge of the relevant actors' positions and models.

People sharing their beliefs, discussing them, and making as much of that discussion public as possible is strongly positive, for several reasons.

First, beliefs expressed in public discussions count as micro-commitments or micro-predictions, and help keep the field honest and truth-seeking. When things are only discussed privately, humans tend to weasel around and take inconsistent positions over time, be it intentionally or involuntarily.

Second, commenters help debates progress faster by pointing out mistakes.

Third, public debates compound. Knowledge shared publicly makes the next generation of arguments more refined and moves public discourse forward.

We circulated a document about the project to various groups in the field, and invited people from OpenAI, DeepMind, Anthropic, Open Philanthropy, FTX Future Fund, ARC, and MIRI, as well as some independent researchers to participate in the discussions. We prioritized speaking to people at AGI labs, given that they are focused on building AGI capabilities.

The format of discussions was as follows:

A brief initial exchange with the participants to decide on the topics of discussion. By default, the discussion topic was “How hard is Alignment?”, since we've found we disagree with most people about this, and the reasons for it touch on many core cruxes about AI.
We held each discussion synchronously and in writing for ~120 minutes, in a dedicated, private Slack channel.
We involved a moderator when possible. The moderator's role was to help participants identify and address their cruxes, move the conversation forward, and summarize points of contention.
We planned to publish cleaned-up versions of the transcripts and summaries to Astral Codex Ten, LessWrong, and the EA Forum. Participants were given the opportunity to clarify positions and redact information they considered infohazards or PR risks, as well as to veto publishing altogether. We included this clause specifically to address the concerns expressed by people at AI labs, who expected heavy scrutiny by leadership and communications teams of what they could state publicly.

People from ARC, DeepMind, and OpenAI, as well as one independent researcher, agreed to participate. The two discussions with Paul Christiano and John Wentworth will be published shortly. One discussion with a person working at DeepMind is pending approval before publication. After a discussion with an OpenAI researcher took place, OpenAI strongly recommended that its employee not publish it, so we will not be publishing that discussion.

Most people we were in touch with were very interested in participating. However, after checking with their own organizations, many returned saying their organizations would not approve them sharing their positions publicly.

This was in spite of the extensive provisions we made to reduce downsides for them: making it possible to edit the transcript, veto publishing, strict comment moderation, and so on. We think organizations discouraging their employees from speaking openly about their views on AI risk is harmful, and we want to encourage more openness.

We are pausing the project for now, and we have mixed feelings about it. It cost a lot of time to organize and conduct, and we were disappointed to see resistance to having and publishing discussions. On the other hand, the participants and moderators did find them enjoyable and valuable. We expect that even the few discussions that we'll be able to publish will improve public discourse and understanding of cruxes.

We believe Conjecture's standing in the AI alignment field at the time was not sufficient to get enough traction, but we encourage any high-status person to try launching similar initiatives.

We'll be interested in running discussions like these again in the future if there's renewed interest, and we appreciate everyone involved in this round.

Comments

Quinn

Ideally there would be an exceedingly high bar for strategic withholding of worldviews. I'd love some mechanism for sending downvotes to the orgs that vetoed their staff from participating! I'd love some way of socially pressuring these orgs into at least trying to convince us that they had really good reasons.

I'm pretty cynical: I assume nervous and uncalibrated shuffling of HR or legal counsel is more likely than actual defense against hazardous leakage of, say, capabilities hints.

Rohin Shah

[People at AI labs] expected heavy scrutiny by leadership and communications teams on what they can state publicly. [...] One discussion with a person working at DeepMind is pending approval before publication. [...] We think organizations discouraging their employees from speaking openly about their views on AI risk is harmful, and we want to encourage more openness.

(I'm the person in question.)

I just want to note that in the case of DeepMind:

I don't expect "heavy" scrutiny by leadership and communications teams (though it is not literally zero)
For the discussion with me, the ball is in the authors' court: the transcript needs to be cleaned up more. I haven't even sent it to the relevant people at DeepMind yet (and have said so to Andrea).
While DeepMind obviously cares about what I say, I think it is mostly inaccurate to say that DeepMind has discouraged me from speaking about my views on AI risk (as should be evident from my many comments on this forum).

(Nothing in the post contradicts what I'm saying here, but I'm worried that readers would get a mistaken impression from it.)

RobertM

Just as a data point, the impression I got with respect to DeepMind was that they'd approved the conversation (contra some other orgs, for which the post said otherwise) and the review was in progress.

RobertM

We circulated a document about the project to various groups in the field, and invited people from OpenAI, DeepMind, Anthropic, Open Philanthropy, FTX Future Fund, ARC, and MIRI, as well as some independent researchers to participate in the discussions.

Is this a complete set of the organizations you reached out to? Because, given this...

People from ARC, DeepMind, and OpenAI, as well as one independent researcher agreed to participate.

Most people we were in touch with were very interested in participating. However, after checking with their own organizations, many returned saying their organizations would not approve them sharing their positions publicly.

...the implication is that, of the researchers you reached out to at {Anthropic, OpenPhil, FTX FF, MIRI}, those who expressed interest were unable to participate because their organization wouldn't let them share their (individual? organizational?) opinions publicly? This is quite surprising! Is there any more detail you can share here, such as the specific concerns expressed?

Andrea_Miotti

People from OpenPhil, FTX FF and MIRI were not interested in discussing at the time. We also talked with MIRI about moderating, but it didn't work out in the end.

People from Anthropic told us their organization is very strict on public communications, and very wary of PR risks, so they did not participate in the end.

In the post, I overgeneralized to avoid going into full detail.