LESSWRONG

All of Swimmer963 (Miranda Dixon-Luinenburg)'s Comments + Replies

Swimmer963 (Miranda Dixon-Luinenburg) 2y

I do think it's fair to consider the work on GPT-3 a failure of judgement and a bad sign about Dario's commitment to alignment, even if, at the time (also based on LinkedIn), he seems to have still been leading other teams focused on safety research.

(I've separately heard rumors that Dario and the others left because of disagreements with OpenAI leadership over how much to prioritize safety, and maybe partly related to how OpenAI handled the GPT-3 release, but this is definitely in the domain of hearsay and I don't think anything has been shared publicly about it.)

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) 2y

Edited first line, which hopefully clarifies this better.

[anonymous] 2y

It does! I think I'd make it more explicit, though, that the post focuses on the views/opinions of people at Anthropic. Maybe something like this (new text in bold):

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) 2y

It's deliberate that this post mostly covers specifics I learned from Anthropic staff, and that further speculation will go in a separate, later post. I wanted to make a really clear distinction between "these are things that were said to me about Anthropic by people who have context" (who are, for the most part, people in favor of Anthropic's strategy) and my own personal interpretation and opinion on whether Anthropic's work is net positive, which is filtered through my worldview and which I think most people at Anthropic would disagree with.

Part two is more critical, which means I want to write about it with a lot of effort and care, so I expect I'll put it up in a week or two.

[anonymous] 2y

+1. I think this framing is more accurate than the current first paragraph (which, in my reading of it, seems to promise a more balanced and comprehensive analysis).

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) 2y

My sense is that it's been somewhere in between – on some occasions staff have brought up doubts, and the team did delay a decision until they were addressed, but it's hard to judge how much the end result was a different decision from what would have been made otherwise, versus just happening later.

The sense I've gotten of the culture is compatible with (current) Anthropic being a company that would change its entire strategic direction if staff started coming in with credible arguments that "what if we shouldn't be advancing capabilities?", but I...

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) 2y

Your summary seems fine!

Why do you need to do all of this on current models? I can see arguments for this, for instance, perhaps certain behaviors emerge in large models that aren’t present in smaller ones.

I think that Anthropic's current work on RL from AI Feedback (RLAIF) and Constitutional AI relies on large models exhibiting behaviors that smaller models don't? (But it'd be neat if someone more knowledgeable than me wanted to chime in on this!)

My current best understanding is that running state-of-the-art models is expensive in te...