Zach Stein-Perlman

AI strategy & governance. ailabwatch.org. Looking for new projects.

As of late May 2024, I'm focusing on blogging. In June I expect to focus on exploring a version of ailabwatch.org that could get more attention. I'm most excited to receive offers to help with projects like ailabwatch.org. I'm also excited to be pitched blogposts/projects.

Sequences

Slowing AI


Comments

Labs should give deeper model access to independent safety researchers (to boost their research)

Sharing deeper access helps safety researchers who work with frontier models, obviously.

Some kinds of deep model access:

  1. Helpful-only version
  2. Fine-tuning permission
  3. Activations and logits access (see the sketch below)
  4. [speculative] Interpretability researchers send code to the lab; the lab runs the code on the model; the lab sends back the results

See Shevlane 2022 and Bucknall and Trager 2023.
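
To make the third kind of access concrete, here is a minimal sketch of what activations-and-logits access looks like for an open-weights model via Hugging Face transformers. This is just an illustration: a lab granting this kind of access to a frontier model would presumably expose something analogous through a restricted API rather than handing over weights, and the model name and prompt below are placeholders.

```python
# Minimal sketch (not any lab's actual interface): pulling logits and
# per-layer activations from an open-weights model. A lab could expose
# analogous outputs through a restricted API without releasing weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder stand-in for whatever model access is granted to
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits              # (batch, seq_len, vocab_size): full next-token distribution
activations = outputs.hidden_states  # tuple of per-layer hidden-state tensors
print(logits.shape, len(activations))
```

Typical commercial APIs return at most a handful of top logprobs per token and nothing from the model's internals; full logits and internal activations are a large part of what makes "deep" access useful, especially for interpretability research.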

A lab is disincentivized from sharing deep model access because it doesn't want headlines about how researchers got its model to do scary things.

It has been suggested that labs are also disincentivized from sharing because they want safety researchers to want to work at the labs, and sharing model access with independent researchers makes it less necessary for those researchers to join a lab. I'm skeptical that this effect is real/nontrivial.

Labs should limit some kinds of access to avoid effectively leaking model weights. But sharing limited access with a moderate number of safety researchers seems very consistent with keeping control of the model.

This post is not about sharing with independent auditors to assess risks from a particular model.

@Buck suggested I write about this but I don't have much to say about it. If you have takes—on the object level or on what to say in a blogpost on this topic—please let me know.

Update:

The LTBT, whose members have no equity in the company, currently elects one out of the board’s five members. But that number will rise to two out of five this July, and then to three out of five this November.

This is encouraging, and it means I no longer care about seeing the "milestones." My concerns about investors' power over the Trust remain.

Also:

The LTBT’s first five members were picked by Anthropic’s executives for their expertise in three fields that the company’s co-founders felt were important to its mission: AI safety, national security, and social enterprise. Among those selected were Jason Matheny, CEO of the RAND corporation, Kanika Bahl, CEO of development nonprofit Evidence Action, and AI safety researcher Paul Christiano. [The other two were Neil Buddy Shah of the Clinton Health Access Initiative (and formerly GiveWell) and Zach Robinson of CEA and EV] (Christiano resigned from the LTBT prior to taking a new role in April leading the U.S. government’s new AI Safety Institute, he said in an email. His seat has yet to be filled.)

From this we can infer that the other four Trustees remain on the Trust, which is weak good news. [Edit: nope, Matheny left months ago due to a potential conflict of interest (or the appearance of one). It's odd that this article doesn't mention that. As of May 31, Christiano and Matheny have not yet been replaced. It is potentially quite concerning if they (the two AI safety experts) are gone and not replaced by AI safety experts, and the Trust is putting non-AI-safety people on the board. Also I'm disappointed that Anthropic didn't make this known to me earlier.]

Also:

Amazon and Google, he says, do not own voting shares in Anthropic, meaning they cannot elect board members and their votes would not be counted in any supermajority required to rewrite the rules governing the LTBT. (Holders of Anthropic’s Series B stock, much of which was initially bought by the defunct cryptocurrency exchange FTX, also do not have voting rights, Israel says.) 

Google and Amazon each own less than 15% of Anthropic, according to a person familiar with the matter.

So then the question is: who does own voting shares, how are voting shares distributed, and how can this change in the future?

(Also we just got two more examples of Anthropic taking credit for the Trust, and one of the articles even incorrectly says "power ultimately lies with a small, unaccountable group.")

What are the best things—or some good things—MIRI comms has done or published in 2024?

I bet the timing is a coincidence or due to internal questions/pressure, not PR concerns. Regardless, I should ask someone at Anthropic how this post was received internally.

The plan was for the Trust to elect a fifth board member and also eventually replace Luke and Daniela. I totally believe Anthropic that Luke's departure was unrelated to Jay's arrival and generally non-suspicious. [Edit: but I do wish he'd been replaced with a safety-focused board member. My weak impression is that OP has the right to fill that seat until the Trust does; probably OP wants to distance itself from Anthropic, but just giving up a board seat seems like a bad call.]

Possibly he didn’t just mean technically difficult. And possibly Politico took this out of context. But I agree this quote is bad and clarification would be nice.

Update:

Jay Kreps, co-founder and CEO of Confluent, has joined Anthropic's Board of Directors. . . . Jay was appointed to the board by Anthropic's Long-Term Benefit Trust. . . . Separately, Luke Muehlhauser has decided to step down from his Board role to focus on his work at Open Philanthropy.

I'm glad that the Trust elected a board member.

I still really want to know whether this happened on schedule.

I'm interested in what will happen to Luke's seat — my guess is that the Trust's next appointment will fill it.

This is not relevant to my thesis, which is that maybe the Trust can be overruled or abrogated by stockholders.


I agree that the Trust has some oversight over the RSP:

[Anthropic commitments:]

Share results of ASL evaluations promptly with Anthropic's governing bodies, including the board of directors and LTBT, in order to sufficiently inform them of changes to our risk profile.

Responsible Scaling Officer. There is a designated member of staff responsible for ensuring that our Responsible Scaling Commitments are executed properly. Each quarter, they will share a report on implementation status to our board and LTBT, explicitly noting any deficiencies in implementation. They will also be responsible for sharing ad hoc updates sooner if there are any substantial implementation failures.

(This is nice but much less important than power over board seats.)

  1. Yes, they can.
  2. Last we heard, it's supposed to be at least 1/5 now and a majority by 2027.
  3. I believe Paul left the Trust to join US AISI; I don't know of a source on this.

Minor remark, inspired (but not necessarily endorsed) by Buck:

The whole point of the Trust is to be able to act contrary to the interests of massively incentivized stakeholders. This is fundamentally a hard task, and it would be easy for the Trust Agreement to leave the Trust disempowered for practical purposes even if the people who wrote it weren't trying to sabotage it. And as we saw with OpenAI, it's dangerous to assume that the de facto power structures in an AI company match what's on paper.

(This post is about a sharper, narrower concern — that if you read the relevant document you'd immediately conclude that the Trust has no real power.)
