Zach Stein-Perlman

AI strategy & governance. Blog: Not Optional.

Sequences

Slowing AI


Comments

Thanks. Briefly:

I'm not sure what the theory of change for listing such questions is.

In the context of policy advocacy, I think it's sometimes fine/good for labs to say somewhat different things publicly vs privately. Like, if I were in charge of a lab and believed (1) the EU AI Act will almost certainly pass and (2) it has some major bugs that make my life harder without safety benefits, I might publicly say "I support (the goals of) the EU AI Act" and privately put some effort into removing those bugs, which is technically lobbying to weaken the Act.

(^I'm not claiming that particular labs did ~this rather than actually lobby against the Act. I just think it's messy and regulation isn't a one-dimensional thing that you're for or against.)

This post is not trying to shame labs for failing to answer; I didn't try hard to get them to answer. (The window was one week, but I wasn't expecting answers to my email and wouldn't expect a reply even if I'd waited longer.)

(Separately, I kinda hope the answers to basic questions like this are already written down somewhere...)

Google sheet.

Some overall scores are one point higher, probably because my site rounds down. My site should probably round to the nearest integer...
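(A minimal sketch of the suspected rounding discrepancy, with a made-up score; this is illustrative Python, not the site's actual code:)

```python
import math

score = 7.6  # hypothetical raw overall score

print(math.floor(score))  # 7 -- rounding down, what the site apparently does
print(round(score))       # 8 -- rounding to nearest, one point higher
```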

Thanks for the feedback. I'll add "let people download all the data" to my todo list but likely won't get to it. I'll make a simple google sheet now.

This is too strong. For example, releasing the product would be correct if someone else would soon release something similar anyway, you're safer than they are, and releasing first lets you capture more of the free energy. (That's not the case here, but it's not as straightforward as you suggest, especially given your "Regardless of how good their alignment plans are" and your claim "There's just no good reason to do that, except short-term greed".)

Constellation (which I think has some important FHI-like virtues, although it makes different tradeoffs and misses on others)

What is Constellation missing or what should it do? (Especially if you haven't already told the Constellation team this.)

Harry let himself be pulled, but as Hermione dragged him away, he said, raising his voice even louder, "It is entirely possible that in a thousand years, the fact that FHI was at Oxford will be the only reason anyone remembers Oxford!"

Yes, but possibly the lab has its own private scaffolding that works better with its model than any other existing scaffolding, perhaps because it trained the model to use that specific scaffolding, and it can initially decline to let users access it.

(Maybe it’s impossible to give API access to scaffolding and keep the scaffolding private? Idk.)

Edit: Plus what David says.

Suppose you can take an action that decreases net P(everyone dying) but increases P(you yourself kill everyone), and leaves all else equal. I claim you should take it; everyone is better off if you take it.
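To make that concrete (illustrative numbers, mine, not from the thread): suppose that without the action P(everyone dying) is 10%, of which 1% is doom you cause, and with the action it's 8%, of which 3% is doom you cause. Total risk drops by two percentage points even though the chance that you are the cause triples, so everyone, including you, faces less risk if you act.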

I deny "deontological injunctions." I want you and everyone to take the actions that lead to the best outcomes, not the actions that keep your own hands clean. I'm puzzled by your expectation that I'd endorse "deontological injunctions."

This situation seems identical to the trolley problem in the relevant ways. I think you should avoid letting people die, not just avoid killing people.

[Note: I roughly endorse heuristics like if you're contemplating crazy-sounding actions for strange-sounding reasons, you should suspect that you're confused about your situation or the effects of your actions, and you should be more cautious than your naive calculations suggest. But that's very different from deontology.]

I guess I'm more willing to treat Anthropic's marketing as not-representing-Anthropic.

Like, when OpenAI marketing says "GPT-4 is our most aligned model yet!" you could say this shows that OpenAI deeply misunderstands alignment, but I tend to ignore it. Even, mostly, when Sam Altman says it himself.

[Edit after habryka's reply: my weak independent impression is that often the marketing people say stuff that the leadership and most technical staff disagree with, and if you use marketing-speak to substantially predict what-leadership-and-staff-believe you'll make worse predictions.]
