habryka

Ok, so we both had some feelings about the recent Conjecture post on "lots of people in AI Alignment are lying", and the associated marketing campaign and stuff

I would appreciate some context in which I can think through that, and also to share info we have in the space that might help us figure out what's going on. 

I expect this will pretty quickly cause us to end up on some broader questions about how to do advocacy, how much the current social network around AI Alignment should coordinate as a group, how to balance advocacy with research, etc.

Olivia Jimenez

Feelings about Conjecture post:

  • Lots of good points about people not stating their full beliefs messing with the epistemic environment and making it costlier for others to be honest.
  • The lying and cowardice frames feel off to me. 
  • I personally used to have a very similar rant to Conjecture. Since moving to DC, I'm more sympathetic to governance people. We could try to tease out why.
  • The post exemplifies a longterm gripe I have with Conjecture's approach to discourse & advocacy, which I've found pretty lacking in cooperativeness and openness (Note: I worked there for ~half a year.) 

Questions on my mind:

  • How open should people motivated by existential risk be? (My shoulder model of several people says "take a portfolio approach!" - OK, then what allocation?)
  • How advocacy-y should people be? I want researchers to not have to tweet their beliefs 24/7 so they can actually get work done
  • How do you think about this, Oli?

How sympathetic to be about governance people not being open about key motivations and affiliations 

habryka

I personally used to have a very similar rant to Conjecture. I'm now more sympathetic to governance people.  We could try to tease out why.

This direction seems most interesting to me! 

My current feelings in the space are that I am quite sympathetic to some comms-concerns that people in government have and quite unsympathetic to some other stuff, and I would also like to clarify for myself where the lines here are.

Curious whether you have any key set of observations or experiences you had that made you more sympathetic.

Olivia Jimenez

Observations 

I've heard secondhand of at least one instance where a person brought up x-risk and their Congressional office then took them less seriously. Other staffers have told me talking about x-risk wouldn't play well (without citing specific evidence, but I take their opinions seriously).

  • (This didn't update me a ton though. My model already included "most people will think this is weird and take you less seriously". The question is, "Do you make it likelier for people to do good things later, all things considered, by improving their beliefs, shifting the Overton window, convincing 1 in 10 people, etc.?")

I've also personally found it tricky to talk about takeover & existential risks, just because these ideas take a long time to explain, and there are many inferential steps between there and the policies I'm recommending. So, I'm often tempted to mention my x risk motivations only briefly, then focus on whatever's inferentially closest and still true. (Classically, this would be "misuse risks, especially from foreign adversaries and terrorists" and "bioweapon and cyberoffensive capabilities coming in the next few years".) 

Olivia Jimenez

Separate point which we might want to discuss later

A thing I'm confused about is: 

Should I talk about inferentially close things that make them likeliest to embrace the policies I'm putting on their desk,

Or, should I just bite the bullet of being confusing and start many meetings with "I'm deeply concerned about humanity going extinct in the next decade because of advancing AI which might try to take over the world. It's a lot to explain but the scientists are on my side. Please help." — where the thing I'm trying to emphasize is the tone of worry. 

Because I buy that we're systematically misleading people about how worried we are / they should be by not focusing on our actual concerns, and by not talking about them with a tone that conveys how worried we in fact are. 

(Additional explanation added post-dialogue: 

  • I was trying to differentiate between two issues with people not openly sharing & focusing on their existential risk worries: 
  • Issue 1 is that by not focusing on your existential risk worries, you're distorting people's sense of what you think is actually important and why. I think Habryka & Conjecture are correct to point out this is deceptive (I think in the sense of harmful epistemic effects, rather than unethical intentions). 
  • Issue 2, which I'm trying to get at above, is about the missing mood. AI governance comms often goes something like "AI has immense potential, but also immense risks. AI might be misused by China, or get out of control. We should balance the needs for innovation and safety." I wouldn't call this lying (though I agree it can have misleading effects, see Issue 1). The thing I would emphasize is that it doesn't sound like someone who thinks we all might die. It doesn't convey "AI advancing is deeply scary. Handling this might require urgent, extraordinarily difficult and unprecedented action." As such, I suspect it's not causing people to take the issue seriously enough to do the major governance interventions we might need. Sure, we might be getting government officials to mention catastrophic risk in their press releases, but do they really take 10% p(doom) seriously? If not, our comms seems insufficient.
  • Of course, I doubt we should always use the "deeply concerned" tone. It depends on what we're trying to do. I'm guessing the question is how much are we trying to get policies through now vs. trying to get people to take the issue as seriously as us? Also, I admit it's even more complicated because sometimes sounding grave gets you laughed out of rooms instead of taken seriously, different audiences have different needs, etc.)
habryka

So, I think for me, I feel totally sympathetic to people finding it hard and often not worth it to explain their x-risk concerns, if they are talking to people who don't really have a handle for that kind of work yet. 

Like, I have that experience all the time as well, where I work with various contractors, or do fundraising, and I try my best to explain what my work is about, but it sure does often end up rounded off to some random preconception they have (sometimes that's standard AI ethics, sometimes that's cognitive science and psychology, sometimes that's people thinking I run a web-development startup that's trying to maximize engagement metrics).

The thing that I am much less sympathetic to is people being like "please don't talk about my connections to this EA/X-Risk ecosystem, please don't talk about my beliefs in this space to other people, please don't list me as having been involved with anything in this space publicly, etc."

Or like, the thing with Jason Matheny that I mentioned in a comment the other day where the senator that was asking him a question had already mentioned "apocalyptic risks from AI" and was asking Matheny how likely and how soon that would happen, and Matheny just responded with "I don't know". 

Those to me don't read as people having trouble crossing an inferential distance. They read to me more as them trying to actually be strategically deceptive about their beliefs and affiliations here, and that feels much more like it's crossing lines.

Olivia Jimenez

Regarding "please don't talk about my connections or beliefs" ––

I'm not sure how bad I feel about this. I usually like openness. But I'm also usually fine with people being very private, but then answering questions honestly. E.g. it feels weird to write online about two people dating when they'd prefer that info is private. I'm trying to sort out what the difference between that and EA-affiliation is now....

Olivia Jimenez

OK maybe it's the strategically deceptive difference. I take that point. 

(And to your point, during my time in AI policy, there have been a few instances where people explicitly asked me to downplay how well we knew each other, or said they would downplay this if they were asked. This was pretty personally frustrating, because now I would incur social costs / be seen as defecting for being open. In my opinion, they were ~defecting by putting me in this position.)

A thing I might still be confused about is — journalists can be adversarial. It seems maybe reasonable to conceal stuff when you know info will be misunderstood if not intentionally distorted. Thoughts? 

Olivia Jimenez

(Also, I'm sympathetic for concealing governance-related plans for the reason of "we don't want to generate a bunch of pushback too soon".)

habryka

Yeah, this is definitely the kind of thing that seems quite bad to me, and I am also really worried that it is the kind of thing that, as part of its course, masks the problems it causes (like, we wouldn't really know if this is causing tons of problems, because people are naturally incentivized to cover up the problems it causes, and because making accusations of this kind of conspiratorial action is really high-stakes and hard), and so if it goes wrong, it will probably go wrong quickly and with a lot of pent-up energy.

I do totally think there are circumstances where I will hide things from other people with my friends, and be quite strategic about it. The classical examples are discussing problems with some kind of authoritarian regime while you are under it (Soviet Russia, Nazi Germany, etc.), and things like "being gay" where society seems kind of transparently unreasonable about it, and in those situations I am opting into other people getting to impose this kind of secrecy and obfuscation request on me. I also feel kind of similar about people being polyamorous, which is pretty relevant to my life, since that still has a pretty huge amount of stigma attached to it.

I do think I experienced a huge shift after the collapse of FTX where I was like "Ok, but after you caused the biggest fraud since Enron, you really lost your conspiracy license. Like, 'the people' (broadly construed) now have a very good reason for wanting to know the social relationships you have, because the last time they didn't pay attention to this it turned out to have been one of the biggest conspiracies of the last decade".

I care about this particularly much because indeed FTX/Sam was actually really very successful at pushing through regulations and causing governance change, but as far as I can tell he was primarily acting with an interest in regulatory capture (having talked to a bunch of people who seem quite well-informed in this space, it seems that he was less than a year away from basically getting a US sponsored monopoly on derivatives trading in the US via regulatory capture). And like, I can't distinguish the methods he was using from the methods other EAs are using right now (though I do notice some difference).

Olivia Jimenez

Yeah, I was going to say

I imagine the government people might argue they're sort of in the Nazi situation where being EA affiliated has a bunch of stigma. 

But given FTX and such, some stigma seems deserved. (Sigh about the regulatory capture part — I wasn't aware of the extent.)

habryka

For reference, the relevant regulation to look up is the Digital Commodities Consumer Protection Act

Feelings & concerns about governance work by EAs

Olivia Jimenez

I'm curious for you to list some things EA governance people are doing that you think are bad or fine/understandable, so I can see if I disagree with any.

habryka

I do think that at a high level, the biggest effect I am tracking from the governance people is that in the last 5 years or so, they were usually the loudest voices that tried to get people to talk less about existential risk publicly, and to stay out of the media, and to not reach out to high-stakes people in various places, because they were worried that doing so would make us look like clowns and would poison the well.

And then one of my current stories is that at some point, mostly after FTX when people were fed up with listening to some vague EA conservative consensus, a bunch of people started ignoring that advice and finally started saying things publicly (like the FLI letter, Eliezer's Time piece, the CAIS letter, Ian Hogarth's piece). And then that's the thing that's actually been moving things in the policy space. 

My guess is we maybe could have also done that at least a year earlier, and honestly I think given the traction we had in 2015 on a lot of this stuff, with Bill Gates and Elon Musk and Demis, I think there is a decent chance we could have also done a lot of Overton window shifting back then, and us not having done so is I think downstream of a strategy that wanted to maintain lots of social capital with the AI capability companies and random people in governments who would be weirded out by people saying things outside of the Overton window.

Though again, this is just one story, and I also have other stories where it all depended on ChatGPT and GPT-4 and before then you would have been laughed out of the room if you had brought up any of this stuff (though I do really think the 2015 Superintelligence stuff is decent evidence against that). It's also plausible to me that you need a balance of inside and outside game stuff, and that we've struck a decent balance, and that yeah, having inside and outside game means there will be conflict between the different people involved in the different games, but it's ultimately the right call in the end.

Olivia Jimenez

Introspecting on my feelings about various maybe-suspicious governance things:

  • Explicitly downplaying one's worries about x risk, a la your read of Matheny's "I don't know" comment: bad
  • Explicitly downplaying one's connections to EA, a la UK people saying "if someone asks me how I know Olivia, I'll downplay our close friendship": bad
  • Asking everyone to be quiet and avoid talking to media to avoid well-poisoning: bad
    • Asking everyone to be quiet because of fears of the government accelerating AI capabilities: I don't know, leans good
  • Asking people not to mention your EA affiliation and carefully avoiding mentioning it yourself: I don't know, depends 
    • Why this might be okay/good: EA has undeserved stigma, and it seems good for AI policy to become increasingly separate from EA
    • Why this might be bad: Maybe after FTX, EA-affiliated people should take it upon themselves to be very open
    • Current reconciliation: Be honest about your affiliations if asked and don't ask others to hide them, but feel free to purposefully distance yourself from EA and avoid bringing it up
  • Not tweeting about or mentioning x-risk in meetings where inferential gap is too big: I don't know, leans fine 
    • I'm conflicted between "We might be able to make the Overton window much wider and improve everyone's sanity by all being very open and explicit about this often" and "It's locally pretty burdensome to frequently share your views with people without context"
habryka

The second biggest thing I am tracking is a kind of irrecoverable loss of trust that I am worried will happen between "us" and "the public", or something in that space. 

Like, a big problem with doing this kind of information management where you try to hide your connections and affiliations is that it's really hard for people to come to trust you again afterwards. If you get caught doing this, it's extremely hard to rebuild trust that you aren't doing this in the future, and I think this dynamic usually results in some pretty intense immune reactions when people fully catch up with what is happening.

Like, I am quite worried that we will end up with some McCarthy-esque immune reaction to EA people in the US and the UK government where people will be like "wait, what the fuck, how did it happen that this weirdly intense social group with strong shared ideology is now suddenly having such an enormous amount of power in government? Wow, I need to kill this thing with fire, because I don't even know how to track where it is, or who is involved, so paranoia is really the only option". 

Olivia Jimenez

On "CAIS/Ian/etc. finally just said it in public" 

I think it's likely the governance folk wouldn't have done this themselves at that time, had no one else done it. So, I'm glad CAIS did. 

I'm not totally convinced we could have done it pre-ChatGPT without the blowback people feared. I've definitely updated a bit given how great the government response has been (considering the evidence of risk is limited and people have been pretty open about their fears) and how not-limited we are by "this sounds crazy and unjustified" (relative to my expectations).

On public concern,

I agree this is possible and worrying. 

I'm also moderately worried about making the AI convo so crowded, by engaging with the public a lot, that x-risk doesn't get dealt with. I'm at least sympathetic to "don't involve a bunch of people in your project who you don't actually expect to contribute, just because they might be unhappy if they're not included".   

Stigmas around EA in the policy world

habryka

I'm conflicted between "EA has undeserved stigma" and "after FTX, everyone should take it upon themselves to be very open" 

I am kind of interested in talking about this a bit. I feel like it's a thing I've heard a lot, and I guess I don't super buy it. What is the undeserved stigma that EA is supposed to have? 

Olivia Jimenez

Is your claim that EA's stigma is all deserved? 

Laying out the stigmas I notice: 

  • Weird beliefs lead to corruption, see FTX 
    • Probably exaggerates the extent of connection each individual had to FTX. But insofar as FTX's decisions were related to EA culture, fair enough. 
  • Have conflicts with the labs, see OpenAI & Anthropic affiliations
    • Fair enough
  • Elite out-of-touch people
    • Sorta true (mostly white, wealthy, coastal elites), sorta off (lots of people didn't come from wealth, and they got into this space because they wanted to be as effectively altruistic as possible)
  • Billionaires' selfish interests 
    • Seems wrong; we're mostly trying to help people rather than make money
  • Weird longtermist techno-optimists who don't care about people 
    • Weird longtermists, yup. 
    • Techno-optimists, kinda, but some of us are pretty pessimistic about AI and about using AI to solve that problem. 
    • Don't care about people, mostly wrong. 

I guess the stigmas seem pretty directionally true about the negatives, and just miss that there is serious thought / positives here. 

If journalists said, "Matheny is EA-affiliated. That suggests he's related to the community that's thought most deeply about catastrophic AI risk and has historically tried hard and succeeded at doing lots of good. It should also raise some eyebrows, see FTX," then I'd be fine. But usually it's just the second half, and that's why I'm sympathetic to people avoiding discussing their affiliation. 

habryka

Yeah, I like this analysis, and I think it roughly tracks how I am thinking about it.

I do think the bar for "your concerns about me are so unreasonable that I am going to actively obfuscate any markers of myself that might trigger those concerns" is quite high. Like I think the bar can't be at "well, I thought about these concerns and they are not true", it has to be at "I am seriously concerned that when the flags trigger you will do something quite unreasonable", like they are with the gayness and the communism-dissenter stuff.

Olivia Jimenez

Fair enough. This might be a case of governance people overestimating honesty costs / underestimating benefits, which I still think they often directionally do. 

(I'll also note, what if all the high profile people tried defending EA? (Defending in the sense of - laying out the "Here are the good things; here are the bad things; here's how seriously I think you should take them, all things considered."))

habryka

I don't think people even have to defend EA or something. I think there are a lot of people who I think justifiably would like to distance themselves from that identity and social network because they have genuine concerns about it.

But I think a defense would definitely open the door for a conversation that acknowledges that of course there is a real thing here that has a lot of power and influence, and would invite people tracking the structure of that thing and what it might do in the future, and if that happens I am much less concerned about both the negative epistemic effects and the downside risk from this all exploding in my face.

Olivia Jimenez

Yeah, I'd be interested in part because I want the skeptical-of-EA folk to know that I don't think they're crazy and I'm not trying to ignore them or lie to them. I'm just opting to keep doing things with/near EA because they think well and they have resources, which is helping me with my altruistic goals. If the skeptics are like "OK, I still distrust you", fair enough. 

Do you have ideas for ways to make the thing you want here happen? What does it look like? An op-ed from Person X? 

How can we make policy stuff more transparent?

habryka

Probably somewhat controversially, but I've been kind of happy about the Politico pieces that have been published. We had two that basically tried to make the case there is an EA conspiracy in DC that has lots of power in a kind of unaccountable way.

Maybe someone could reach out to the author and be like "Ok, yeah, we are kind of a bit conspiratorial, sorry about that. But I think let's try to come clean, I will tell you all the stuff that I know, and you take seriously the hypothesis that we really aren't doing this to profit off of AI, but because we are genuinely concerned about catastrophic risks from AI". 

That does seem like kind of a doomed plan, but like, something in the space feels good to me. Maybe we can work with some journalists we know to write a thing that puts the cards on the table, and isn't just a puff-piece that tries to frame everything in the most positive way, but is genuinely asking hard questions.

Olivia Jimenez

Politico: +1 on being glad it came out actually!

  • (I tentatively wish people had just said this themselves first instead of it being "found out". Possibly this is part of how I'll make the case to people I'm asking to be more open in the future.) 
  • (Also, part of my gladness with Politico is the more that comes out, the more governance people can evaluate how much this actually blocks their work -- so far, I think very little -- and update towards being more open or just be more open now because now their affiliations have been revealed)  

I like the idea of reaching out to journalists. I'd just want to find one who seems truth-seeking, and share info unilaterally. 

Could you do this? Would you want some bigger name EA / governance person to do it? I'd be happy to chip into a draft.  

habryka

I think I have probably sadly burdened myself with somewhat too much confidentiality to dance this dance correctly, though I am not sure. I might be able to get buy-in from a bunch of people so that I can be free to speak openly here, but it would increase the amount of work a lot, and there's also a decent chance they don't say yes, and then I would need to be super paranoid about which bits I leak when working on this.

Olivia Jimenez

As in, you've agreed to keep too much secret?

If so, do you have people in mind who aren't as burdened by this (and who have the relevant context)? 

habryka

Yeah, too many secrets.

Olivia Jimenez

I assume most of the big names have similar confidentiality burdens. 

I'm open to doing it myself, but don't have a lot of the context and I'm not sure I want to drag the organizations I'm affiliated with into this. 

habryka

Yeah, ideally it would be one of the big names since that I think would meaningfully cause a shift in how people operate in the space.

Eliezer is great at moving Overton windows like this, but I think he is really uninterested in tracking detailed social dynamics like this, and so doesn't really know what's up.

Olivia Jimenez

Do you have time to have some chats with people about the idea or send around a Google doc? Happy to help however.

habryka

I do feel quite excited about making this happen, though I do think it will be pretty aggressively shut down, and I feel both sad about that, and also have some sympathy in that it does feel like it somewhat inevitably involves catching some people in the cross-fire who were being more private for good reasons, or who are in a more adversarial context where the information here will be used against them in an unfair way, and I still think it's worth it, but it does make me feel like this will be quite hard.

I also notice that I am just afraid of what would happen if I were to e.g. write a post that's just like "an overview over the EA-ish/X-risk-ish policy landscape" that names specific people and explains various historical plans. Like I expect it would make me a lot of enemies.

Olivia Jimenez

Same, and some of my fear is "this could unduly make the success of the 'good plans' much harder"

habryka

Ok, I think I will sit on this plan for a bit. I hadn't really considered it before, and I kind of want to bring it up to a bunch of people in the next few weeks and see whether maybe there is enough support for this to make it happen.

Olivia Jimenez

Nice! (For what it's worth, I currently like Eliezer most, if he was willing to get into the social stuff)

Any info it'd be helpful for me to collect from DC folk?

habryka

Oh, I mean I would love any more data on how much this would make DC folk feel like some burden was lifted from their shoulders vs. it would feel like it would just fuck with their plans. 

I think my actual plan here would maybe be more like an EA Forum post or something that just goes into a lot of detail on what is going on in DC, and isn't afraid to name specific names or organizations. 

I can imagine directly going for an Op-ed could also work quite well, and would probably be more convincing to outsiders, though maybe ideally you could have both. Where someone writes the forum post on the inside, and then some external party verifies a bunch of the stuff, and digs a bit deeper, and then makes some critiques on the basis of that post, and then the veil is broken.

Olivia Jimenez

Got it. 

Would the DC post include info that these people have asked/would ask you to keep secret? 

habryka

Definitely "would", though if I did this I would want to sign-post that I am planning to do this quite clearly to anyone I talk to. 

I am also burdened with some secrets here, though not that many, and I might be able to free myself from those burdens somehow. Not sure.

Olivia Jimenez

Ok I shall ask around in the next 2 weeks. Ping me if I don't send an update by then

habryka

Thank you!!

Concerns about Conjecture

habryka

Ok, going back a bit to the top-level, I think I would still want to summarize my feelings on the Conjecture thing a bit more.

Like, I guess the thing that I would feel bad about if I didn't say it in a context like this, is to be like "but man, I feel like some of the Conjecture people were like at the top of my list of people trying to do weird epistemically distortive inside-game stuff a few months ago, and this makes them calling out people like this feel quite bad to me".

In general, a huge component of my reaction to that post was something in the space of "Connor and Gabe are kind of on my list to track as people I feel most sketched out by in a bunch of different ways, and kind of in the ways the post complains about", and I feel somewhat bad for having dropped my efforts from a few months ago to do some more investigation here and write up my concerns (mostly because I was kind of hoping a bit that Conjecture would just implode as it ran out of funding and maybe the problem would go away)

Olivia Jimenez

(For what it's worth, Conjecture has been pretty outside-game-y in my experience. My guess is this is mostly a matter of "they think outside game is the best tactic, given what others are doing and their resources", but they've also expressed ethical concerns with the inside game approach.)

habryka

(For some context on this, Conjecture tried really pretty hard a few months ago to get a bunch of the OpenAI critical comments on this post deleted because they said it would make them look bad to OpenAI and would antagonize people at labs in an unfair way and would mess with their inside-game plans that they assured me were going very well at the time)

Olivia Jimenez

(I heard a somewhat different story about this from them, but sure, I still take it as evidence that they're mostly "doing whatever's locally tactical")

Olivia Jimenez

Anyway, I was similarly disappointed by the post just given I think Conjecture has often been lower integrity and less cooperative than others in/around the community. For instance, from what I can tell, 

  • They often do things of the form "leaving out info, knowing this has misleading effects"
  • One of their reasons for being adversarial is "when you put people on the defense, they say more of their beliefs in public". Relatedly, they're into conflict theory, which leads them to favor "fight for power" > "convince people with your good arguments." 

I have a doc detailing my observations that I'm open to sharing privately, if people DM me.  

(I discussed these concerns with Conjecture at length before leaving. They gave me substantial space to voice these concerns, which I'm appreciative of, and I did leave our conversations feeling like I understood their POV much better. I'm not going to get into "where I'm sympathetic with Conjecture" here, but I'm often sympathetic. I can't say I ever felt like my concerns were resolved, though.) 

I would be interested in your concerns being written up. 

I do worry about the EA x Conjecture relationship just being increasingly divisive and time-suck-y. 

habryka

Here is an email I sent Eliezer on April 2nd this year with one paragraph removed for confidentiality reasons:


Hey Eliezer,

This is just an FYI and I don't think you should hugely update on this but I felt like I should let you know that I have had some kind of concerning experiences with a bunch of Conjecture people that currently make me hesitant to interface with them very much and make me think they are somewhat systematically misleading or deceptive. A concrete list of examples:

I had someone reach out to me with the following quote:
 

Mainly, I asked one of their senior people how they plan to make money because they have a lot of random investors, and he basically said there was no plan, AGI was so near that everyone would either be dead or the investors would no longer care by the time anyone noticed they weren’t seeming to make money. This seems misleading either to the investors or to me — I suspect me, because it would really just be wild if they had no plan to ever try to make money, and in fact they do actually have a product (though it seems to just be Whisper repackaged)

I separately had a very weird experience with them on the Long Term Future Fund where Connor Leahy applied for funding for Eleuther AI. We told him we didn't want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people.

He then confusingly went around to a lot of people around EleutherAI and told them that "Open Phil is not interested in funding pre-paradigmatic AI Alignment research and that that is the reason why they didn't fund Eleuther AI".

This was doubly confusing and misleading because Open Phil had never evaluated a grant to Eleuther AI (Asya who works at Open Phil was involved in the grant evaluation as a fund member, but nothing else), and of course the reason he cited had nothing to do with the reason we actually gave. He seems to have kept saying this for a long time even after I think someone explicitly corrected the statement to him.

Another experience I had was Gabe from Conjecture reaching out to LessWrong and trying really quite hard to get us to delete the OpenAI critical comments on this post: https://www.lesswrong.com/posts/3S4nyoNEEuvNsbXt8/common-misconceptions-about-openai

He said he thought people in-general shouldn't criticize OpenAI in public like this because this makes diplomatic relationships much harder, and when Ruby told them we don't delete that kind of criticism he escalated to me and generally tried pretty hard to get me to delete things.

[... One additional thing that's a bit more confidential but of similar nature here...]

None of these are super bad but they give me an overall sense of wanting to keep a bunch of distance from Conjecture, and trepidation about them becoming something like a major public representative of AI Alignment stuff. When I talked to employees of Conjecture about these concerns the responses I got also didn't tend to be "oh, no, that's totally out of character", but more like "yeah, I do think there is a lot of naive consequentialism here and I would like your help fixing that".

No response required, happy to answer any follow-up questions. Just figured I would err more on the side of sharing things like this post-FTX.

Best,
Oliver

Olivia Jimenez

I wish MIRI was a little more loudly active, since I think doomy people who are increasingly distrustful of moderate EA want another path, and supporting Conjecture seems pretty attractive from a distance. 

Again, I'm not sure "dealing with Conjecture" is worth the time though. 

Olivia Jimenez

Main emotional affects of the post for me

  • I wish someone else had made these points, less adversarially. I feel like governance people do need to hear them. But the frame might make people less likely to engage, or make the engagement less productive. 
    • Actually, I will admit the post generated lots of engagement in comments and this discussion. It feels uncooperative to solicit engagement via being adversarial though. 
  • I'm disappointed the comments were mostly "Ugh Conjecture is being adversarial" and less about "Should people be more publicly upfront about how worried they are about AI?" 
  • There were several other community discussions in the past few weeks that I'll tentatively call "heated community politics", and I feel overall bad about the pattern. 
    • (The other discussions were around whether RSPs are bad and whether Nate Soares is bad. In all three cases, I felt like those saying "bad!" had great points, but (a) their points were shrouded in frames of "is this immoral" that felt very off to me, (b) they felt overconfident and not truth-seeking, and (c) I felt like people were half-dealing with personal grudges. This all felt antithetical to parts of LessWrong and EA culture I love.)  
habryka

Yeah, that also roughly matches my emotional reaction. I did like the other RSP discussion that happened that week (and liked my dialogue with Ryan which I thought was pretty productive).

Conjecture as the flag for doomers

habryka

I wish MIRI was a little more loudly active, since I think doomy people who are increasingly distrustful of moderate EA want another path, and supporting Conjecture seems pretty attractive from a distance. 

Yeah, I share this feeling. I am quite glad MIRI is writing more, but am also definitely worried that somehow Conjecture has positioned itself as being aligned with MIRI in a way that makes me concerned people will end up feeling deceived.

Olivia Jimenez

Two thoughts

  • Conjecture does seem pretty aligned with MIRI in "shut it all down" and "alignment hard" (plus more specific models that lead there).
  • I notice MIRI isn't quite a satisfying place to rally around, since MIRI doesn't have suggestions for what individuals can do. Conjecture does.  
Olivia Jimenez

Can you say more about the feeling deceived worry?

(I didn't feel deceived having joined myself, but maybe "Conjecture could've managed my expectations about the work better" and "I wish the EAs with concerns told me so more explicitly instead of giving very vague warnings".)

habryka

Well, for better or for worse, I think a lot of people seem to make decisions on the basis of "is this thing a community-sanctioned 'good thing to do (TM)'". I think this way of making decisions is pretty sus, and I feel a bit confused about how much I want to take responsibility for people making decisions this way, but I think because Conjecture and MIRI look similar in a bunch of ways, and because Conjecture is kind of explicitly trying to carry the "doomer" flag, a lot of people will parse Conjecture as "a community-sanctioned 'good thing to do (TM)'".

I think this kind of thing then tends to fail in one of two ways: 

  • The person who engaged more with Conjecture realizes that Conjecture is much more controversial than they realized within the status hierarchy of the community and that it's not actually clearly a 'good thing to do (TM)', and then they will feel betrayed by Conjecture for hiding that from them and betrayed by others by not sharing their concerns with them
  • The person who engaged much more with Conjecture realizes that the organization hasn't really internalized the virtues that they associate with getting community approval, and then they will feel unsafe and like the community is kind of fake in how it claims to have certain virtues but doesn't actually follow them in the projects that have "official community approval"

Both make me pretty sad.

Also, even if you are following a less dumb decision-making structure, the world is just really complicated, and especially with tons of people doing hard-to-track behind the scenes work, it is just really hard to figure out who is doing real work or not, and Conjecture has been endorsed by a bunch of different parts of the community for-real (like they received millions of dollars in Jaan funding, for example, IIRC), and I would really like to improve the signal to noise ratio here, and somehow improve the degree to which people's endorsements accurately track whether a thing will be good.

Olivia Jimenez

Fair. People did warn me before I joined Conjecture (but it didn't feel very different from warnings I might get before working at MIRI). Also, most people I know in the community are aware Conjecture has a poor reputation. 

I'd support and am open to writing a Conjecture post explaining the particulars of 

  • Experiences that make me question their integrity
  • Things I wish I knew before joining
  • My thoughts on their lying post and RSP campaign (tl;dr: there's important truth in the content, but I really dislike the adversarial frame)
habryka

Well, maybe this dialogue will help, if we edit and publish a bunch of it.

Comments

A couple of quick, loosely-related thoughts:

  • I think the heuristic "people take AI risk seriously in proportion to how seriously they take AGI" is a very good one. There are some people who buck the trend (e.g. Stuart Russell, some accelerationists), but it seems broadly true (e.g. Hinton and Bengio started caring more about AI risk after taking AGI more seriously). This should push us towards thinking that the current wave of regulatory interest wasn't feasible until after ChatGPT.
  • I think that DC people were slower/more cautious about pushing the Overton Window after ChatGPT than they should have been. I think they should update harder from this mistake than they're currently doing (e.g. updating that they're too biased towards conventionality). There probably should be at least one person with solid DC credentials who's the public rallying point for "I am seriously worried about AI takeover". 
  • I think that "doomers" were far too pessimistic about governance before ChatGPT (in ways that I and others predicted beforehand, e.g. in discussions with Ben and Eliezer). I think they should update harder from this mistake than they're currently doing (e.g. updating that they're too biased towards inside-view models and/or fast takeoff and/or high P(doom)).
  • I think that the AI safety community in general (including myself) was too pessimistic about OpenAI's strategy of gradually releasing models (COI: I work at OpenAI), and should update more on that mistake.
  • I think there's a big structural asymmetry where it's hard to see the ways in which DC people are contributing to big wins (like AI executive orders), and they can't talk about it, and the value of this work (and the tradeoffs they make as part of that) is therefore underestimated.
  • "One of the biggest conspiracies of the last decade" doesn't seem right. The amount of money/influence involved in FTX is dwarfed by the amount of money/influence thrown around by governments in general, and it's easier for factions within governments to enforce secrecy than for corporations to do so. More concretely, I'd say that there were probably several different "conspiratorial" things related to covid in various countries that had much bigger effects; probably several more related to ongoing Russia-Ukraine and Israel-Palestine conflicts; probably several more Trump/Biden-related things; maybe some to do with culture-war stuff; probably a few more prosaic fraud or corruption things that stole tens of billions of dollars, just less publicly (e.g. from big government contracts); a bunch of criminal gangs which also have far more money than FTX did; and almost certainly a bunch that don't fall into any of those categories. (For example, if the CIA is currently doing any stuff comparable to its historical record of messing around with South American countries, that's plausibly far bigger than FTX. Or various NSA surveillance type things are likely a much bigger deal, in terms of impact, than FTX. Oh, and stuff like NotPetya should probably count too.)
  • There's at least one case where I hesitated to express my true beliefs publicly because I was picturing Conjecture putting the quote up on the side of a truck. I don't know how much I endorse this hesitation, but it's definitely playing some role in my decisions, and I expect will continue to do so.
  • I think that "doomers" were far too pessimistic about governance before ChatGPT (in ways that I and others predicted beforehand, e.g. in discussions with Ben and Eliezer). I think they should update harder from this mistake than they're currently doing (e.g. updating that they're too biased towards inside-view models and/or fast takeoff and/or high P(doom)).

I think it remains to be seen what the right level of pessimism was. It still seems pretty likely that we'll see not just useless, but actively catastrophically counter-productive interventions from governments in the next handful of years.

But you're absolutely right that I was generally pessimistic about policy interventions from 2018ish through to 2021 or so. 

My main objection was that I wasn't aware of any policies that seemed like they helped, and I was unenthusiastic about the way that EAs seemed to be optimistic about getting into positions of power without (it seemed to me) being very clear with themselves that they didn't have policy ideas to implement. 

I felt better about people going into policy to the extent that those people had clarity for themselves, "I don't know what to recommend if I have power. I'm trying to execute one part of a two part plan that involves getting power and then using that to advocate for x-risk mitigating policies. I'm intentionally punting that question to my future self / hoping that other EAs thinking full time about this come up with good ideas." I think I still basically stand by this take. [1]

My main update is it turns out that the basic idea of this post was false. There were developments that were more alarming than "this is business as usual" to a good number of people and that really changed the landscape. 

One procedural update that I've made from that and similar mistakes is just "I shouldn't put as much trust in Eliezer's rhetoric about how the world works, when it isn't backed up by clearly articulated models. I should treat those ideas as plausible hypotheses, and mostly be much more attentive to evidence that I can see directly."  

 

  1. ^

    Also, I think that this is one instance of the general EA failure mode of pursuing a plan which entails accruing more resources for EA (community building to bring in more people, marketing to bring in more money, politics to acquire power), without a clear personal inside view of what to do with those resources, effectively putting a ton of trust in the EA network to reach correct conclusions about which things help.

    There are a bunch of people trusting the EA machine to 1) aim for good things and 2) have good epistemics. They trust it so much they'll go campaign for a guy running for political office without knowing much about him, except that he's an EA. Or they route their plan for positive impact on the world through positively impacting EA itself ("I want to do mental health coaching for EAs", or "I want to build tools for EAs", or going to do ops for this AI org which 80k recommended, despite not knowing much about what it does).

    This is pretty scary, because it seems like some of those people were not worthy of trust (SBF, in particular, won a huge amount of veneration). 

    And even in the case of the people who are, I believe, earnest geniuses, it is still pretty dangerous to mostly be deferring to them. Paul put a good deal of thought into the impacts of developing RLHF, and he thinks the overall impacts are positive. But that Paul is smart and good does not make it a foregone conclusion that his work is good on net. That's a really hard question to answer, about which I think most people should be pretty circumspect. 

    It seems to me that there is an army of earnest young people who want to do the most good that they can. They've been told (and believe) that AI risk is the most important problem, but it's a confusing problem that depends on technical expertise, famously fraught problems of forecasting the character of not-yet-existent technologies, and a bunch of weird philosophy. The vast majority of those young people don't know how to make progress on the core problems of AI risk directly, or even necessarily identify which work is making progress. But they still want to help, so they commit themselves to e.g. community building and getting more people to join, with everyone taking social cues about what kinds of object-level things are good to do from the few people that seem to have personal traction on the problem. 

    This seems concerning to me. This kind of structure where a bunch of smart young people are building a pile of resources to be controlled mostly by deference to a status hierarchy, where you figure out which thinkers are cool by picking up on the social cues of who is regarded as cool, rather than evaluating their work for yourself...well, it's not so much that I expect it to be coopted, but I just don't expect that overall agglomerated machine to be particularly steered towards the good, whatever values it professes. 

    It doesn't have a structure that binds it particularly tightly to what's true. Better than most non-profit communities, worse than many for-profit companies, probably.

    It seems more concerning to the extent that many of the object level actions to which the EAs are funneling resources are not just useless, but actively bad. It turns out that being smart enough, as a community, to identify the most important problem in the world, but not smart enough to systematically know how to positively impact that problem is pretty dangerous.

    E.g. the core impacts of people trying to impact x-risk so far include:

    - (Maybe? Partially?) causing Deepmind to exist

    - (Definitely) causing OpenAI to exist

    - (Definitely) causing Anthropic to exist

    - Inventing RLHF and accelerating the development of RLHF'd language models

    It's pretty unclear to me what the sign of these interventions is. They seem bad on the face of it, but as I've watched things develop I'm not as sure. It depends on pretty complicated questions about second and third order effects, and counterfactuals.

    But it seems bad to have an army of earnest young people who, in the name of their do-gooding ideology, shovel resources at the decentralized machine doing these maybe good maybe bad activities, because they're picking up on social cues of who to defer to and what those people think! That doesn't seem very high EV for the world!

    (To be clear, I was one of the army of earnest young people. I spent a number of years helping recruit for a secret research program—I didn't even have the most basic information, much less the expertise to assess if it was any good—because I was taking my cues from Anna, who was taking her cues from Eliezer. 

    I did that out of a combination of 1) having read Eliezer's philosophy, and having enough philosophical grounding to be really impressed by it, and 2) being ready and willing to buy into a heroic narrative to save the world, which these people were (earnestly) offering me.)

    And, procedurally, all this is made somewhat more perverse by the fact that this community, this movement, was branded as the "carefully think through our do-gooding" movement. We raised the flag of "let's do careful research and cost-benefit analysis to guide our charity", but over time this collapsed into a deferral network, with ideas about what's good to do driven mostly by the status hierarchy. Cruel irony.

     

Well said. I agree with all of these except the last one and the gradual model release one (I think the update should be that letting the public interact with models is great, but whether to do it gradually or in a 'lumpy' way is unclear. E.g. arguably ChatGPT3.5 should have been delayed until 2023 alongside GPT4. That would have pushed back the acceleration of e.g. GDM a few more months, without (IMO) any harm to public wake-up.)

I especially want to reemphasize your point 2.

E.g. arguably ChatGPT3.5 should have been delayed until 2023 alongside GPT4. That would have pushed back the acceleration of e.g. GDM a few more months, without (IMO) any harm to public wake-up.)

That would have pushed back public wakeup equally though, because it was ChatGPT3.5 that caused the wakeup.

Did anyone at OpenAI explicitly say that a factor in their release cadence was getting the public to wake up about the pace of AI research and start demanding regulation? Because this seems more like a post hoc rationalization for the release policy than like an actual intended outcome.

See Sam Altman here:

As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally.

A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low.

And Sam has been pretty vocal in pushing for regulation in general.

It would have pushed it back, but then the extra shock of going straight to ChatGPT4 would have made up for it I think. Not sure obviously.

Then ChatGPT4 would still have had low rate limits, so most people would still have been more informed by ChatGPT3.5.

"One of the biggest conspiracies of the last decade" doesn't seem right. The amount of money/influence involved in FTX is dwarfed by the amount of money/influence thrown around by governments in general, and it's easier for factions within governments to enforce secrecy than for corporations to do so. More concretely, I'd say that there were probably several different "conspiratorial" things related to covid in various countries that had much bigger effects; probably several more related to ongoing Russia-Ukraine and Israel-Palestine conflicts; probably several more Trump/Biden-related things; maybe some to do with culture-war stuff; probably a few more prosaic fraud or corruption things that stole tens of billions of dollars, just less publicly (e.g. from big government contracts); a bunch of criminal gangs which also have far more money than FTX did; and almost certainly a bunch that don't fall into any of those categories. (For example, if the CIA is currently doing any stuff comparable to its historical record of messing around with South American countries, that's plausibly far bigger than FTX. Or various NSA surveillance type things are likely a much bigger deal, in terms of impact, than FTX. Oh, and stuff like NotPetya should probably count too.)

There are few programs even within the U.S. government that are larger than $10B without very extensive reporting requirements, which make it quite hard for them to be conspiratorial in the relevant ways (they might be ineffective, or the result of various bad equilibria, but I don't think you regularly get conspiracies at this scale).

To calibrate people here, the total budget of the NSA appears to be just around $10B/yr, making it so that even if you classify the whole thing as a conspiracy, at least in terms of expenditure it's still roughly the size of the FTX fraud (though more like 10x larger if you count it over the whole last decade).

To be clear, there is all kinds of stuff going on in the world that is bad, but in terms of things that are as clearly some kind of criminal or high-stakes government conspiracy, I think FTX stands among the biggest ones (though I totally agree there are probably other ones, though by nature it's hard for me to say how many).

(In any case, I changed the word "conspiracy" here to "fraud" which I think gets the same point across, and my guess is we all agree that FTX is among the biggest frauds of the last decade)

There are over 100 companies globally with a market cap of more than 100 billion. If we're indexing on the $10 billion figure, these companies could have a bigger financial impact by doing "conspiracy-type" things that swung their value by <10%. How many of them have actually done that? No idea, but "dozens" doesn't seem implausible (especially when we note that many of them are based in authoritarian countries).

Re NSA: measuring the impact of the NSA in terms of inputs is misleading. The problem is that they're doing very highly-leveraged things like inserting backdoors into software, etc. That's true of politics more generally. It's very easy for politicians to insert clauses into bills that have >$10 billion of impact. How often are the negotiations leading up to that "conspiratorial"? Again, very hard to know.

in terms of things that are as clearly some kind of criminal or high-stakes government conspiracy, I think FTX stands among the biggest ones

This genuinely seems bizarre to me. A quick quote I found from googling:

The United Nations estimated in a 2011 report that worldwide proceeds from drug trafficking and other transnational organized crime were equivalent to 1.5 percent of global GDP, or $870 billion in 2009.

That's something like 100 FTXs per year; we mostly just don't see them. Basically I think that you're conflating legibility with impact. I agree FTX is one of the most legible ways in which people were defrauded this century; I also think it's a tiny blip on the scale of the world as a whole. (Of course, that doesn't make it okay by any means; it was clearly a big fuck-up, there's a lot we can and should learn from it, and a lot of people who were hurt.)

Does sure seem like there are definitional issues here. I do agree that drug trade and similar things bring the economic effects of conspiracy-type things up a lot, and I hadn't considered those, and agree that if you count things in that reference class FTX is a tiny blip. 

I think given that, I basically agree with you that FTX isn't that close to one of the biggest conspiracies of the last decade. I do think it's at the top of frauds in the last decade, though that's a narrower category.

I do think it's at the top of frauds in the last decade, though that's a narrower category.

Nikola went from a peak market cap of $66B to ~$1B today, vs. FTX which went from ~$32B to [some unknown but non-negative number].

I also think the Forex scandal counts as bigger (as one reference point, banks paid >$10B in fines), although I'm not exactly sure how one should define the "size" of fraud.[1] 

I wouldn't be surprised if there's some precise category in which FTX is the top, but my guess is that you have to define that category fairly precisely.

  1. ^

    Wikipedia says "the monetary losses caused by manipulation of the forex market were estimated to represent $11.5 billion per year for Britain’s 20.7 million pension holders alone" which, if anywhere close to true, would make this way bigger than FTX, but I think the methodology behind that number is just guessing that market manipulation made foreign-exchange x% less efficient, and then multiplying through by x%, which isn't a terrible methodology but also isn't super rigorous.

I wasn't intending to say "the literal biggest", though I think it's a decent candidate for the literal biggest. Depending on your definitions I agree things like Nikola or Forex could come out on top. I think it's hard to define things in a way so that it isn't in the top 5.

I think the heuristic "people take AI risk seriously in proportion to how seriously they take AGI" is a very good one.

Agree. Most people will naturally buy AGI Safety if they really believe in AGI. The hard step is "no AGI" -> "AGI", not "AGI" -> "AGI Safety".

I agree with all of these (except I never felt worried about being quoted by Conjecture)

  • I think that the AI safety community in general (including myself) was too pessimistic about OpenAI's strategy of gradually releasing models (COI: I work at OpenAI), and should update more on that mistake.

I agree with this!

I thought it was obviously dumb, and in retrospect, I don't know.

evhub

In the interest of saying more things publicly on this, some relevant thoughts:

  • I don't know what it means for somebody to be a "doomer," but I used to be a MIRI researcher and I am pretty publicly one of the people with the highest probabilities of unconditional AI existential risk (I usually say ~80%).
  • When Conjecture was looking for funding initially to get started, I was one of the people who was most vocal in supporting them. In particular, before FTX collapsed, I led FTX regranting towards Conjecture, and recommended an allocation in the single-digit millions.
  • I no longer feel excited about Conjecture. I view a lot of the stuff they're doing as net-negative and I wouldn't recommend anyone work there anymore. Almost all of the people at Conjecture initially that I really liked have left (e.g. they fired janus, their interpretability people left for Apollo, etc.), the concrete CoEms stuff they're doing now doesn't seem exciting to me, and I think their comms efforts have actively hurt our ability to enact good AI policy. In particular, I think their usage of Dario's statements on x-risk as a rhetorical weapon against RSPs creates a structural disincentive against lab heads being clear about existential risk and reduces the probability of us getting good RSPs from other labs and good RSP-based regulation.
  • Unlike Conjecture's comms efforts, I've been really happy with MIRI's comms efforts. I thought Eliezer's Time article was great, I really liked Nate's analysis of the different labs' policies, etc.
aysja

In particular, I think their usage of Dario's statements on x-risk as a rhetorical weapon against RSPs creates a structural disincentive against lab heads being clear about existential risk

I’m not sure how to articulate this, exactly, but I want to say something like “it’s not on us to make sure the incentives line up so that lab heads state their true beliefs about the amount of risk they’re putting the entire world in.” Stating their beliefs is just something they should be doing, on a matter this important, no matter the consequences. That’s on them. The counterfactual world—where they keep quiet or are unclear in order to hide their true (and alarming) beliefs about the harm they might impose on everyone—is deceptive. And it is indeed pretty unfortunate that the people who are most clear about this (such as Dario), will get the most pushback. But if people are upset about what they’re saying, then they should still be getting the pushback.

When I was an SRE at Google, we had a motto that I really like, which is: "hope is not a strategy." It would be nice if all the lab heads would be perfectly honest here, but just hoping for that to happen is not an actual strategy.

Furthermore, I would say that I see the main goal of outside-game advocacy work as setting up external incentives in such a way that pushes labs toward good things rather than bad things. Either through explicit regulation or implicit pressure, I think controlling the incentives is absolutely critical and the main lever that you have externally for controlling the actions of large companies.

I don't think aysja was endorsing "hope" as a strategy– at least, that's not how I read it. I read it as "we should hold leaders accountable and make it clear that we think it's important for people to state their true beliefs about important matters."

To be clear, I think it's reasonable for people to discuss the pros and cons of various advocacy tactics, and I think asking "to what extent do I expect X advocacy tactic will affect peoples' incentives to openly state their beliefs?" makes sense.

Separately, though, I think the "accountability frame" is important. Accountability can involve putting pressure on them to express their true beliefs, pushing back when we suspect people are trying to find excuses to hide their beliefs, and making it clear that we think openness and honesty are important virtues even when they might provoke criticism– perhaps especially when they might provoke criticism. I think this is especially important in the case of lab leaders and others who have clear financial interests or power interests in the current AGI development ecosystem.

It's not about hoping that people are honest– it's about upholding standards of honesty, and recognizing that we have some ability to hold people accountable if we suspect that they're not being honest. 

I would say that I see the main goal of outside-game advocacy work as setting up external incentives in such a way that pushes labs to good things rather than bad things

I'm currently most excited about outside-game advocacy that tries to get governments to implement regulations that make good things happen. I think this technically falls under the umbrella of "controlling the incentives through explicit regulation", but I think it's sufficiently different from outside-game advocacy that tries to get labs to do things voluntarily that it's worth distinguishing the two.

I think their usage of Dario's statements on x-risk as a rhetorical weapon against RSPs creates a structural disincentive against lab heads being clear about existential risk and reduces the probability of us getting good RSPs from other labs and good RSP-based regulation.

Setting aside my personal models of Connor/Gabe/etc., the only way this action reads as making sense to me is if one feels compelled to go all in on "so-called Responsible Scaling Policies are primarily a fig leaf of responsibility from ML labs, as the only viable responsible option is to regulate them / shut them down". I assign at least 10% to that perspective being accurate, so I am not personally ruling it out as a fine tactic.

I agree it is otherwise disincentivizing[1] in worlds where open discussion and publication of scaling policies (I cannot even bring myself to call them 'responsible') is quite reasonable.

  1. ^

Probably Evan/others agree with this, but I want to explicitly point out that the CEOs of the labs, such as Amodei and Altman and Hassabis, should answer the question honestly regardless of how it's used by those they're in conflict with; the matter is too important for it to be forgivable that they would be strategically avoidant in order to prop up their businesses.

307th

There are all kinds of benefits to acting with good faith, and people should not feel licensed to abandon good faith dialogue just because they're SUPER confident and this issue is REALLY IMPORTANT. 

When something is really serious it becomes even more important to do boring +EV things like "remember that you can be wrong sometimes" and "don't take people's quotes out of context, misrepresent their position, and run smear campaigns on them; and definitely don't make that your primary contribution to the conversation".

Like, for Connor & people who support him (not saying this is you Ben): don't you think it's a little bit suspicious that you ended up in a place where you concluded that the very best use of your time in helping with AI risk was tweet-dunking and infighting among the AI safety community? 

Like, I am quite worried that we will end up with some McCarthy-esque immune reaction to EA people in the US and the UK government where people will be like "wait, what the fuck, how did it happen that this weirdly intense social group with strong shared ideology is now suddenly having such an enormous amount of power in government? Wow, I need to kill this thing with fire, because I don't even know how to track where it is, or who is involved, so paranoia is really the only option". 

This is looking increasingly prescient. 

[Edit to add context]

Not saying this is happening now, but after the board decisions at OpenAI, I could imagine more people taking notice. Hopefully the sentiment then will just be open discourse and acknowledging that there's now this interesting ideology besides partisan politics and other kinds of lobbying/influence-seeking that are already commonplace. But to get there, I think it's plausible that EA has some communications- and maybe trust-building work to do. 

Just for the record, if the current board thing turns out to be something like a play of power from EAs in AI Safety trying to end up more in control (by e.g. planning to facilitate a merger or a much closer collaboration with Anthropic), and the accusations of lying to the board turn out to be a nothing-burger, then I would consider this a very central example of the kind of political play I was worried would happen (and indeed involved Helen who is one of the top EA DC people). 

Correspondingly I assign decently high (20-25%) probability to that indeed being what happened, in which case I would really like the people involved to be held accountable (and for us to please stop the current set of strategies that people are running that give rise to this kind of thing).

As you'd probably agree, it's plausible that Sutskever was able to convince the board about specific concerns based on his understanding of the technology (risk levels and timelines) or his day-to-day experience at OpenAI and direct interactions with Sam Altman. If that's what happened, then it wouldn't be fair [to say that] any EA-minded board members just acted in an ideology-driven way. (Worth pointing out, for people who don't know, that Sutskever has no ties to EA; it just seems like he shares concerns about the dangers from AI.)

But let's assume that it comes out that EA board members played a really significant role or were even thinking about something like this before Sutskever brought up concerns. "Play of power" evokes connotations of opportunism and there being no legitimacy for the decision other than that the board thought they could get away with it. This sort of concern you're describing would worry me a whole lot more if OpenAI had a typical board and corporate structure.

However, since they have a legal structure and mission that emphasizes benefitting humanity as a whole and not shareholders, I'd say situations like the one here are (in theory) exactly why the board was set up that way. The board's primary task is overseeing the CEO. To achieve OpenAI's mission, the CEO needs to have the type of personality and thinking habits such that he will likely converge toward whatever the best-informed views are about AI risks (and benefits) and how to mitigate (and actualize) them. The CEO should be someone who is likely to engage in the sort of cognition that one would perform if one cared greatly about long-run outcomes rather than near-term status and took seriously the chance of being wrong about one's AI risk and timeline assumptions. Regardless of what's actually true about Altman, it seems like the board came to a negative conclusion about his suitability. In terms of how they made this update, we can envision some different scenarios: some of them would seem unfair to Altman and "ideology-driven" in a sinister way, while others would seem legitimate. (The following scenarios take for granted that the thing that happened had elements of an "AI safety coup," as opposed to a "Sutskever coup" or "something else entirely." Again, I'm not saying that any of this is confirmed; I'm just going with the hypothesis where the EA involvement has the most potential for controversy.) So, here are three variants of how the board could have updated that Altman is not suitable for the mission:

(1) The responsible board members (could just be a subset of the ones that voted against Altman rather than all four of them) never gave him much of a chance. They learned that Altman is less concerned about AI notkilleveryoneism than they would've liked, so they took an opportunity to try to oust him. (This is bad because it's ideology-driven rather than truth-seeking.) 

(2) The responsible board members did give Altman a chance initially, but he deceived them in a smoking-gun-type breach of trust.

(3) The responsible board members did give Altman a chance initially, but they became increasingly disillusioned through a more insincere-vibes-based and gradual erosion of trust, perhaps accompanied by disappointments from empty promises/assurances about, e.g., taking safety testing more seriously for future models, avoiding racing dynamics/avoiding giving out too much info on how to speed up AI through commercialization/rollouts, etc. (I'm only speculating here with the examples I'm giving, but the point is that if the board is unusually active about looking into stuff, it's conceivable that they maybe-justifiably reached this sort of update even without any smoking-gun-type breach of trust.) 

Needless to say, (1) would be very bad board behavior and would put EA in a bad light. (2) would be standard stuff about what boards are there for, but seems somewhat unlikely to have happened here based on the board not being able to easily give more info to the public about what Altman did wrong (as well as the impression I get that they don't hold much leverage in the negotiations now). (3) seems most likely to me and also quite complex to make judgments about the specifics, because lots of things can fall into (3). (3) requires an unusually "active/observant" board. This isn't necessarily bad. I basically want to flag that I see lots of (3)-type scenarios where the board acted with integrity and courage, but also (admittedly) probably displayed some inexperience by not preparing for the power struggle that results after a decision like this, and by (possibly?) massively mishandling communications, using wording that may perfectly describe what happened when the description is taken literally, but is very misleading when we apply the norms about how parting-ways announcements are normally written in very tactful corporate speak. (See also Eliezer's comment here.) Alternatively, it's also possible that a (3)-type scenario happened, but the specific incremental updates were uncharitable towards Altman due to being tempted by "staging a coup," or stuff like that. It gets messy when you have to evaluate someone's leadership fit where they have a bunch of uncontested talents but also some orange flags, and you have to decide what sort of strengths or weaknesses are most essential for the mission.

For me the key variable is whether they took a decision that would have put someone substantially socially closer to them in charge, with some veneer of safety motivation, but where the ultimate variance in their decision would counterfactually be driven by social proximity and pre-existing alliances. 

A concrete instance of this would be if the plan with the firing was to facilitate some merger with Anthropic, or to promote someone like Dario to the new CEO position, who the board members (who were chosen by Holden) have a much tighter relationship with.

Clarification/history question: How were these board members chosen?

My current model is that Holden chose them: Tasha in 2018, and Helen in 2021 when he left and picked her as his successor board member.

I don't know, but I think it was close to a unilateral decision from his side (like I don't think anyone at Open AI had much reason to trust Helen outside of Holden's endorsement, so my guess is he had a lot of leeway).

Thanks! And why did Holden have the ability to choose board members (and be on the board in the first place)?

I remember hearing that this was in exchange for OP investment into OpenAI, but I also remember Dustin claiming that OpenAI didn’t actually need any OP money (would’ve just gotten the money easily from another investor).

Is your model essentially that the OpenAI folks just got along with Holden and thought he/OP were reasonable, or is there a different reason Holden ended up having so much influence over the board?

My model is that this was a mixture of a reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) and actual financial considerations ($30M was a substantial amount of money at that point in time).

Sam Altman has many times said he quite respects Holden, so that made up a large fraction of the variance. See e.g. this tweet

(i used to be annoyed at being the villain of the EAs until i met their heroes*, and now i'm lowkey proud of it 

*there are a few EA heroes i think are really great, eg Holden)

[...] reputational trade (OpenAI got to hire a bunch of talent from EA spaces and make themselves look responsible to the world) [...]

Yes, I think "reputational trade," i.e., something that's beneficial for both parties, is an important part of the story that the media hasn't really picked up on. EAs were focused on the dangers and benefits from AI way before anyone else, so it carries quite some weight when EA opinion leaders put an implicit seal of approval on the new AI company. 

There's a tension between 
(1) previously having held back on natural-seeming criticism of OpenAI ("putting the world at risk for profits" or "they plan on wielding this immense power of building god/single-handedly starting something bigger than the next Industrial Revolution/making all jobs obsolete and solving all major problems") because they have the seal of approval from this public good, non-profit, beneficial-mission-focused board structure, 

and 

(2) being outraged when this board structure does something that it was arguably intended to do (at least under some circumstances).

(Of course, the specifics of how and why things happened matter a lot, and maybe most people aren't outraged because the board did something, but rather because of how they did it or based on skepticism about reasons and justifications. On those later points, I sympathize more with people who are outraged or concerned that something didn't go right. But we don't know all the details yet.)

(Of course, the specifics of how and why things happened matter a lot, and maybe most people aren't outraged because the board did something, but rather because of how they did it or based on skepticism about reasons and justifications. On those later points, I sympathize more with people who are outraged or concerned that something didn't go right. But we don't know all the details yet.)

Almost all the outrage I am seeing is about how this firing was conducted. If the board had had a proper report ready that outlined why they think OpenAI was acting recklessly, and if they had properly consulted with relevant stakeholders before doing this, I think the public reaction would be very different.

I agree there are also some random people on the internet who are angry about the board taking any action even though the company is going well in financial terms, but most of the well-informed and reasonable people I've seen are concerned about the way this was rushed and how the initial post seemed to pretty clearly imply that Sam had done some pretty serious deception, without anything to back that up.

Okay, that's fair.

FWIW, I think it's likely that they thought about this decision for quite some time and systematically – I mean the initial announcement did mention something about a "deliberative review process by the board." But yeah, we don't get to see any of what they thought about or who (if anyone) they consulted for gathering further evidence or for verifying claims by Sutskever. Unfortunately, we don't know yet. And I concede that given the little info we have, it takes charitable priors to end up with "my view." (I put it in quotation marks because it's not like I have more than 50% confidence in it. Mostly, I want to flag that this view is still very much on the table.) 

Also, on the part about "imply that Sam had done some pretty serious deception, without anything to back that up with." I'm >75% that either Eliezer nailed it in this tweet, or they actually have evidence about something pretty serious but decided not to disclose it for reasons that have to do with the nature of the thing that happened. (I guess the third option is they self-deceived into thinking their reasons to fire Altman will seem serious/compelling [or at least defensible] to everyone to whom they give more info, when in fact the reasoning is more subtle/subjective/depends on additional assumptions that many others wouldn't share. This could then have become apparent to them when they had to explain their reasoning to OpenAI staff later on, and they aborted the attempt in the middle of it when they noticed it wasn't hitting well, leaving the other party confused. I don't think that would necessarily imply anything bad about the board members' character, though it is worth noting that if someone self-deceives in that way too strongly or too often, it makes for a common malefactor pattern, and obviously it wouldn't reflect well on their judgment in this specific instance. One reason I consider this hypothesis less likely than the others is because it's rare for several people – the four board members – to all make the same mistake about whether their reasoning will seem compelling to others, and for none of them to realize that it's better to err on the side of caution and instead say something like "we noticed we have strong differences in vision with Sam Altman," or something like that.)

My current model is that this is unlikely to have been planned long in advance. For example, for unrelated reasons I was planning to have a call with Helen last week, and she proposed a meeting time of last Thursday (when I responded with my availability for Thursday early in the week, she did not respond). She then did not actually schedule the final meeting time and didn't respond to my last email, but this makes me think that at least early in the week, she did not expect to be busy on Thursday.

There are also some other people who I would expect to have known about this if it had been planned, and who have been expressing their confusion and bafflement at what is going on on Twitter and in various Slacks I am in. I think if this was planned, it was planned as a background thing, and then came to a head suddenly, with maybe 1-2 days' notice, but it doesn't seem like more.

I also notice that I am just afraid of what would happen if I were to e.g. write a post that's just like "an overview of the EA-ish/X-risk-ish policy landscape" that names specific people and explains various historical plans. Like I expect it would make me a lot of enemies.

This seems like a bad idea.

Transparency is important, but ideally, we would find ways to increase this without blowing up a bunch of trust within the community. I guess I'd question whether this is really the bottleneck in terms of transparency/public trust.

I'm worried that as a response to FTX we might end up turning this into a much more adversarial space.

Adversarialness, honesty, attribution; re: how to talk in DC

I love what y'all said about this, found it a pleasure to read, and want to share some of my own thoughts, some echoing things you already said.

Let's not treat society as an adversary, rather let's be collaborators/allies and even leaders, helping and improving society and its truth-seeking processes. That doesn't mean we shouldn't have any private thoughts or plans. It does mean society gets to know who we are, who's behind what, and what we're generally up to and aiming for. Hiding attribution and intentions is IMO a way of playing into the adversarial/polarized/worst parts of our society's way of being and doing, and I agree with Oliver that doing so will likely come back to undermine us and what we care about. If we act like a victim/adversary wrt society, it won't work, including because society will see us that way. Let's instead meet society with the respect we want to see in the world, and ask it to step up and do the same for us. Let's pursue plans and intentions that we're happy standing in and being seen in.

I have only very limited experience in DC-type conversations, but my sense is there are ways of sharing your real thing, while being cooperative, which likely don't lead to dismissal and robustly don't lead to poisoning the well. Here's perhaps the start of one, which could be made more robust with some workshopping: 1) share your polaris (existential stakes) in a way that they could feasibly understand given where they're at; 2) share your proposals and how you see those as aligned with other near-term AI considerations like the ones they might have; 3) actually listen to and respect the opinions of the people you're talking with, and be willing to go into their frame, remembering that it's not your job to convince or persuade them. (Thinking it's your job to convince or persuade them is probably the main/upstream mistake folks make?) Those things seem to me to likely belong in nearly every conversation. Should you include your "mood" that things in fact are very dire? There's not a strategic/correct answer to this, because strategy/correctness is not what mood is for/about. Share your truth in a way that you feel serves mutual understanding. Share your feelings in a way that you feel serves mutual relating.

And then one of my current stories is that at some point, mostly after FTX when people were fed up with listening to some vague EA conservative consensus, a bunch of people started ignoring that advice and finally started saying things publicly (like the FLI letter, Eliezer's Time piece, the CAIS letter, Ian Hogarth's piece). And then that's the thing that's actually been moving things in the policy space.

My impression is that this was driven by developments in AI, which created enough common knowledge that people could expect others to take the concern seriously, because everyone could just see ChatGPT. And this emboldened people. They had more of a sense of tractability.

And Eliezer, in particular, went on a podcast, and it went better than he anticipated, so he decided to do more outreach.

My impression is that this is basically 0 to do with FTX?

Eliezer and Max and Dan could give their own takes here. My guess is there was a breaking of EA consensus on taking the PR-conservative route, and then these things started happening. I think some of that breaking was due to FTX.

Eliezer started talking about high P(doom) around the Palmcone, which I think was more like peak FTX hype. And it seemed like his subsequent comms were part of a trend that began with the MIRI dialogues the year before. I'd bet against the FTX collapse being that causal, at least for him.

I don't think Death with Dignity or the List O' Doom posts are at all FTX-collapse related. I am talking about things like the Time piece.

Yes, but I’m drawing a line from ‘MIRI dialogues’ through Death With Dignity and modeling Eliezer generally and I think the line just points roughly at the Time piece without FTX.

They often do things of the form "leaving out info, knowing this has misleading effects"

On that, here are a few examples of Conjecture leaving out info in what I think is a misleading way.

(Context: Control AI is an advocacy group, launched and run by Conjecture folks, that is opposing RSPs. I do not want to discuss the substance of Control AI’s arguments -- nor whether RSPs are in fact good or bad, on which question I don’t have a settled view -- but rather what I see as somewhat deceptive rhetoric.)

One, Control AI’s X account features a banner image with a picture of Dario Amodei (“CEO of Anthropic, $2.8 billion raised”) saying, “There’s a one in four chance AI causes human extinction.” That is misleading. What Dario Amodei has said is, “My chance that something goes really quite catastrophically wrong on the scale of human civilisation might be somewhere between 10-25%.” I understand that it is hard to communicate uncertainty in advocacy, but I think it would at least have been more virtuous to use the middle of that range (“one in six chance”), and to refer to “global catastrophe” or something rather than “human extinction”.

Two, Control AI writes that RSPs like Anthropic’s “contain wording allowing companies to opt-out of any safety agreements if they deem that another AI company may beat them in their race to create godlike AI”. I think that, too, is misleading. The closest thing Anthropic’s RSP says is:

However, in a situation of extreme emergency, such as when a clearly bad actor (such as a rogue state) is scaling in so reckless a manner that it is likely to lead to imminent global catastrophe if not stopped (and where AI itself is helpful in such defense), we could envisage a substantial loosening of these restrictions as an emergency response. Such action would only be taken in consultation with governmental authorities, and the compelling case for it would be presented publicly to the extent possible.

Anthropic’s RSP is clearly only meant to permit labs to opt out when any other outcome very likely leads to doom, and for this to be coordinated with the government, with at least some degree of transparency. The scenario is not “DeepMind is beating us to AGI, so we can unilaterally set aside our RSP”, but more like “North Korea is beating us to AGI, so we must cooperatively set aside our RSP”.

Relatedly, Control AI writes that, with RSPs, companies "can decide freely at what point they might be falling behind – and then they alone can choose to ignore the already weak" RSPs. But part of the idea with RSPs is that they are a stepping stone to national or international policy enforced by governments. For example, prior to the Control AI campaign, ARC and Anthropic both explicitly said that they hope RSPs will be turned into standards/regulation. (That seems quite plausible to me as a theory of change.) Also, Anthropic commits to only updating its RSP in consultation with its Long-Term Benefit Trust (consisting of five people without any financial interest in Anthropic) -- which may or may not work well, but seems sufficiently different from Anthropic being able to "decide freely" when to ignore its RSP that I think Control AI's characterisation is misleading. Again, I don't want to discuss the merits of RSPs; I just think Control AI is misrepresenting Anthropic's and others' positions.

Three, Control AI seems to say that Anthropic’s advocacy for RSPs is an instance of safetywashing and regulatory capture. (Connor Leahy: “The primary aim of responsible scaling is to provide a framework which looks like something was done so that politicians can go home and say: ‘We have done something.’ But the actual policy is nothing.” And also: “The AI companies in particular and other organisations around them are trying to capture the summit, lock in a status quo of an unregulated race to disaster.”) I don’t know exactly what Anthropic’s goals are -- I would guess that its leadership is driven by a complex mixture of motivations -- but I doubt it is so clear-cut as Leahy makes it out to be.

To be clear, I think Conjecture has good intentions, and wants the whole AI thing to go well. I am rooting for its safety work and looking forward to seeing updates on CoEm. And again, I personally do not have a settled view on whether RSPs like Anthropic’s are in fact good or bad, or on whether it is good or bad to advocate for them – it could well be that RSPs turn out to be toothless, and would displace better policy – I only take issue with the rhetoric.

(Disclosure: Open Philanthropy funds the organisation I work for, though the above represents only my views, not my employer’s.)

I'm surprised to hear they're posting updates about CoEm.

At a conference held by Connor Leahy, I said that I thought it was very unlikely to work, and asked why they were interested in this research area, and he answered that they were not seriously invested in it.

We didn't develop the topic and it was several months ago, so it's possible that (1) I misremember, (2) they changed their minds, or (3) I appeared adversarial and he didn't feel like debating CoEm. (For example, maybe he actually said that CoEm didn't look promising and this changed recently?)
Still, anecdotal evidence is better than nothing, and I look forward to seeing OliviaJ compile a document to shed some light on it.

So, I'm often tempted to mention my x risk motivations only briefly, then focus on whatever's inferentially closest and still true. (Classically, this would be "misuse risks, especially from foreign adversaries and terrorists" and "bioweapon and cyberoffensive capabilities coming in the next few years".) 

One heuristic that I'm tempted to adopt and recommend is the onion test: your communications don't have to emphasize your weird beliefs, but you want your communications to satisfy the criterion that if your interlocutor became aware of everything you think, they would not be surprised.

This means that when I'm talking with a potential ally, I'll often mostly focus on places where we agree, while also being intentional about flagging that I have disagreements that they could double click on if they wanted.

I'm curious if your sense, Olivia, is that your communications (including the brief communication of x risk) pass the onion test.

And if not, I'm curious what's hard about meeting that standard. Is this a heuristic that can be made viable in the contexts of eg DC?

Adding a datapoint here: I've been involved in the Control AI campaign, which was run by Andrea Miotti (who also works at Conjecture). Before joining, I had heard some integrity/honesty concerns about Conjecture. So when I joined, I decided to be on the lookout for any instances of lying/deception/misleadingness/poor integrity. (Sidenote: At the time, I was also wondering whether Control AI was just essentially a vessel to do Conjecture's bidding. I have updated against this– Control AI reflects Andrea's vision. My impression is that Conjecture folks other than Andrea have basically no influence over what Control AI does, unless they convince Andrea to do something.)

I've been impressed by Andrea's integrity and honesty. I was worried that the campaign might have some sort of "how do we win, even if it misleads people" vibe (in which case I would've protested or left), but there was constantly a strong sense of "are we saying things that are true? Are we saying things that we actually believe? Are we communicating clearly?" I was especially impressed given the high volume of content (it is especially hard to avoid saying untrue/misleading things when you are putting out a lot of content at a fast pace.)

In contrast, integrity/honesty/openness norms feel much less strong in DC. When I was in DC, I think it was fairly common to see people "withhold information for strategic purposes", "present a misleading frame (intentionally)", "focus on saying things you think the other person will want to hear", or "decide not to talk at all because sharing beliefs in general could be bad." It's plausible to me that these are "the highest EV move" in some cases, but if we're focusing on honesty/integrity/openness, I think DC scored much worse. (See also Olivia's missing mood point). 

The Bay Area scores well on honesty/integrity IMO (relative to other spaces), but it has its own problems, especially with groupthink/conformity/curiosity-killing. I think its honesty/integrity norms are enforced in a way that comes with important tradeoffs. For instance, I think the Bay Area tends to punish people for saying things that are imprecise or "seem dumb", which leads to a lot of groupthink/conformity and a lot of "people just withholding their beliefs so that they don't accidentally say something incorrect and get judged for it." Also, I think high-status people in the Bay Area are often able to "get away with" low openness/transparency/clarity. There are lots of cases where people are like "I believe X because Paul believes X" and then when asked "why does Paul believe X" they're like "idk". This seems like an axis separate from honesty/integrity, but it still leads to pretty icky epistemic discourse. 

(This isn't to say that people shouldn't criticize Conjecture– but I think there's a sad thing that happens where it starts to feel like both "sides" are just trying to criticize each other. My current position is much closer to something like "each of these communities has some relative strengths and weaknesses, and each of them has at least 1-3 critical flaws". Whereas in the status quo I think these discussions sometimes end up feeling like members of tribe A calling tribe B low integrity and then tribe B firing back by saying Tribe A is actually low integrity in an even worse way.) 

Interesting discussion! 

Probably somewhat controversially, I've been kind of happy about the Politico pieces that have been published. We had two that basically tried to make the case that there is an EA conspiracy in DC that has lots of power in a kind of unaccountable way.

Maybe someone could reach out to the author and be like "Ok, yeah, we are kind of a bit conspiratorial, sorry about that. But I think let's try to come clean, I will tell you all the stuff that I know, and you take seriously the hypothesis that we really aren't doing this to profit off of AI, but because we are genuinely concerned about catastrophic risks from AI".

I like that you imagine conversations like that in your head and that they sometimes go well there! 

Seems important to select the right journalist if someone were to try this. I feel like the journalist would have to be sympathetic already or at least be a very reasonable and fair-minded person. Unfortunately, some journalists cannot think straight for the life of them and only make jumpy, shallow associations like "seeks influence, so surely this person must be selfish and greedy."

I didn't read the Politico article yet, but given that "altruism" is literally in the name with "EA," I wonder why it needs to be said "and you take seriously the hypothesis that we really aren't doing this to profit off of AI." If a journalist is worth his or her salt, and they write about a movement called EA, shouldn't a bunch of their attention go into the question of why/whether some of these people might be genuine? And if the article takes a different spin and they never even consider that question, doesn't it suggest something is off? (Again, haven't read the article yet – maybe they do consider it or at least leave it open.)

Should I talk about inferentially close things that makes them likeliest to embrace the policies I'm putting on their desk,

Here are some of my current principles around this issue.

1) It's fine to contribute to work on policies that you think are good-to-neutral that aren't your core motivation, just so you can get involved with the thing that is your core motivation. 

"I am happy to be here helping the Senator achieve an agreement with foreign country Fiction-land on export taxes. I am not here because I personally care a great deal about export taxes (they seem fine to me), but because I want to build a personally strong relationship with representatives of Fiction-land, and I'm happy to help this appears-good-to-me agreement get made to do so.

2) It's not okay to contribute to work on policies that you think are harmful for the world, just so you can get involved with the thing that is your core motivation. 

"I am here helping coordinate a ban on building houses in San Francisco which I expect will contribute to homelessness and substantially damage the economic growth of the city, because I would like to build a relationship with the city governance."

3) It's never okay to misrepresent what you believe. 

"I am here primarily because I personally care about this policy a great deal" vs "I am here because the policy seems reasonable to me and my good ally John Smith has asked me to help him get it enacted, who I believe honestly cares about it and thinks it will make people's lives better."

4) I think it's okay to spend time talking about things that aren't your favorite thing. 

"Let's talk through how we could get this policy passed that is like my 20th favorite thing" or "Let's spend a few hours figuring out how to achieve this goal that the head of our office wants even though I am not personally invested in it."

5) But you should answer honestly about your intentions.

"I'm here because I'd like to prevent us all going extinct from AI, but while I'm building up a career and reputation, in the meantime I'm happy to improve the government's understanding of what's even happening, or improve its communications with companies, or join in on what we're all working on here."

and things like "being gay" where society seems kind of transparently unreasonable about it,

Importantly "being gay" is classed for me as "a person's personal business", sort of irrespective of whether society is reasonable about it or not. I'm inclined to give people leeway to keep private personal info that doesn't have much impact on other people.

Yeah, seems like a reasonable thing to mention. 

Can you say a bit more about Eleuther's involvement in the last of these papers? I thought that this was mostly done by people working at Apollo. EleutherAI is credited for providing compute (at the same level as OpenAI for providing GPT-4 credits), but I am surprised that you would claim it as work produced by EleutherAI?

Oops. It appears that I deleted my comment (deeming it largely off-topic) right as you were replying. I'll reproduce the comment below, and then reply to your question.

I separately had a very weird experience with them on the Long Term Future Fund where Conor Leahy applied for funding for Eleuther AI. We told him we didn't want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people. He then confusingly went around to a lot of people around EleutherAI and told them that "Open Phil is not interested in funding pre-paradigmatic AI Alignment research and that that is the reason why they didn't fund Eleuther AI". This was doubly confusing and misleading because Open Phil had never evaluated a grant to Eleuther AI (Asya who works at Open Phil was involved in the grant evaluation as a fund member, but nothing else), and of course the reason he cited had nothing to do with the reason we actually gave. He seems to have kept saying this for a long time even after I think someone explicitly corrected the statement to

While this anecdote is largely orthogonal to the broader piece, I remembered that this existed randomly today and wanted to mention that Open Phil has recommended a $2.6M/3-year grant to EleutherAI to pursue interpretability research. It was a really pleasant and very easy experience: Nora Belrose (head of interpretability) and I (head of everything) talked with them about some of our recent and ongoing work, such as Eliciting Latent Predictions from Transformers with the Tuned Lens, Eliciting Latent Knowledge from Quirky Language Models, and Sparse Autoencoders Find Highly Interpretable Features in Language Models, which they found very interesting, and once they knew we had shared areas of interest it was a really easy experience.

I had no vibes along the lines of "oh we don't like EleutherAI" or "we don't fund pre-paradigmatic research." It was a surprise to some people at Open Phil that we had areas of overlapping interest, but we spent like half an hour clarifying our research agenda and half an hour talking about what we wanted to do next and people were already excited.

We ended up talking about this in DMs, but the gist of it is:

Back in June Hoagy opened a thread in our "community research projects" channel and the work migrated there. Three of the five authors of the [eventual paper](https://arxiv.org/abs/2309.08600) chose to have EleutherAI affiliation (for any work we organize with volunteers, we tell them they're welcome to use an EleutherAI affiliation on the paper if they like) and we now have an entire channel dedicated to future work. I believe Hoagy has two separate paper ideas currently in the works and over a half dozen people working on them.

If you go with journalists, I'd want to find one who seems really truth-seeking.

I think it would be a very valuable public service to the community to have someone whose job it is to read a journalist's corpus and check if it seems fair and honest.

I think we could, as a community, have a policy of only talking with journalists who are honest. This seems like a good move pragmatically, because it means coverage of our stuff will be better on average, and it also universalizes really well, so long as “honest” doesn’t morph into “agrees with us about what’s important.”

It seems good and cooperative to disproportionately help high-integrity journalists get sources, and it helps us directly.

Like, a big problem with doing this kind of information management where you try to hide your connections and affiliations is that it's really hard for people to come to trust you again afterwards. If you get caught doing this, it's extremely hard to rebuild trust that you aren't doing this in the future, and I think this dynamic usually results in some pretty intense immune reactions when people fully catch up with what is happening.

I would have guessed that this is just not the level of trust people operate at. Like, for most things in policy, people don't really act like their opposition is in good faith, so there's not much to lose here. (Weakly held.)

A claim I've heard habryka make before (I don't know myself) is that there are actual rules to the kind of vague-deception that goes on in DC. And something like, while it's a known thing that a politician will say "we're doing policy X" when they don't end up doing policy X, if you misrepresent who you're affiliated with, this is an actual norm violation. (i.e. it's lying about the Simulacrum 3 level, which is the primary level in DC)

My guess is we maybe could have also done that at least a year earlier. Honestly, given the traction we had in 2015 on a lot of this stuff, with Bill Gates and Elon Musk and Demis, I think there is a decent chance we could have also done a lot of Overton window shifting back then, and our not having done so is, I think, downstream of a strategy that wanted to maintain lots of social capital with the AI capability companies and random people in governments who would be weirded out by people saying things outside of the Overton window.

Though again, this is just one story, and I also have other stories where it all depended on Chat-GPT and GPT-4 and before then you would have been laughed out of the room if you had brought up any of this stuff (though I do really think the 2015 Superintelligence stuff is decent evidence against that). It's also plausible to me that you need a balance of inside and outside game stuff, and that we've struck a decent balance, and that yeah, having inside and outside game means there will be conflict between the different people involved in the different games, but it's ultimately the right call in the end.

I really want an analysis of this. The alignment and rationality communities were wrong about how tractable getting public & governmental buy-in to AI x-risk would be. But what exactly was the failure? That seems quite important to knowing how to alter decision making and to prevent future failures to grab low-hanging fruit. 

I tried writing a fault analysis myself, but I couldn't make much progress, and it seems like you have more detailed models than I do. So someone other than me is probably the right person for this. 

That said, the dialogues on AI governance and outreach are providing some of what I'm looking for here, and seem useful to anyone who does want to write an analysis. So thank you to everyone who's discussing these topics in public.

"AI has immense potential, but also immense risks. AI might be misused by China, or get out of control. We should balance the needs for innovation and safety." I wouldn't call this lying (though I agree it can have misleading effects, see Issue 1).


Not sure where this slots in, but there's also a sense in which this contains a missing positive mood about how unbelievably good (aligned) AI could or will be, and how much we're losing by not having it earlier.

On the meta level, for unfalsifiable claims (and even falsifiable claims that would take more effort to verify than a normal independent 3rd party adult could spend in, say, a month), it doesn't really seem to matter whether the person pushing the claims has integrity, beyond a very low bar?

They just need to have enough integrity to pass the threshold of not being some maniac/murderer/persistent troll/etc.

But otherwise, beyond that threshold, there doesn't seem to be much of a downside in treating folks politely while always assuming there are some hidden motivations going on behind the scenes.

And for the falsifiable claims that have reasonable prospects for independent 3rd party verification, assigning credibility, trust, integrity, etc., based on the author's track record of such claims proving to be true is more than sufficient for discussions, without regard for what hidden motivations there might be.

Maybe this is not sufficient for the community organization/emotional support/etc. side of things, though you'd be the better judge of that.