Introduction: some contemporary AI governance context

It’s a confusing time in AI governance. Several countries’ governments recently changed hands. DeepSeek and other technical developments have called into question certain assumptions about the strategic landscape. Political discourse has swung dramatically away from catastrophic risk and toward framings of innovation and national competitiveness.

Meanwhile, the new governments have issued statements of policy, and AI companies (mostly) continue to publish or update their risk evaluation and mitigation approaches. Interpreting these words and actions has become an important art for AI governance practitioners: does the phrase “human flourishing” in the new executive order signal concern about superintelligence, or just that we should focus on AI’s economic and medical potential and not “hand-wring” about safety? How seriously should we take the many references to safety in the UK’s AI Opportunities Action Plan, given the unreserved AI optimism in the announcement? Does Meta’s emphasis on “unique” risks take into account whether a model’s weights are openly released? The answers matter not only for predicting future actions but also for influencing them: it’s useful to know an institution’s relative appetite for different kinds of suggestions, e.g. more export controls versus maintaining Commerce’s reporting requirements.

So, many people who work in AI governance spend a lot of time trying to read between the lines of these public statements, talking to their contacts at these institutions, and comparing their assessments of the evidence with others'. This means they can wind up with a lot of non-public information, and often they also have a lot of context that casual observers (or people doing heads-down technical work in the Bay) might lack.

All of that is to say: if you hear someone express a view about how an institution is thinking about AI (or many other topics), you might be tempted to update your own view towards theirs, especially if they have expertise or non-public information. And, of course, this is sometimes the correct response.

But this post argues that you should take these claims with a grain of salt. The rest of the post shifts to a much higher level of abstraction than the above, in part because I don’t want to “put anyone on blast,” and in part because this is a general phenomenon. Note that many of the reasons below are generic reasons to doubt claims you can’t independently verify, but some are specific to powerful institutions.

Biases towards claiming agreement with one’s own beliefs

Let’s say you hear Alice say that a powerful institution (like a political party, important company, government, etc.) agrees with her position on a controversial topic more than you might think.

If you have reason to think that Alice knows more about that institution than you do, or just has some information that you don’t have, you might be inclined to believe Alice and update your views accordingly: maybe that institution is actually more sympathetic to Alice’s views than you realized!

This might be true, of course. But I’d like to point out a few reasons to be skeptical of this claim.

  • Maybe Alice is basing her claim on interactions with people in the institution whose views aren’t publicly known. But this evidence is liable to be biased:
    • The people Alice knows within the institution probably agree with Alice more than the average person in that institution. After all, they are somehow connected to Alice. This means they’re more likely than the average person in that institution to share some characteristic with Alice, like both having lived in the Bay Area, or both having worked in the national security space. Or maybe it’s even just that Alice has convinced them individually.
    • Those people are also incentivized to convince Alice that they agree with her more than they do. Giving Alice the impression that they’re on her side probably makes Alice more likely to take actions that help them rather than obstruct them, or gives her the impression that they’ve done her a meaningful favor (“I passed along that idea you mentioned, and I think there’s buy-in for it – we’ll see!”).
  • Maybe Alice is making this claim strategically, e.g. because expressing support for the institution makes them more likely to listen to her, and/or she’s trying to “incept” the idea that they hold this view.
  • Maybe Alice would be better off if it were true, and even though Alice doesn’t knowingly lie, the selfish parts of her brain can convince the reasoning parts of her brain that convenient things are true.
    • For example, maybe Alice’s work is at least partly aimed at influencing the institution, and Alice would be better able to recruit and fundraise to the extent that people believe that influencing this institution is tractable.
    • Or perhaps Alice is on record predicting that the institution will agree with her, and it would make her look prescient if people believe it does (or embarrass her if not). 

Weaker biases towards claiming disagreement with one’s own beliefs

Now imagine that you hear Bob, who agrees with Alice’s view, make the opposite claim: actually, the institution disagrees with us!

Not all of the factors above apply, and I think that, on net, these effects are stronger for claims of agreement than for claims of disagreement, roughly in proportion to how powerful the institution is. But some of them still apply, at least in some modified form:

  • Symmetrically, maybe Bob publicly predicted that the institution wouldn’t agree with him and stands to gain or lose status depending on whether people believe it does.
  • Maybe Bob isn’t trying to influence that institution – call it Institution A – but rather is trying to influence some opposing institution called Institution B.
    • By saying Institution A disagrees with him, he could be demonstrating his opposition to Institution A and thus his affiliation with Institution B, and trying to negatively polarize Institution B towards his view (i.e., leading Institution B to embrace the view partly because its rival rejects it).
    • I think this effect is probably especially weak, but if Bob can make it look intractable to influence Institution A, this makes his own efforts to influence Institution B more appealing to employers and funders.

Conclusion

I wouldn’t totally dismiss either claim, especially if Alice/Bob do have some private information, even if I knew that they had many of these biases. Claims like theirs are a valuable source of evidence. But I would take both claims (especially Alice’s) with a grain of salt, and if the strength of these claims were relevant for an important decision, I’d consider whether and to what extent these biases might be at play. This means giving a bit more weight to my own prior views of the institution and my own interpretations of the evidence, albeit only to the extent that I think biases like the above apply less to me than to the source of the claim.
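
To make “a grain of salt” slightly more concrete, here is a minimal toy sketch in odds form (the numbers are purely illustrative and not anything the argument depends on): suspecting the biases above amounts to treating Alice’s or Bob’s report as evidence with a smaller likelihood ratio, which pulls your posterior back toward your own prior.

```python
# Toy illustration only: discounting a possibly-biased report about an institution.
# "Likelihood ratio" here means P(report | institution agrees) / P(report | it disagrees).

def posterior(prior: float, likelihood_ratio: float) -> float:
    """Update P(institution agrees with Alice) after hearing Alice's report."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

prior = 0.2  # your own prior that the institution agrees with Alice

# If you treated Alice's report as strong, unbiased evidence:
print(posterior(prior, likelihood_ratio=4.0))   # ~0.50

# If the biases above make her likely to report agreement either way,
# the same report is much weaker evidence:
print(posterior(prior, likelihood_ratio=1.5))   # ~0.27
```

The point of the sketch is just that the discount lands on how informative you treat the report as being, not on your prior itself.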

Comments (2)

Also worth considering: how strongly an "institution" holds a view on average may matter much less than how the powerful decision-makers within (or above) that institution feel.

Other reasons:

Biases towards claiming agreement with one’s own beliefs

If the institution is widely trusted, respected, and high-status, as well as powerful, then convincing you that the institution supports her beliefs might incline you to give Alice's beliefs more credence. That would serve Alice's political agenda.

Weaker biases towards claiming disagreement with one’s own beliefs

If the institution is widely hated (for example, al-Qaeda, the CIA, or the KGB) or considered low-status, crazy, and so on, then convincing you that the institution opposes her beliefs might make you more sympathetic to Alice, make you distrust arguments against her beliefs, and/or defuse preexisting arguments that support for her position comes mostly from these evil or crazy institutions.
