Orpheus16 - LessWrong

Comparing risk from internally-deployed AI to insider and outsider threats from humans

First, when I talk to security staff at AI companies about computer security, they often seem to fail to anticipate what insider threat from AIs will be like.

Why do you think this? Is it that they are not thinking about large numbers of automated agents running around doing a bunch of research?

Or is it that they are thinking about these kinds of scenarios, and yet they still don't apply the insider threat frame for some reason?

Ryan Kidd's Shortform

Orpheus1610d20

My understanding is that AGI policy is pretty wide open under Trump. I don't think he and most of his close advisors have entrenched views on the topic.

If AGI is developed in this Admin (or we approach it in this Admin), I suspect there is a lot of EV on the table for folks who are able to explain core concepts/threat models/arguments to Trump administration officials.

There are some promising signs of this so far. Publicly, Vance has engaged with AI2027. Non-publicly, I think there is a lot more engagement/curiosity than many readers might expect.

This isn't to say "everything is great and the USG is super on track to figure out AGI policy" but it's more to say "I think people should keep an open mind– even people who disagree with the Trump Admin on mainstream topics should remember that AGI policy is a weird/niche/new topic where lots of people do not have strong/entrenched/static positions (and even those who do have a position may change their mind as new events unfold.)"

Ryan Kidd's Shortform

Orpheus1612d95

There are definitely still benefits to doing alignment research, but this only justifies the idea that doing alignment research is better than doing nothing.

IMO the thing that matters (for an individual making decisions about what to do with their career) is something more like "on the margin, would it be better to have one additional person do AI governance or alignment/control?"

I happen to think that given the current allocation of talent, on-the-margin it's generally better for people to choose AI policy. (Particularly efforts to contribute technical expertise or technical understanding/awareness to governments, think-tanks interfacing with governments, etc.) There is a lot of demand in the policy community for these skills/perspectives and few people who can provide them. In contrast, technical expertise is much more common at the major AI companies (though perhaps some specific technical skills or perspectives on alignment are neglected.)

In other words, my stance is something like "by default, anon technical person would have more expected impact in AI policy unless they seem like an unusually good fit for alignment or an unusually bad fit for policy."

Orpheus16's Shortform

Orpheus1622d110

There's a video version of AI2027 that is quite engaging/accessible. Over 1.5M views so far.

Seems great. My main critique is that the "good ending" seems to assume alignment is rather easy to figure out, though admittedly that might be more of a critique of AI2027 itself rather than the way the video portrays it.

What We Learned from Briefing 70+ Lawmakers on the Threat from AI

Orpheus161mo314

This is fantastic work. There's also something about this post that feels deeply empathic and humble, in ways that are hard-to-articulate but seem important for (some forms of) effective policymaker engagement.

A few questions:

Are you planning to do any of this in the US?
What have your main policy proposals or "solutions" been? I think it's becoming a lot more common for me to encounter policymakers who understand the problem (at least a bit) and are more confused about what kinds of solutions/interventions/proposals are needed (both in the short-term and the long-term).
Can you say more about what kinds of questions you encounter when describing loss of control, as well as what kinds of answers have been most helpful? I'm increasingly of the belief that getting people to understand "AI has big risks" is less important than getting people to understand "some of the most significant risks come from this unique thing called loss of control that you basically don't really have to think about for other technologies, and this is one of the most critical ways in which AI is different than other major/dangerous/dual-use technologies."
Did you notice any major differences between parties? Did you change your approach based on whether you were talking to conservatives or labour? Did they have different perspectives or questions? (My own view is that people on the outside probably overestimate the extent to which there are partisan splits on these concerns-- they're so novel that I don't think the mainstream parties have really entrenched themselves in different positions. But would be curious if you disagree.)
- Sub-question: Was there any sort of backlash against Rishi Sunak's focus on existential risks? Or the UK AI Security Institute? In the US, it's somewhat common for Republicans to assume that things Biden did were bad (and for Democrats to assume that things Trump does is bad). Have you noticed anything similar?

We're Not Advertising Enough (Post 3 of 7 on AI Governance)

Orpheus161mo1212

I think we should be careful not to overestimate the success of AI2027. "Vance has engaged with your work" is an impressive feat, but it's still relatively far away from something like "Vance and others in the Admin have taken your work seriously enough to start to meaningfully change their actions or priorities based on it." (That bar is very high, but my impression is that the AI2027 folks would be like "yea, that's what would need to happen in order to steer toward meaningfully better futures.")

My impression is that AI2027 will have (even) more success if it is accompanied by an ambitious policymaker outreach effort (e.g., lots of 1-1 meetings with relevant policymakers and staffers, writing specific pieces of legislation or EOs and forming a coalition around those ideas, publishing short FAQ memos that address misconceptions or objections they are hearing in their meetings with policymakers, etc.)

This isn't to say that research is unnecessary-- much of the success of AI2027 comes from Daniel (and others on the team) having dedicated much of their lives to research and deep understanding. There are plenty of Government Relations people who are decent at "general policy engagement" but will fail to provide useful answers when staffers ask things like "But why won't we just code in the goals we want?", or "But don't you think the real thing here is about how quickly we diffuse the technology?", or "Why don't you think existing laws will work to prevent this?" or a whole host of other questions.

But on the margin, I would probably have Daniel/AI2027 spend more time on policymaker outreach and less time on additional research (especially now that AI2027 is done). There is some degree of influence one can have with the "write something that is thoroughly researched and hope it spreads organically" effort, and I think AI2027 has essentially saturated that. For additional influence, I expect it will be useful for Daniel (or other competent communicators on his team) to advance to "get really good at having meetings with the ~100-1000 most important people, understanding their worldviews, going back and forth with them, understanding their ideological or political constraints, and finding solutions/ideas/arguments that are tailored to these particular individuals." This is still a very intellectual task in some ways, but it involves a lot more "having meetings" and "forming models of social/political reality" than the classic "sit in your room with a whiteboard and understand technical reality" stuff that we typically associate with research.

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies

Orpheus162mo328

Note that IFP (a DC-based think tank) recently had someone deliver 535 copies of their new book to every US Congressional office.

Note also that my impression is that DC people (even staffers) are much less "online" than tech audiences. Whether or not you copy IFP, I would suggest thinking about in-person distribution opportunities for DC.

RA x ControlAI video: What if AI just keeps getting smarter?

Orpheus162mo70

I think there are organizations that themselves would be more likely to be robustly trustworthy and would be more fine to link to

I would be curious for your thoughts on which organizations you feel are robustly trustworthy.

Bonus points for a list that is kind of a weighted sum of "robustly trustworthy" and "having a meaningful impact RE improving public/policymaker understanding". (Adding this in because I suspect that it's easier to maintain "robustly trustworthy" status if one simply chooses not to do a lot of externally-focused comms, so it's particularly impressive to have the combination of "doing lots of useful comms/policy work" and "managing to stay precise/accurate/trustworthy").

AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions

Orpheus162mo110

I appreciate the articulation and assessment of various strategies. My comment will focus on a specific angle that I notice both in the report and in the broader ecosystem:

I think there has been a conflating of “catastrophic risks” and “extinction/existential risks” recently, especially among groups that are trying to influence policy. This is somewhat understandable– the difference between "catastrophic" and "existential" is not that big of a deal in most people's minds. But in some contexts, I think it misses the fact that "existential [and thus by definition irreversible]" is actually a very different level of risk compared to "catastrophic [but something that we would be able to recover from.]"

This view seems to be (implicitly) expressed in the report summary, most notably the chart. It seems to me like the main frame is something like "if you want to avoid an unacceptable chance of catastrophic risk, all of these other options are bad."

But not all of these catastrophic risks are the same, I think this is actually quite an important consideration, and I think even (some) policymakers would/will see this as an essential consideration as AGI becomes more salient.

Specifically, "war" and "misuse" seem very different than "extinction" or "total and irreversible civilizational collapse."

"War" is broad enough to encompass many outcomes (ranging from "conflict with <1M deaths" to "nuclear conflict in which civilization recovers" all the way to "nuclear conflict in which civilization does not recover.") Note also that many natsec leaders already think the chance of a war between the US and China is at a level that would probably meet an intuitive bar for "unacceptable." (I don't have actual statistics on this but my guess is that >10% chance of war in the next decade is not an uncommon view. One plausible pathway that is discussed often is China invading Taiwan and US being committed to its defense).
"Misuse" can refer to many different kinds of events (including $1B in damages from a cyberattack, 10M deaths, 1B deaths, or complete human extinction.) These are, of course, very different in terms of their overall impact, even though all of them are intuitively/emotionally stored as "very bad things that we would ideally avoid."

It seems plausible to me that we will be in situations in which policymakers have to make tricky trade-offs between these different sources of risk, and my hope is that the community of people concerned about AI can distinguish between the different "levels" or "magnitudes" of different types of risks.

(My impression is that MIRI agrees with this, so this is more a comment on how the summary was presented & more a general note of caution to the ecosystem as a whole. I also suspect that the distinction between "catastrophic" and "existential/civilization-ending" will become increasingly more important as the AI conversation becomes more interlinked with the national security apparatus.)

Caveat: I have not read the full report and this comment is mostly inspired by the summary, the chart, and a general sense that many organizations other than MIRI are also engaging in this kind of conflation.

Alexander Gietelink Oldenziel's Shortform

Orpheus163mo40

I feel this way and generally think that on-the-margin we have too much forecasting and not enough “build plans for what to do if there is a sudden shift in political will” or “just directly engage with policymakers and help them understand things not via longform writing but via conversations/meetings.”

Many details will be ~impossible to predict and many details will not matter much (i.e., will not be action-relevant for the stakeholders who have the potential to meaningfully affect the current race to AGI).

That’s not to say forecasting is always unhelpful. Things like AI2027 can certainly move discussions forward and perhaps get new folks interested. But EG, my biggest critique of AI2027 is that I suspect they’re spending too much time/effort on detailed longform forecasting and too little effort on arranging meetings with Important Stakeholders, developing a strong presence in DC, forming policy recommendations, and related activities. (And TBC I respect/admire the AI2027 team, have relayed this feedback to them, and imagine they have thoughtful reasons for taking the approach they’re taking.)

LESSWRONG
LW

Sequences

Posts

Wikitag Contributions

Comments

Sequences

Posts

Wikitag Contributions

Comments