Comments

Akash · 3h · 20

Right now, I think one of the most credible ways for a lab to show its commitment to safety is through its engagement with governments.

I didn’t mean to imply that a lab should automatically be considered “bad” if its public advocacy and its private advocacy differ.

However, when assessing how “responsible” various actors are, I think investigating questions relating to their public comms, engagement with government, policy proposals, lobbying efforts, etc would be valuable.

If Lab A had slightly better internal governance but Lab B had better effects on "government governance", I would say that Lab B is more "responsible" on net.

Akash · 7h · 60

@Zach Stein-Perlman, great work on this. I would be interested in you brainstorming some questions that have to do with the labs' stances toward (government) AI policy interventions.

After a quick 5 min brainstorm, here are some examples of things that seem relevant:

  • I remember hearing that OpenAI lobbied against the EU AI Act– what's up with that?
  • I heard a rumor that Congresspeople and their teams reached out to Sam/OpenAI after his testimony. They allegedly asked for OpenAI's help to craft legislation around licensing, and then OpenAI refused. Is that true? 
  • Sam said we might need an IAEA for AI at some point– what did he mean by this? At what point would he see that as valuable?
  • In general, what do labs think the US government should be doing? What proposals would they actively support or even help bring about? (Flagging ofc that there are concerns about actual and perceived regulatory capture, but there are also major advantages to having industry players support & contribute to meaningful regulation).
  • Senator Cory Booker recently asked Jack Clark something along the lines of "what is your top policy priority right now//what would you do if you were a Senator." Jack responded with something along the lines of "I would make sure the government can deploy AI successfully. We need a testing regime to better understand risks, but the main risk is that we don't use AI enough, and we need to make sure we stay at the cutting edge." What's up with that?
  • Why haven't Dario and Jack made public statements about specific government interventions? Do they believe that there are some circumstances under which a moratorium would need to be implemented, labs would need to be nationalized (or internationalized), or something else would need to occur to curb race dynamics? (This could be asked to any of the lab CEOs/policy team leads– I don't mean to be picking on Anthropic, though I think Sam/OpenAI have had more public statements here, and I think the other labs are scoring more poorly across the board//don't fully buy into the risks in the first place.)
  • Big tech is spending a lot of money on AI lobbying. How much is each lab spending (this is something you can estimate with publicly available data), and what are they actually lobbying for/against?

I imagine there's a lot more in this general category of "labs and how they are interacting with governments and how they are contributing to broader AI policy efforts", and I'd be excited to see AI Lab Watch (or just you) dive into this more.

Akash · 1d · 2818

I can imagine this growing into the default reference that people use when talking about whether labs are behaving responsibly.

I hope that this resource is used as a measure of relative responsibleness, and this doesn't get mixed up with absolute responsibleness. My understanding is that the resource essentially says, "here are some things that would be good– let's see how the labs compare on each dimension." The resource is not saying "if a lab gets a score above X% on each metric, then we are quite confident that the lab will not cause an existential catastrophe."

Moreover, my understanding is that the resource is not taking a position on whether or not it is "responsible"– in some absolute sense– for a lab to be scaling toward AGI in our current world. I see the resource as saying "conditional on a lab scaling toward AGI, are they doing so in a way that is relatively more/less responsible compared to the others that are scaling toward AGI?"

This might be a pedantic point, but I think it's an important one to emphasize– a lab can score in 1st place and still present a risk to humanity that reasonable people would deem unacceptable & irresponsible (or put differently, a lab can score in 1st place and still produce a catastrophe).

Akash · 1d · 93

Agree with lots of this– a few misc thoughts [hastily written]:

  1. I think the Overton Window frame ends up getting people to focus too much on the dimension "how radical is my ask"– in practice, things are usually much more complicated than this. In my opinion, a preferable frame is something like "who is my target audience and what might they find helpful." If you're talking to someone who makes it clear that they will not support X, it's silly to keep on talking about X. But I think the "target audience first" approach ends up helping people reason in a more sophisticated way about what kinds of ideas are worth bringing up. As an example, in my experience so far, many policymakers are curious to learn more about intelligence explosion scenarios and misalignment scenarios (the more "radical" and "speculative" threat models). 
  2. I don't think it's clear that the more effective actors in DC tend to be those who look for small wins. Outside of the AIS community, there sure do seem to be a lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate. Whether or not these organizations end up having more or less influence than the more "centrist" groups is, in my view, not a settled question & probably varies a lot by domain. In AI safety in particular, I think my main claim is something like "pretty much no group– whether radical or centrist– has had tangible wins." When I look at the small set of tangible wins, it seems like the groups involved were across the spectrum of "reasonableness."
  3. The more I interact with policymakers, the more I'm updating toward something like "poisoning the well doesn't come from having radical beliefs– poisoning the well comes from lamer things like being dumb or uninformed, wasting peoples' time, not understanding how the political process works, not having tangible things you want someone to do, explaining ideas poorly, being rude or disrespectful, etc." I've asked ~20-40 policymakers (outside of the AIS bubble) things like "what sorts of things annoy you about meetings" or "what tends to make meetings feel like a waste of your time", and no one ever says "people come in with ideas that are too radical." The closest thing I've heard is people saying that they dislike it when groups fail to understand why things aren't able to happen (like, someone comes in thinking their idea is great, but then they fail to understand that their idea needs approval from committee A and appropriations person B, and then they're upset about how slowly things are moving). It seems to me like many policy folks (especially staffers and exec branch subject experts) are genuinely interested in learning more about the beliefs and worldviews that have been prematurely labeled as "radical" or "unreasonable" (or perhaps such labels were appropriate before ChatGPT but no longer are).
  4. A reminder that those who are opposed to regulation have strong incentives to make it seem like basically-any-regulation is radical/unreasonable. An extremely common tactic is for industry and its allies to make common-sense regulation seem radical/crazy/authoritarian & argue that actually the people proposing strong policies are just making everyone look bad & argue that actually we should all rally behind [insert thing that isn't a real policy]. (I admit this argument is a bit general, and indeed I've made it before, so I won't harp on it here. Also I don't think this is what Trevor is doing– it is indeed possible to have serious discussions about "poisoning the well" even if one believes that the cultural and economic incentives disproportionately elevate such points.)
  5. In the context of AI safety, it seems to me like the highest-influence Overton Window moves have been positive– and in fact I would go as far as to say strongly positive. Examples that come to mind include the CAIS statement, FLI pause letter, Hinton leaving Google, Bengio's writings/speeches about rogue AI & loss of control, Ian Hogarth's piece about the race to god-like AI, and even Yudkowsky's TIME article.
  6. I think some of our judgments here depend on underlying threat models and an underlying sense of optimism vs. pessimism. If one thinks that labs making voluntary agreements/promises and NIST contributing to the development of voluntary standards are quite excellent ways to reduce AI risk, then the groups that have helped make this happen deserve a lot of credit. If one thinks that much more is needed to meaningfully reduce xrisk, then the groups that are raising awareness about the nature of the problem, making high-quality arguments about threat models, and advocating for stronger policies deserve a lot of credit.

I agree that more research on this could be useful. But I think it would be most valuable to focus less on "is X in the Overton Window" and more on "is X written/explained well and does it seem to have clear implications for the target stakeholders?" 

Akash · 8d · 128

I'm not sure who you've spoken to, but at least among the AI policy people who I talk to regularly (which admittedly is a subset of people who I think are doing the most thoughtful/serious work), I think nearly all of them have thought about ways in which regulation + regulatory capture could be net negative. At least to the point of being able to name the relatively "easy" ways (e.g., governments being worse at alignment than companies).

I continue to think people should be forming alliances with those who share similar policy objectives, rather than simply those who belong in the "I believe xrisk is a big deal" camp. I've seen many instances in which the "everyone who believes xrisk is a big deal belongs to the same camp" mentality has been used to dissuade people from communicating their beliefs, communicating with policymakers, brainstorming ideas that involve coordination with other groups in the world, disagreeing with the mainline views held by a few AIS leaders, etc.

The cultural pressures against policy advocacy have been so strong that it's not surprising to see folks say things like "perhaps our groups are no longer natural allies" now that some of the xrisk-concerned people are beginning to say things like "perhaps the government should have more of a say in how AGI development goes than it has in the status quo, where the government has played ~0 role and ~all decisions have been made by private companies."

Perhaps there's a multiverse out there in which the AGI community ended up attracting govt natsec folks instead of Bay Area libertarians, and the cultural pressures are flipped. Perhaps in that world, the default cultural incentives pushed people heavily toward brainstorming ways that markets and companies could contribute meaningfully to the AGI discourse, and the default position for the "AI risk is a big deal" camp was "well obviously the government should be able to decide what happens and it would be ridiculous to get companies involved– don't be unilateralist by going and telling VCs about this stuff."

I bring up this (admittedly kinda weird) hypothetical to point out just how skewed the status quo is. One might generally be wary of government overinvolvement in regulating emerging technologies yet still recognize that some degree of regulation is useful, and that position would likely still push them to be in the "we need more regulation than we currently have" camp.

As a final note, I'll point out to readers less familiar with the AI policy world that serious people are proposing lots of regulation that is in between "status quo with virtually no regulation" and "full-on pause." Some of my personal favorite examples include: emergency preparedness (akin to the OPPR), licensing (see Romney), reporting requirements, mandatory technical standards enforced via regulators, and public-private partnerships.

Akash · 8d · 30

I'm interested in writing out somewhat detailed intelligence explosion scenarios. The goal would be to investigate what kinds of tools the US government would have to detect and intervene in the early stages of an intelligence explosion. 

If you know anyone who has thought about these kinds of questions, whether from the AI community or from the US government perspective, please feel free to reach out via LessWrong.

Akash · 14d · 8-3

To what extent would the organization be factoring in transformative AI timelines? It seems to me like the kinds of questions one would prioritize in a "normal period" look very different from the kinds of questions that one would prioritize if they place non-trivial probability on "AI may kill everyone in <10 years" or "AI may become better than humans on nearly all cognitive tasks in <10 years."

I ask partly because I personally would be more excited about a version of this that wasn't ignoring AGI timelines, but I think a version of this that's not ignoring AGI timelines would probably be quite different from the intellectual spirit/tradition of FHI.

More generally, perhaps it would be good for you to describe some ways in which you expect this to be different from FHI. I think calling it the FHI of the West, the explicit statement that it would have the intellectual tradition of FHI, and the announcement coming right as FHI dissolves might make it seem like "I want to copy FHI" as opposed to "OK obviously I don't want to copy it entirely I just want to draw on some of its excellent intellectual/cultural components." If your vision is the latter, I'd find it helpful to see a list of things that you expect to be similar/different.

Akash · 14d · 6436

I would strongly suggest considering hires who would be based in DC (or who would hop between DC and Berkeley). In my experience, being in DC (or being familiar with DC & having a network in DC) is extremely valuable for being able to shape policy discussions, know what kinds of research questions matter, know what kinds of things policymakers are paying attention to, etc.

I would go as far as to say something like "in 6 months, if MIRI's technical governance team has not achieved very much, one of my top 3 reasons why MIRI failed would be that they did not engage enough with DC people//US policy people. As a result, they focused too much on questions that Bay Area people are interested in and too little on questions that Congressional offices and executive branch agencies are interested in. And relatedly, they didn't get enough feedback from DC people. And relatedly, even the good ideas they had didn't get communicated frequently enough or fast enough to relevant policymakers. And relatedly... etc etc."

I do understand this trades off against everyone being in the same place, which is a significant factor, but I think the cost is worth it. 

Akash · 14d · 82

I do think evaporative cooling is a concern, especially if everyone (or a very significant fraction of people) left. But I think on the margin more people should be leaving to work in govt.

I also suspect that a lot of systemic incentives will keep a greater-than-optimal proportion of safety-conscious people at labs as opposed to governments (labs pay more, labs are faster and have less bureaucracy, lab people are much more informed about AI, labs are more "cool/fun/fast-paced", lots of govt jobs force you to move locations, etc.)

I also think it depends on the specific lab– EG in light of the recent OpenAI departures, I suspect there's a stronger case for staying at OpenAI right now than for staying at DeepMind or Anthropic.

Akash · 14d · 151

Daniel Kokotajlo has quit OpenAI

I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.

I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone on a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving."

My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work. 

I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.

There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work.
