Orpheus16's Shortform

Orpheus16

LESSWRONG
LW

Orpheus16's Shortform

by Orpheus16

18th Apr 2024

1 min read

101

7

This is a special post for quick takes by Orpheus16. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Mentioned in

24Whether governments will control AGI is important and neglected

Orpheus16's Shortform

23Alexander Gietelink Oldenziel

7Bogdan Ionut Cirstea

4Orpheus16

5Bogdan Ionut Cirstea

3ryan_greenblatt

4Orpheus16

5ryan_greenblatt

3Bogdan Ionut Cirstea

101 comments, sorted by

top scoring

Click to highlight new comments since: Today at 5:22 PM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

[-]Orpheus161y5650

My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.

Some quick thoughts:

Soft power– I think people underestimate the how strong the "soft power" of labs is, particularly in the Bay Area.
Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
- People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
- So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
Money– The labs and Open Phil (which has been perceived, IMO correctly, as inve

... (read more)

[-]Zach Stein-Perlman1y*3026

Sorry for brevity, I'm busy right now.

Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as "evaluation" than "criticism."
It's often important for evaluation to be quite truth-tracking. Criticism isn't obviously good by default.

Edit:

3. I'm pretty sure OP likes good criticism of the labs; no comment on how OP is perceived. And I think I don't understand your "good judgment" point. Feedback I've gotten on AI Lab Watch from senior AI safety people has been overwhelmingly positive, and of course there's a selection effect in what I hear, but I'm quite sure most of them support such efforts.

4. Conjecture (not exclusively) has done things that frustrated me, including in dimensions like being "'unilateralist,' 'not serious,' and 'untrustworthy.'" I think most criticism of Conjecture-related advocacy is legitimate and not just because people are opposed to criticizing labs.

5. I do agree on "soft power" and some of "jobs." People often don't criticize the labs publicly because they're worried about negative effects on them, their org, or people associated with them.

[-]Orpheus161y*106

RE 1& 2:

Agreed— my main point here is that the marketplace of ideas undervalues criticism.

I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.

The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.

EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.

RE 3:

Good to hear that responses have been positive to lab watch. My impression is that this is a mix of: (a) lab watch doesn’t really threaten the interests of labs (especially Anthropic, which is currently winning & currently the favorite lab among senior AIS ppl), (b) the tides have been shifting somewhat and it is genuinely less taboo to criticize labs than a year ago, and (c) EAs respond more positively to criticism that feels more detailed/nuanced (look I have these 10 categories, let’s rate the labs on each dimension) than criticisms that are more about metastrategy (e.g., challenging the entire RSP frame or advocating for pol... (read more)

[-]Orpheus161y4420

I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.

I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone at a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving."

My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work.

I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.

There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work.

Written on a Slack channel in response to discussions about some folks leaving OpenAI.

[-]Alexander Gietelink Oldenziel1y2312

I'd be worried about evaporative cooling. It seems that the net result of this would be that labs would be almost completely devoid of people earnest about safety.

I agree with you government pathways to impact are most plausible and until recently undervalued. I also agree with you there are weird competitive pressures at labs.

8Orpheus161y

I do think evaporative cooling is a concern, especially if everyone (or a very significant amount) of people left. But I think on the margin more people should be leaving to work in govt. I also suspect that a lot of systemic incentives will keep a greater-than-optimal proportion of safety-conscious people at labs as opposed to governments (labs pay more, labs are faster and have less bureaucracy, lab people are much more informed about AI, labs are more "cool/fun/fast-paced", lots of govt jobs force you to move locations, etc.) I also think it depends on the specific lab– EG in light of the recent OpenAI departures, I suspect there's a stronger case for staying at OpenAI right now than for DeepMind or Anthropic.

[-]davekasten1y124

I largely agree, but think given government hiring timelines, there's no dishonor in staying at a lab doing moderately risk-reducing work until you get a hiring offer with an actual start date. This problem is less bad for the special hiring authorities being used for AI stuff oftentimes, but it's still not ideal.

[-]Orpheus168mo390

Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)

Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?

[-]Orpheus168mo33-4

My own impression is that this would be an improvement over the status quo. Main reasons:

A lot of my P(doom) comes from race dynamics.
Right now, if a leading lab ends up realizing that misalignment risks are super concerning, they can't do much to end the race. Their main strategy would be to go to the USG.
If the USG runs the Manhattan Project (or there's some sort of soft nationalization in which the government ends up having a much stronger role), it's much easier for the USG to see that misalignment risks are concerning & to do something about it.
A national project would be more able to slow down and pursue various kinds of international agreements (the national project has more access to POTUS, DoD, NSC, Congress, etc.)
I expect the USG to be stricter on various security standards. It seems more likely to me that the USG would EG demand a lot of security requirements to prevent model weights or algorithmic insights from leaking to China. One of my major concerns is that people will want to pause at GPT-X but they won't feel able to because China stole access to GPT-Xminus1 (or maybe even a slightly weaker version of GPT-X).
In general, I feel like USG natsec folks are less

... (read more)

[-]davekasten8mo144

As you know, I have huge respect for USG natsec folks. But there are (at least!) two flavors of them: 1) the cautious, measure-twice-cut-once sort that have carefully managed deterrencefor decades, and 2) the "fuck you, I'm doing Iran-Contra" folks. Which do you expect will get in control of such a program ? It's not immediately clear to me which ones would.

4Orpheus168mo

@davekasten I know you posed this question to us, but I'll throw it back on you :) what's your best-guess answer? Or perhaps put differently: What do you think are the factors that typically influence whether the cautious folks or the non-cautious folks end up in charge? Are there any historical or recent examples of these camps fighting for power over an important operation?

3O O8mo

Why is the built-in assumption for almost every single post on this site that alignment is impossible and we need a 100 year international ban to survive? This does not seem particularly intellectually honest to me. It is very possible no international agreement is needed. Alignment may turn out to be quite tractable.

[-]Vladimir_Nesov8mo156

A mere 5% chance that the plane will crash during your flight is consistent with considering this extremely concerning and doing anything in your power to avoid getting on it. "Alignment is impossible" is not necessary for great concern, isn't implied by great concern.

[-]Richard_Ngo8mo15-7

I don't think this line of argument is a good one. If there's a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.

[-]Vladimir_Nesov8mo175

Still consistent with great concern. I'm pointing out that O O's point isn't locally valid, observing concern shouldn't translate into observing belief that alignment is impossible.

[-]MondSemmel8mo128

Yudkowsky has a pinned tweet that states the problem quite well: it's not so much that alignment is necessarily infinitely difficult, but that it certainly doesn't seem anywhere as easy as advancing capabilities, and that's a problem when what matters is whether the first powerful AI is aligned:

Safely aligning a powerful AI will be said to be 'difficult' if that work takes two years longer or 50% more serial time, whichever is less, compared to the work of building a powerful AI without trying to safely align it.

4Orpheus168mo

Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV. You might still be worried about EG concentration of power). If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more. This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where misalignment is hard (or perhaps "somewhat hard but not intractably hard").

7Seth Herd8mo

It's not every post, but there are still a lot of people who think that alignment is very hard. The more common assumption is that we should assume that alignment isn't trivial, because an intellectually honest assessment of the range of opinions suggests that we collectively do not yet know how hard alignment will be.

[-]habryka8mo207

If the project was fueled by a desire to beat China, the structure of the Manhattan project seems unlikely to resemble the parts of the structure of the Manhattan project that seemed maybe advantageous here, like having a single government-controlled centralized R&D effort.

My guess is if something like this actually happens, it would involve a large number of industry subsidies, and would create strong institutional momentum that even when things got dangerous, to push the state of the art forward, and in as much as there is pushback, continue dangerous development in secret.

In the case of nuclear weapons the U.S. really went very far under the advisement of Edward Teller, so I think the outside view here really doesn't look good:

4Orpheus168mo

Good points. Suppose you were on a USG taskforce that had concluded they wanted to go with the "subsidy model", but they were willing to ask for certain concessions from industry. Are there any concessions/arrangements that you would advocate for? Are there any ways to do the "subsidy model" well, or do you think the model is destined to fail even if there were a lot of flexibility RE how to implement it?

8habryka8mo

I think "full visibility" seems like the obvious thing to ask for, and something that could maybe improve things. Also, preventing you from selling your products to the public, and basically forcing you to sell your most powerful models only to the government, gives the government more ability to stop things when it comes to it. I will think more about this, I don't have any immediate great ideas.

4Orpheus168mo

If you could only have "partial visibility", what are some of the things you would most want the government to be able to know?

2Nathan Helm-Burger8mo

I have an answer to that: making sure that NIST:AISI had at least scores of automated evals for checkpoints of any new large training runs, as well as pre-deployment eval access. Seems like a pretty low-cost, high-value ask to me. Even if that info leaked from AISI, it wouldn't give away corporate algorithmic secrets. A higher cost ask, but still fairly reasonable, is pre-deployment evals which require fine-tuning. You can't have a good sense of a what the model would be capable of in the hands of bad actors if you don't test fine-tuning it on hazardous info.

9Richard_Ngo8mo

Worse than the current situation, because the counterfactual is that some later project happens which kicks off in a less race-y manner. In other words, whatever the chance of its motivation shifting over time, it seems dominated by the chance that starting the equivalent project later would just have better motivations from the outset.

4Orpheus168mo

Can you say more about scenarios where you envision a later project happening that has different motivations? I think in the current zeitgeist, such a project would almost definitely be primarily motivated by beating China. It doesn't seem clear to me that it's good to wait for a new zeitgeist. Reasons: * A company might develop AGI (or an AI system that is very good at AI R&D that can get to AGI) before a major zeitgeist change. * The longer we wait, the more capable the "most capable model that wasn't secured" is. So we could risk getting into a scenario where people want to pause but since China and the US both have GPT-Nminus1, both sides feel compelled to race forward (whereas this wouldn't have happened if security had kicked off sooner.)

5Joe Collman8mo

Some thoughts: * The correct answer is clearly (c) - it depends on a bunch of factors. * My current guess is that it would make things worse (given likely values for the bunch of other factors) - basically for Richard's reasons. * Given [new potential-to-shift-motivation information/understanding], I expect there's a much higher chance that this substantially changes the direction of a not-yet-formed project, than a project already in motion. * Specifically: * Who gets picked to run such a project? If it's primarily a [let's beat China!] project, are the key people cautious and highly adaptable when it comes to top-level goals? Do they appoint deputies who're cautious and highly adaptable? * Here I note that the kind of 'caution' we'd need is [people who push effectively for the system to operate with caution]. Most people who want caution are more cautious. * How is the project structured? Will the structure be optimized for adaptability? For red-teaming of top-level goals? * Suppose that a mid-to-high-level participant receives information making the current top-level goals questionable - is the setup likely to reward them for pushing for changes? (noting that these are the kind of changes that were not expected to be needed when the project launched) * Which external advisors do leaders of the project develop relationships with? What would trigger these to change? * ... * I do think that it makes sense to aim for some centralized project - but only if it's the right kind. * I expect that almost all the directional influence is in [influence the initial conditions]. * For this reason, I expect [push for some kind of centralized project, and hope it changes later] is a bad idea. * I think [devote great effort to influencing the likely initial direction of any such future project] seems a great idea (so long as you're sufficiently enlightened about desirable initial directions, of course :)) * I'd note that [in

5tlevin8mo

Depends on the direction/magnitude of the shift! I'm currently feeling very uncertain about the relative costs and benefits of centralization in general. I used to be more into the idea of a national project that centralized domestic projects and thus reduced domestic racing dynamics (and arguably better aligned incentives), but now I'm nervous about the secrecy that would likely entail, and think it's less clear that a non-centralized situation inevitably leads to a decisive strategic advantage for the leading project. Which is to say, even under pretty optimistic assumptions about how much such a project invests in alignment, security, and benefit-sharing, I'm pretty uncertain that this would be good, and with more realistic assumptions I probably lean towards it being bad. But it super depends on the governance, the wider context, how a "Manhattan Project" would affect domestic companies and China's policymaking, etc. (I think a great start would be not naming it after the Manhattan Project, though. It seems path dependent, and that's not a great first step.)

4Orpheus168mo

Can you say more about what has contributed to this update?

4Seth Herd8mo

One factor is different incentives for decision-makers. The incentives (and the mindset) for tech companies is to move fast and break things. The incentives (and mindset) for government workers is usually vastly more conservative. So if it is the government making decisions about when to test and deploy new systems, I think we're probably far better off WRT caution. That must be weighed against the government typically being very bad at technical matters. So even an attempt to be cautious could be thwarted by lack of technical understanding of risks. Of course, the Trump administration is attempting to instill a vastly different mindset, more like tech companies. So if it's that administration we're talking about, we're probably worse off on net with a combination of lack of knowledge and YOLO attitudes. Which is unfortunate - because this is likely to happen anyway. As Habryka and others have noted, it also depends on whether it reduces race dynamics by aggregating efforts across companies, or mostly just throws funding fuel on the race fire.

4davekasten8mo

I think this is a (c) leaning (b), especially given that we're doing it in public. Remember, the Manhattan Project was a highly-classified effort and we know it by an innocuous name given to it to avoid attention. Saying publicly, "yo, China, we view this as an all-costs priority, hbu" is a great way to trigger a race with China... But if it turned out that we knew from ironclad intel with perfect sourcing that China was already racing (I don't expect this to be the case), then I would lean back more towards (c).

4Orpheus168mo

@davekasten @Zvi @habryka @Rob Bensinger @ryan_greenblatt @Buck @tlevin @Richard_Ngo @Daniel Kokotajlo I suspect you might have interesting thoughts on this. (Feel free to ignore though.)

6Daniel Kokotajlo8mo

(c). Like if this actually results in them behaving responsibly later, then it was all worth it.

5Orpheus168mo

What do you think are the most important factors for determining if it results in them behaving responsibly later? For instance, if you were in charge of designing the AI Manhattan Project, are there certain things you would do to try to increase the probability that it leads to the USG "behaving more responsibly later?"

3Sohaib Imran8mo

One thing I’d be bearish on is visibility into the latest methods being used for frontier AI methods, which would downstream reduce the relevance of alignment research except for the research within the manhattan-like project itself. This is already somewhat true of the big labs eg. methods used for o1 like models. However, there is still some visibility in the form of system cards and reports which hint at the methods. When the primary intention is racing ahead of China, I doubt there will be reports discussing methods used for frontier systems.

3worse8mo

Something I'm worried about now is some RFK Jr/Dr. Oz equivalent being picked to lead on AI...

1MiloSal6mo

I'm fairly confident that this would be better than the current situation, and primarily because of something that others haven't touched on here. The reason is that, regardless of who develops them, the first (militarily and economically) transformative AIs will cause extreme geopolitical tension and instability that is challenging to resolve safely. Resolving such a situation safely requires a well-planned off-ramp, which must route through extremely major national- or international-level decisions. Only governments are equipped to make decisions like these; private AGI companies certainly are not. Therefore, unless development is at some point centralized in a USG project, there is no way to avoid the many paths to catastrophe that threaten the world during the period of extreme tension coinciding with AGI/ASI development.

[-]Orpheus1611mo3524

Here are some AI governance/policy thoughts that I've found myself articulating at least 3 times over the last month or so:

I think people interested in AI governance/policy should divide their projects into "things that could be useful in the current Overton Window" and "things that would require a moderate or major Overton Window shift to be useful." I think sometimes people end up not thinking concretely about which world they're aiming for, and this makes their work less valuable.
- If you're aiming for the current Overton Window, you need to be brutally honest about what you can actually achieve. There are many barriers to implementing sensible-seeming ideas. You need access to stakeholders who can do something. You should try to fail quickly. If your idea requires buy-in from XYZ folks, and X isn't interested, that's worth figuring out ASAP.
- If you're aiming for something outside the current Overton Window, you often have a lot of room to be imaginative. I think it's very easy to underestimate Overton Window shifts. If policymakers get considerably more concerned about AI risks, there are a lot of things that will "on the table". People say that AI safety folks were unprepar

... (read more)

5davekasten11mo

I think I agree with much-to-all of this. One further amplification I'd make about the last point: the culture of DC policymaking is one where people are expected to be quick studies and it's OK to be new to a topic; talent is much more funged from topic to topic in response to changing priorities than you'd expect. Your Lesswrong-informed outside view of how much you need to know on a topic to start commenting on policy ideas is probably wrong. (Yes, I know, someone is about to say "but what if you are WRONG about the big idea given weird corner case X or second-order effects Y?" Look, reversed stupidity is not wisdom, but also also sometimes you can just quickly identify stupid-across-almost-all-possible-worlds ideas and convince people just not to do them rather than having to advocate for an explicit good-idea alternative.)

1yanni kyriacos11mo

I think how delicately you treat your personal Overton Window should also depend on your timelines.

[-]Orpheus1610mo222

Recent Senate hearing includes testimony from Helen Toner and William Saunders.

Both statements are explicit about AGI risks & emphasize the importance of transparency & whistleblower mechanisms.
William's statement acknowledges that he and others doubt that OpenAI's safety work will be sufficient.
- "OpenAI will say that they are improving. I and other employees who resigned doubt they will be ready in time. This is true not just with OpenAI; the incentives to prioritize rapid development apply to the entire industry. This is why a policy response is needed."
Helen's statement provides an interesting paragraph about China at the end.
- "A closing note on China: The specter of ceding U.S. technological leadership to China is often treated as a knock-down argument against implementing regulations of any kind. Based on my research on the Chinese AI ecosystem and U.S.-China technology competition more broadly, I think this argument is not nearly as strong as it seems at first glance. We should certainly be mindful of how regulation can affect the pace of innovation at home, and keep a close eye on how our competitors and adversaries are developing and using AI. But looking

... (read more)

9Ben Pace10mo

I am impressed regarding Helen Toner's China comment! For a while I have been tracking a hypothesis that nobody working in DC in AI Policy would openly and prominently speak against competition with China being a current priority, but this quote shows that hypothesis does not hold. Now I will track whether any such person explicitly states that it doesn't matter who gets there first, civilization will most likely end regardless, and that competition shouldn't be a priority even if China were ahead of the US. I haven't seen a prominent instance of this happening yet.

[-]gwern10mo114

Toner is one of the only people criticizing the China arms race claims, like last year: https://www.foreignaffairs.com/china/illusion-chinas-ai-prowess-regulation-helen-toner This also earned her some enmity on social media as a Commie stooge last year.

6Ben Pace10mo

Appreciate the link (and for others, here's an archived version without the paywall.) I update toward a model of Helen's statements here not being very representative of what people in DC feel comfortable saying aloud, though to me it's still nice to know that literally anyone is able to say these words.

[-]davekasten10mo125

Generally, it is difficult to understate how completely the PRC is seen as a bad-faith actor in DC these days. Many folks saw them engage in mass economic espionage for a decade while repeatedly promising to stop; those folks are now more senior in their careers than those formative moments. Then COVID happened, and while not everyone believes in the lab leak hypothesis, basically everyone believes that the PRC sure as heck reflexively covered up whether or not they were actually culpable.

(Edit: to be clear, reporting, not endorsing, these claims)

5Ben Pace10mo

Thanks for the info. This is an area where I expect a lot of my info sources to be pretty adversarial, and furthermore I haven't looked into these issues a great deal, so I don't have a developed perspective on how bad-faith the Chinese government's agreements and information sources are. I think I recall pretty adversarial information-sharing behavior from China toward the rest of the world in March 2020 (which I consider a massive deal), though I'd have to re-read Wikipedia and LessWrong to recall what exactly was going on.

[-]Orpheus161y2212

I'm surprised why some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions etc.

But in practice, I'd be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, it would be horrible for Meta and the entire OS movement if an OS model was able to cause $1B in damages.)

And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed.)

I think liability also has the "added" problem of being quite un... (read more)

[-]habryka1y204

One reason I feel interested in liability is because it opens up a way to do legal investigations. The legal system has a huge number of privileges that you get to use if you have reasonable suspicion someone has committed a crime or is being negligent. I think it's quite likely that if there was no direct liability, that even if Microsoft or OpenAI causes some huge catastrophe, that we would never get a proper postmortem or analysis of the facts, and would never reach high-confidence on the actual root-causes.

So while I agree that OpenAI and Microsoft want to of course already avoid being seen as responsible for a large catastrophe, having legal liability makes it much more likely there will be an actual investigation where e.g. the legal system gets to confiscate servers and messages to analyze what happens, which makes it then more likely that if OpenAI and Microsoft are responsible, they will be found out to be responsible.

3Orpheus161y

I found this answer helpful and persuasive– thank you!

3[anonymous]1y

I think liability-based interventions are substantially more popular with Republicans than other regulatory interventions - they're substantially more hands-off than, for instance, a regulatory agency. They also feature prominently in the Josh Hawley proposal. I've also been told by a republican staffer that liability approaches are relatively popular amongst Rs. An important baseline point is that AI firms (if they're selling to consumers) are probably by default covered by product liability by default. If they're covered by product liability, then they'll be liable for damages if it can be shown that there was a not excessively costly alternative design that they could have implemented that would have avoided that harm. If AI firms aren't covered by product liability, they're liable according to standard tort law, which means they're liable if they're negligent under a reasonable person standard. Liability law also gives (some, limited) teeth to NIST standards. If a firm can show that it was following NIST safety standards, this gives it a strong argument that it wasn't being negligent. I share your scepticism of liability interventions as mechanisms for making important dents in the AI safety problem. Prior to the creation of the EPA, firms were still in principle liable for the harms their pollution caused, but the tort law system is generically a very messy way to get firms to reduce accident risks. It's expensive and time consuming to go through the court system, courts are reluctant to award punitive damages which means that externalities aren't internalised even theory (in expectation for firms,) and you need to find a plaintiff with standing to sue firms. I think there are still some potentially important use cases for liability for reducing AI risks: * Making clear the legal responsibilities of private sector auditors (I'm quite confident that this is a good idea) * Individual liability for individuals with safety responsibilities at firms (a

2Chris_Leong1y

I think we should be talking more about potentially denying a frontier AI license to any company that causes a major disaster (within some future licensing regime), where a company’s record before the law passes will be taken into amount.

1L Rudolf L1y

One alternative method to liability for the AI companies is strong liability for companies using AI systems. This does not directly address risks from frontier labs having dangerous AIs in-house, but helps with risks from AI system deployment in the real world. It indirectly affects labs, because they want to sell their AIs. A lot of this is the default. For example, Air Canada recently lost a court case after claiming a chatbot promising a refund wasn't binding on them. However, there could be related opportunities. Companies using AI systems currently don't have particularly good ways to assess risks from AI deployment, and if models continue getting more capable while reliability continues lagging, they are likely to be willing to pay an increasing amount for ways to get information on concrete risks, guard against it, or derisk it (e.g. through insurance against their deployed AI systems causing harms). I can imagine a service that sells AI-using companies insurance against certain types of deployment risk, that could also double as a consultancy / incentive-provider for lower-risk deployments. I'd be interested to chat if anyone is thinking along similar lines.

1RedMan1y

There are analogies here in pollution. Some countries force industry to post bonds for damage to the local environment. This is a new innovation that may be working. The reason the superfund exists in the US is because liability for pollution can be so severe that a company would simply cease to operate, and the mess would not be cleaned up. In practice, when it comes to taking environmental risks, better to burn the train cars of vinyl chloride, creating a catastrophe too expensive for anyone to clean up or even comprehend than to allow a few gallons to leak, creating an expensive accident that you can actually afford.

[-]Orpheus161y216

New Vox article criticizes Anthropic for trying to weaken SB1047 (as well as for some other things). Some notable sections:

Anthropic is lobbying to water down the bill. It wants to scrap the idea that the government should enforce safety standards before a catastrophe occurs. “Instead of deciding what measures companies should take to prevent catastrophes (which are still hypothetical and where the ecosystem is still iterating to determine best practices)” the company urges, “focus the bill on holding companies responsible for causing actual catastrophes.” In other words, take no action until something has already gone terribly wrong.
“Anthropic is trying to gut the proposed state regulator and prevent enforcement until after a catastrophe has occurred — that’s like banning the FDA from requiring clinical trials,” Max Tegmark, president of the Future of Life Institute, told me.
In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it. “This is straight out of Big Tech’s playbook,” he said
The US

... (read more)

[-]Zach Stein-Perlman1y*4022

This article makes some fine points but some misleading ones and its thesis is wrong, I think. Bottom line: Anthropic does lots of good things and is doing much better than being maximally selfish/ruthless. (And of course this is possible, contra the article — Anthropic is led by humans who have various beliefs which may entail that they should make tradeoffs in favor of safety. The space of AI companies is clearly not so perfectly competitive that anyone who makes tradeoffs in favor of safety becomes bankrupt and irrelevant.)

It’s pushing back on a landmark California bill to regulate AI.

Yep, Anthropic's policy advocacy seems bad.

It’s taking money from Google and Amazon in a way that’s drawing antitrust scrutiny. And it’s being accused of aggressively scraping data from websites without permission, harming their performance.

My impression is that these are not big issues. I'm open to hearing counterarguments. [Edit: the scraping is likely a substantial issue for many sites; see comment below. (It is not an x-safety issue, of course.)]

Here’s another tension at the heart of AI development: Companies need to hoover up reams and reams of high-quality text from books and websites in

... (read more)

[-]habryka1y241

My impression is that these are not big issues. I'm open to hearing counterarguments.

I think the Anthropic scraper has been causing a non-trivial amount of problems for LW. I am kind of confused because there might be scrapers going around that are falsely under the name "claudebot" but in as much as it is Anthropic, it sure has been annoying (like, killed multiple servers and has caused me like 10+ hours of headaches).

The part of the article I actually found most interesting is this:

In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it.

This seems worth looking into and would be pretty bad.

2mesaoptimizer1y

I hope you've at least throttled them or IP blocked them temporarily for being annoying. It is not that difficult to scrape a website while respecting its bandwidth and CPU limitations.

7habryka1y

We complained to them and it's been better in recent months. We didn't want to block them because I do actually want LW to be part of the training set.

2Orpheus161y

+1 to lots of this. (Meta: Me posting the article is not an endorsement of the article as a whole. I agree with Zach that lots of sections of it don't seem fair/balanced and don't seem to be critical from an extreme risk perspective. I think the bullet points I listed above summarize the parts that I think are important/relevant.)

[-]aog1y1310

I think there's a decent case that SB 1047 would improve Anthropic's business prospects, so I'm not sure this narrative makes sense. On one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic's business plan. But Anthropic is perhaps the best positioned of all AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.

The good faith interpretation of Anthropic's argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.

4Orpheus161y

Some quick thoughts on this: * If SB1047 passes, labs can still do whatever they want to reduce xrisk. This seems additive to me– I would be surprised if a lab was like "we think XYZ is useful to reduce extreme risks, and we would've done them if SB1047 had not passed, but since Y and Z aren't in the FMD guidance, we're going to stop doing Y and Z." * I think the guidance the agency issues will largely be determined by who it employs. I think it's valid to be like "maybe the FMD will just fail to do a good job because it won't employ good people", but to me this is more of a reason to say "how do we make sure the FMD gets staffed with good people who understand how to issue good recommendations", rather than "there is a risk that you issue bad guidance, therefore we don't want any guidance." * I do think that a poorly-implemented FMD could cause harm by diverting company attention/resources toward things that are not productive, but IMO this cost seems relatively small compared to the benefits acquired in the worlds where the FMD issues useful guidance. (I haven't done a quantitative EV calculation on this though, maybe someone should. I would suspect that even if you give FMD like 20-40% chance of good guidance, and 60-80% chance of useless guidance, the EV would still be net positive.)

[-]Orpheus1611mo174

Why didn't industry succeed in killing SB1047 [so far]?

If someone had told me in 2022 that there would be a bill in CA that the major labs opposed and that the tech industry spent a fair amount of effort lobbying against (to the point of getting Congresspeople and Nancy Pelosi to chime in), I would've been like "that bill seems like it should get killed pretty early on in the process."

Like, if the bill has to go through 5+ committees, I would've predicted that it would die within the first 3 committees.So what's going on? Some plausible explanations:

Industry has less power over AI legislation than I (and maybe some others) thought
Industry has more influence on the federal government than on the CA legislatures
Industry underestimated SB1047 early on//didn't pay much attention to it and the opposition came relatively late in the game
Scott Weiner is really good at building coalitions and forming alliances
SB1047 is relatively light-touch and the burden is very high when industry tries to fight light-touch things

What do you think are the most noteworthy explanations for why industry has failed to kill SB1047 so far?

8davekasten11mo

One question I have is whether Nancy Pelosi was asked and agreed to do this, or whether Nancy Pelosi identified this proactively as an opportunity to try to win back some tech folks to the Dem side. Substantially changes our estimate of how much influence the labs have in this conversation.

7Orpheus1611mo

One plausible explanation is that industry still thinks it's likely to kill the bill, and they just didn't feel like they needed to play their cards sooner. But this still leaves me surprised– I would've expected that it's in industry's interest to kill the bill earlier in the process because: 1. It might be easier to kill earlier on because it hasn't gained much traction/support 2. If you want to appear like you're open to regulation (which seems to be the policy of major AI companies), you probably want to kill it in a relatively silent/invisible way. If you have to be very loud and public and you get to the point where there are a bunch of media articles about it, you lose some credibility/reputation/alliances (and indeed I do think industry has lost some of this "plausibility of good will" as a result of the SB1047 saga)

[-]Orpheus161y167

My rough ranking of different ways superintelligence could be developed:

Least safe: Corporate Race. Superintelligence is developed in the context of a corporate race between OpenAI, Microsoft, Google, Anthropic, and Facebook.
Safer (but still quite dangerous): USG race with China. Superintelligence is developed in the context of a USG project or "USG + Western allies" project with highly secure weights. The coalition hopefully obtains a lead of 1-3 years that it tries to use to align superintelligence and achieve a decisive strategic advantage. This probably relies heavily on deep learning and means we do not have time to invest into alternative paradigms ("provably safe" systems, human intelligence enhancement, etc.
Safest (but still not a guarantee of success): International coalition. Superintelligence is developed in the context of an international project with highly secure weights. The coalition still needs to develop superintelligence before rogue projects can, but the coalition hopes to obtain a lead of 10+ years that it can use to align a system that can prevent rogue AGI projects. This could buy us enough time to invest heavily in alternative paradigms.

My own th... (read more)

7Bogdan Ionut Cirstea1y

I don't think the risk ordering is obvious at all, especially not between #2 and #3, and especially not if you also took into account tractability concerns and risks separate from extinction (e.g. stable totalitarianism, s-risks). Even if you thought coordinating with China might be worth it, I think it should be at least somewhat obvious why the US government [/ and its allies] might be very uncomfortable building a coalition with, say, North Korea or Russia. Even between #1 and #2, the probable increase in risks of centralization might make it not worth it, at least in some worlds, depending on how optimistic one might be about e.g. alignment or offense-defense balance from misuse of models with dangerous capabilities. I also don't think it's obvious alternative paradigms would necessarily be both safer and tractable enough, even on 10-year timelines, especially if you don't use AI automation (using the current paradigm, probably) to push those forward.

4Orpheus161y

Can you say more about why the risk of centralization differs meaningfully between the three worlds? IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled... Then you are very likely (in the absence of coordination) to result in centralization no matter what. It's just a matter of whether OpenAI/Microsoft (scenario #1), the USG and allies (scenario #2), or a broader international coalition (weighted heavily toward the USG and China) are the ones wielding the superintelligence. (If anything, it seems like the "international coalition" approach seems less likely to lead to centralization than the other two approaches, since you're more likely to get post-AGI coordination.) In my vision, the national or international project would be investing into "superalignment"-style approaches, they would just (hopefully) have enough time/resources to be investing into other approaches as well. I typically assume we don't get "infinite time"– i.e., even the international coalition is racing against "the clock" (e.g., the amount of time it takes for a rogue actor to develop ASI in a way that can't be prevented, or the amount of time we have until a separate existential catastrophe occurs.) So I think it would be unwise for the international coalition to completely abandon DL/superalignemnt, even if one of the big hopes is that a safer paradigm would be discovered in time.

5Bogdan Ionut Cirstea1y

I don't think this is obvious, stably-multipolar worlds seem at least plausible to me.

3ryan_greenblatt1y

See also here and here.

4Orpheus161y

@Bodgan, Can you spell out a vision for a stably multipolar world with the above assumptions satisfied? IMO assumption B is doing a lot of the work— you might argue that the IE will not give anyone a DSA, in which case things get more complicated. I do see some plausible stories in which this could happen but they seem pretty unlikely. @Ryan, thanks for linking to those. Lmk if there are particular points you think are most relevant (meta: I think in general I find discourse more productive when it’s like “hey here’s a claim, also read more here” as opposed to links. Ofc that puts more communication burden on you though, so feel free to just take the links approach.)

5ryan_greenblatt1y

(Yeah, I was just literally linking to things people might find relevant to read without making any particular claim. I think this is often slightly helpful, so I do it. Edit: when I do this, I should probably include a disclaimer like "Linking for relevance, not making any specific claim".)

3Bogdan Ionut Cirstea1y

Yup, I was thinking about worlds in which there is no obvious DSA, or where the parties involved are risk averse enough (perhaps e.g. for reasons like in this talk)

2Nathan Helm-Burger1y

My expectation is that DSI can (and will) be achieved before ASI. In fact, I expect ASI to be about as useful as a bomb which has a minimum effect size of destroying the entire solar system if deployed. In other words, useful only for Mutually Assured Destruction. DSI only requires a nuclear-armed state actor to have an effective global missile defense system. Whichever nuclear-armed state actor gets that without any other group having that can effectively demand the surrender and disarmament of all other nations. Including confiscating their compute resources. Do you think missile defense is so difficult that only ASI can manage it? I don't. That seems like a technical discussion which would need more details to hash out. I'm pretty sure an explicitly designed tool AI and a large drone and satellite fleet could accomplish that.

3Dagon1y

Competition is fractal. There are multiple hierarchies (countries/departments/agencies/etc, corporations/divisions/teams/etc), with individual humans acting on their own behalf. Often, individuals have influence and goals in multiple hierarchies. Your 1/2/3 delineation is not the important part. It’s going to be all 3, with chaotic shifts as public perception, funding, and regulation shifts around.

3davekasten1y

Agree -- I think people need to be prepared for "try-or-die" scenarios. One unfun one I'll toss into the list: "Company A is 12 months from building Cthulhu, and governments truly do not care and there is extremely strong reason to believe that will not change in the next year. All our policy efforts have failed, our existing technical methods are useless, and the end of the world has come. Everyone report for duty at Company B, we're going to try to roll the hard six."

2mesaoptimizer1y

If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don't understand why you'd want to play the AI arms race -- you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop. Unsee the frontier lab.

4davekasten1y

...yes ? I think my scenario explicitly assumes that we've fucked up upstream in many, many ways.

4mesaoptimizer1y

Oh, by that I meant something like "yeah I really think it is not a good idea to focus on an AI arms race". See also Slack matters more than any other outcome.

2Oscar1y

You are probably already familiar with this, but re option 3, the Multilateral AGI Consortium (MAGIC) proposal is I assume along the lines of what you are thinking.

3davekasten1y

Indeed, Akash is familiar: https://arxiv.org/abs/2310.20563 :) (I think it was a later paper he co-authored than the one you cite)

[-]Orpheus169mo153

48 entities gave feedback on the Department of Commerce AI reporting requirements.

Public comments offering feedback on BIS's proposed reporting requirements are now up! It received responses from 48 entities including OpenAI, Anthropic, and many AI safety groups.

The reporting requirements are probably one of the most important things happening in US AI policy-- I'd encourage folks here to find time to skim some of the comments.

[-]Orpheus161y133

Recommended reading: A recent piece argues that the US-China crisis hotline doesn't work & generally raises some concerns about US-China crisis communication.

Some quick thoughts:

If the claims in the piece are true, there seem to be some (seemingly tractable) ways of substantially improving US-China crisis communication.
The barriers seem more bureaucratic (understanding how the defense world works and getting specific agencies/people to do specific things) than political (I doubt this is something you need Congress to pass new legislation to improve.)
In general, I feel like "how do we improve our communication infrastructure during AI-related crises" is an important and underexplored area of AI policy. This isn't just true for US-China communication but also for "lab-government communication", "whistleblower-government communication", and "junior AI staffer-senior national security advisor" communication.
- Example: Suppose an eval goes off that suggests that an AI-related emergency might be imminent. How do we make sure this information swiftly gets to relevant people? To what extent do UKAISI and USAISI folks (or lab whistleblowers) have access to senior nationa

... (read more)

[-]Orpheus161mo110

There's a video version of AI2027 that is quite engaging/accessible. Over 1.5M views so far.

Seems great. My main critique is that the "good ending" seems to assume alignment is rather easy to figure out, though admittedly that might be more of a critique of AI2027 itself rather than the way the video portrays it.

1[anonymous]1mo

The comment sections makes me think that half of the people don't understand the ai 2027 scenario or the video glosses over quite a lot of background knowledge, Eg; In replies section of the pinned comment we have people who didn't understand that openbrain was a placeholder. There are a lot of people who were just not the target audience of the video and were vibing off of social reality, but I guess that's normal for — still fringe AI safety movement.

[-]Orpheus1610mo111

Why do people think there's a ~50% chance that Newsom will veto SB1047?

The base rate for vetoes is about 15%. Perhaps the base rate for controversial bills is higher. But it seems like SB1047 hasn't been very controversial among CA politicians.

Is the main idea here that Newsom's incentives are different than those of state politicians because Newsom has national ambitions? So therefore he needs to cater more to the Democratic Party Establishment (which seems to oppose SB1047) or Big Tech? (And then this just balances out against things like "maybe Newsom doesn't want to seem soft on Big Tech, maybe he feels like he has more to lose by deviating from what the legislature wants, the polls support SB1047, and maybe he actually cares about increasing transparency into frontier AI companies?)

Or are there other factors that are especially influential in peoples' models here?

(Tagging @ryan_greenblatt, @Eric Neyman, and @Neel Nanda because you three hold the largest No positions. Feel free to ignore if you don't want to engage.)

[-]Neel Nanda10mo109

My model is basically just "Newsom likely doesn't want to piss off Big Tech or Pelosi, and the incentive to not veto doesn't seem that high, and so seems highly likely to veto, and 50% veto seems super low". My fair is, like, 80% veto I think?

I'm not that compelled by the base rates argument, because I think the level of controversy over the bill is atypically high, so it's quite out of distribution. Eg I think Pelosi denouncing it is very unusual for a state Bill and a pretty big deal

2Orpheus1610mo

Thanks for sharing! Why do you think the CA legislators were more OK pissing off Big Tech & Pelosi? (I mean, I guess Pelosi's statement didn't come until relatively late, but I believe there was still time for people in at least one chamber to change their votes.) To me, the most obvious explanation is probably something like "Newsom cares more about a future in federal government than most CA politicians and therefore relies more heavily on support from Big Tech and approval from national Democratic leaders"– is this what's driving your model?

4Neel Nanda10mo

This is a fair point. I think Newsom is a very visible and prominent target who has more risk here (I imagine people don't pay that much attention to individual California legislators), it's individually his fault if he doesn't veto, and he wants to be President and thus cares much more about national stuff. While the California legislators were probably annoyed at Pelosi butting into state business.

2Eric Neyman10mo

I believe that Pelosi had never once spoken out against a state bill authored by a California Democrat before this.

1Michael Roe10mo

A financial conflict of interest is a wonderous thing...

4Eric Neyman10mo

For what it's worth, I don't have any particular reason to think that that's the reason for her opposition.

2Raemon10mo

Is there some source that particularly indicates this? I get why the 15% base rate might be low, but haven't actually seen evidence apart from this Manifold question that it'd be higher.

1Daniel Samuel10mo

Newsom’s stance on Big Tech is a bit murky. He pushed ideas like the Data Dividend but overall, he seems pretty friendly to the industry. As for Pelosi, she’s still super influential, but she’ll be 88 by the next presidential election. Her long-term influence is definitely something to watch and Newsom probably has a good read on how things will shift.

2Eric Neyman10mo

I think this isn't true. Concretely, I bet that if you looked at the distribution of Democratic No votes among bills that reached Newsom's desk, this one would be among the highest (7 No votes and a bunch of not-voting, which I think is just a polite way to vote No; source). I haven't checked and could be wrong! My take is basically the same as Neel's, though my all-things-considered guess is that he's 60% or so to veto. My position on Manifold is in large part an emotional hedge. (Otherwise I would be placing much smaller bets in the same direction.)

[-]Orpheus161y70

I've started reading the Report on the International Control of Atomic Energy and am finding it very interesting/useful.

I recommend this for AI policy people– especially those interested in international cooperation, US policy, and/or writing for policy audiences.

4Orpheus161y

@Peter Barnett @Rob Bensinger @habryka @Zvi @davekasten @Peter Wildeford you come to mind as people who might be interested. See also Wikipedia Page about the report (but IMO reading sections of the actual report is worth it.)

[-]Orpheus169mo40

Does anyone know why Anthropic doesn't want models with powerful cyber capabilities to be classified as "dual-use foundation models?"

In its BIS comment, Anthropic proposes a new definition of dual-use foundation model that excludes cyberoffensive capabilities. This also comes up in TechNet's response (TechNet is a trade association that Anthropic is a part of).

Does anyone know why Anthropic doesn't want the cyber component of the definition to remain? (I don't think they cover this in the comment).

---

More details– the original criteria for "dual-use f... (read more)

5davekasten9mo

Wild speculation: they also have a sort of we're-watching-but-unsure provision about cyber operations capability in their most recent RSP update. In it, they say in part that "it is also possible that by the time these capabilities are reached, there will be evidence that such a standard is not necessary (for example, because of the potential use of similar capabilities for defensive purposes)." Perhaps they're thinking that automated vulnerability discovery is at least plausibly on-net-defensive-balance-favorable*, and so they aren't sure it should be regulated as closely, even if in still in some informal sense "dual use" ? Again, WILD speculation here. *A claim that is clearly seen as plausible by, e.g., the DARPA AI Grand Challenge effort.

[-]Orpheus161y30

Recommended readings for people interested in evals work?

Someone recently asked: "Suppose someone wants to get into evals work. Is there a good reading list to send to them?" I spent ~5 minutes and put this list together. I'd be interested if people have additional suggestions or recommendations:

I would send them:

Model evaluations for extreme risks
Evaluating frontier models for dangerous capabilities
METR ARA paper
Recent AI Sandbagging paper
Anthropic's challenges in evaluating AI systems
Apollo's starter guide for evals
A paper I'm writing on semi-structured

... (read more)

2Jozdien1y

I'm obviously biased, but I would recommend my post on macrostrategy of evals: The case for more ambitious language model evals.

2Orpheus161y

@Ryan Kidd @Lee Sharkey I suspect you'll have useful recommendations here.

[-]Orpheus161y30

I'm interested in writing out somewhat detailed intelligence explosion scenarios. The goal would be to investigate what kinds of tools the US government would have to detect and intervene in the early stages of an intelligence explosion.

If you know anyone who has thought about these kinds of questions, whether from the AI community or from the US government perspective, please feel free to reach out via LessWrong.

Moderation Log

Curated and popular this week

101Comments