My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.
Some quick thoughts:
Sorry for brevity, I'm busy right now.
Edit:
3. I'm pretty sure OP likes good criticism of the labs; no comment on how OP is perceived. And I think I don't understand your "good judgment" point. Feedback I've gotten on AI Lab Watch from senior AI safety people has been overwhelmingly positive, and of course there's a selection effect in what I hear, but I'm quite sure most of them support such efforts.
4. Conjecture (not exclusively) has done things that frustrated me, including in dimensions like being "'unilateralist,' 'not serious,' and 'untrustworthy.'" I think most criticism of Conjecture-related advocacy is legitimate and not just because people are opposed to criticizing labs.
5. I do agree on "soft power" and some of "jobs." People often don't criticize the labs publicly because they're worried about negative effects on them, their org, or people associated with them.
RE 1& 2:
Agreed— my main point here is that the marketplace of ideas undervalues criticism.
I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.
The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.
EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.
RE 3:
Good to hear that responses have been positive to lab watch. My impression is that this is a mix of: (a) lab watch doesn’t really threaten the interests of labs (especially Anthropic, which is currently winning & currently the favorite lab among senior AIS ppl), (b) the tides have been shifting somewhat and it is genuinely less taboo to criticize labs than a year ago, and (c) EAs respond more positively to criticism that feels more detailed/nuanced (look I have these 10 categories, let’s rate the labs on each dimension) than criticisms that are more about metastrategy (e.g., challenging the entire RSP frame or advocating for pol...
I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.
I don't think everyone should leave labs (obviously). But I would probably hit a button that does something like "everyone at a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving."
My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work.
I think governments still remain unsure about what to do, and there's a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.
There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is "not really sure if what they're doing is making a big difference", I would probably hit a button that allocates them toward government work or government-focused comms work.
Written on a Slack channel in response to discussions about some folks leaving OpenAI.
I'd be worried about evaporative cooling. It seems that the net result of this would be that labs would be almost completely devoid of people earnest about safety.
I agree with you government pathways to impact are most plausible and until recently undervalued. I also agree with you there are weird competitive pressures at labs.
I largely agree, but think given government hiring timelines, there's no dishonor in staying at a lab doing moderately risk-reducing work until you get a hiring offer with an actual start date. This problem is less bad for the special hiring authorities being used for AI stuff oftentimes, but it's still not ideal.
Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.)
Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?
My own impression is that this would be an improvement over the status quo. Main reasons:
As you know, I have huge respect for USG natsec folks. But there are (at least!) two flavors of them: 1) the cautious, measure-twice-cut-once sort that have carefully managed deterrencefor decades, and 2) the "fuck you, I'm doing Iran-Contra" folks. Which do you expect will get in control of such a program ? It's not immediately clear to me which ones would.
A mere 5% chance that the plane will crash during your flight is consistent with considering this extremely concerning and doing anything in your power to avoid getting on it. "Alignment is impossible" is not necessary for great concern, isn't implied by great concern.
I don't think this line of argument is a good one. If there's a 5% chance of x-risk and, say, a 50% chance that AGI makes the world just generally be very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.
Still consistent with great concern. I'm pointing out that O O's point isn't locally valid, observing concern shouldn't translate into observing belief that alignment is impossible.
Yudkowsky has a pinned tweet that states the problem quite well: it's not so much that alignment is necessarily infinitely difficult, but that it certainly doesn't seem anywhere as easy as advancing capabilities, and that's a problem when what matters is whether the first powerful AI is aligned:
Safely aligning a powerful AI will be said to be 'difficult' if that work takes two years longer or 50% more serial time, whichever is less, compared to the work of building a powerful AI without trying to safely align it.
If the project was fueled by a desire to beat China, the structure of the Manhattan project seems unlikely to resemble the parts of the structure of the Manhattan project that seemed maybe advantageous here, like having a single government-controlled centralized R&D effort.
My guess is if something like this actually happens, it would involve a large number of industry subsidies, and would create strong institutional momentum that even when things got dangerous, to push the state of the art forward, and in as much as there is pushback, continue dangerous development in secret.
In the case of nuclear weapons the U.S. really went very far under the advisement of Edward Teller, so I think the outside view here really doesn't look good:
Here are some AI governance/policy thoughts that I've found myself articulating at least 3 times over the last month or so:
Recent Senate hearing includes testimony from Helen Toner and William Saunders.
Toner is one of the only people criticizing the China arms race claims, like last year: https://www.foreignaffairs.com/china/illusion-chinas-ai-prowess-regulation-helen-toner This also earned her some enmity on social media as a Commie stooge last year.
Generally, it is difficult to understate how completely the PRC is seen as a bad-faith actor in DC these days. Many folks saw them engage in mass economic espionage for a decade while repeatedly promising to stop; those folks are now more senior in their careers than those formative moments. Then COVID happened, and while not everyone believes in the lab leak hypothesis, basically everyone believes that the PRC sure as heck reflexively covered up whether or not they were actually culpable.
(Edit: to be clear, reporting, not endorsing, these claims)
I'm surprised why some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions etc.
But in practice, I'd be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, it would be horrible for Meta and the entire OS movement if an OS model was able to cause $1B in damages.)
And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed.)
I think liability also has the "added" problem of being quite un...
One reason I feel interested in liability is because it opens up a way to do legal investigations. The legal system has a huge number of privileges that you get to use if you have reasonable suspicion someone has committed a crime or is being negligent. I think it's quite likely that if there was no direct liability, that even if Microsoft or OpenAI causes some huge catastrophe, that we would never get a proper postmortem or analysis of the facts, and would never reach high-confidence on the actual root-causes.
So while I agree that OpenAI and Microsoft want to of course already avoid being seen as responsible for a large catastrophe, having legal liability makes it much more likely there will be an actual investigation where e.g. the legal system gets to confiscate servers and messages to analyze what happens, which makes it then more likely that if OpenAI and Microsoft are responsible, they will be found out to be responsible.
New Vox article criticizes Anthropic for trying to weaken SB1047 (as well as for some other things). Some notable sections:
This article makes some fine points but some misleading ones and its thesis is wrong, I think. Bottom line: Anthropic does lots of good things and is doing much better than being maximally selfish/ruthless. (And of course this is possible, contra the article — Anthropic is led by humans who have various beliefs which may entail that they should make tradeoffs in favor of safety. The space of AI companies is clearly not so perfectly competitive that anyone who makes tradeoffs in favor of safety becomes bankrupt and irrelevant.)
It’s pushing back on a landmark California bill to regulate AI.
Yep, Anthropic's policy advocacy seems bad.
It’s taking money from Google and Amazon in a way that’s drawing antitrust scrutiny. And it’s being accused of aggressively scraping data from websites without permission, harming their performance.
My impression is that these are not big issues. I'm open to hearing counterarguments. [Edit: the scraping is likely a substantial issue for many sites; see comment below. (It is not an x-safety issue, of course.)]
...Here’s another tension at the heart of AI development: Companies need to hoover up reams and reams of high-quality text from books and websites in
My impression is that these are not big issues. I'm open to hearing counterarguments.
I think the Anthropic scraper has been causing a non-trivial amount of problems for LW. I am kind of confused because there might be scrapers going around that are falsely under the name "claudebot" but in as much as it is Anthropic, it sure has been annoying (like, killed multiple servers and has caused me like 10+ hours of headaches).
The part of the article I actually found most interesting is this:
In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it.
This seems worth looking into and would be pretty bad.
I think there's a decent case that SB 1047 would improve Anthropic's business prospects, so I'm not sure this narrative makes sense. On one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic's business plan. But Anthropic is perhaps the best positioned of all AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.
The good faith interpretation of Anthropic's argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.
Why didn't industry succeed in killing SB1047 [so far]?
If someone had told me in 2022 that there would be a bill in CA that the major labs opposed and that the tech industry spent a fair amount of effort lobbying against (to the point of getting Congresspeople and Nancy Pelosi to chime in), I would've been like "that bill seems like it should get killed pretty early on in the process."
Like, if the bill has to go through 5+ committees, I would've predicted that it would die within the first 3 committees.So what's going on? Some plausible explanations:
What do you think are the most noteworthy explanations for why industry has failed to kill SB1047 so far?
My rough ranking of different ways superintelligence could be developed:
My own th...
48 entities gave feedback on the Department of Commerce AI reporting requirements.
Public comments offering feedback on BIS's proposed reporting requirements are now up! It received responses from 48 entities including OpenAI, Anthropic, and many AI safety groups.
The reporting requirements are probably one of the most important things happening in US AI policy-- I'd encourage folks here to find time to skim some of the comments.
Recommended reading: A recent piece argues that the US-China crisis hotline doesn't work & generally raises some concerns about US-China crisis communication.
Some quick thoughts:
Why do people think there's a ~50% chance that Newsom will veto SB1047?
The base rate for vetoes is about 15%. Perhaps the base rate for controversial bills is higher. But it seems like SB1047 hasn't been very controversial among CA politicians.
Is the main idea here that Newsom's incentives are different than those of state politicians because Newsom has national ambitions? So therefore he needs to cater more to the Democratic Party Establishment (which seems to oppose SB1047) or Big Tech? (And then this just balances out against things like "maybe Newsom doesn't want to seem soft on Big Tech, maybe he feels like he has more to lose by deviating from what the legislature wants, the polls support SB1047, and maybe he actually cares about increasing transparency into frontier AI companies?)
Or are there other factors that are especially influential in peoples' models here?
(Tagging @ryan_greenblatt, @Eric Neyman, and @Neel Nanda because you three hold the largest No positions. Feel free to ignore if you don't want to engage.)
My model is basically just "Newsom likely doesn't want to piss off Big Tech or Pelosi, and the incentive to not veto doesn't seem that high, and so seems highly likely to veto, and 50% veto seems super low". My fair is, like, 80% veto I think?
I'm not that compelled by the base rates argument, because I think the level of controversy over the bill is atypically high, so it's quite out of distribution. Eg I think Pelosi denouncing it is very unusual for a state Bill and a pretty big deal
I've started reading the Report on the International Control of Atomic Energy and am finding it very interesting/useful.
I recommend this for AI policy people– especially those interested in international cooperation, US policy, and/or writing for policy audiences.
Does anyone know why Anthropic doesn't want models with powerful cyber capabilities to be classified as "dual-use foundation models?"
In its BIS comment, Anthropic proposes a new definition of dual-use foundation model that excludes cyberoffensive capabilities. This also comes up in TechNet's response (TechNet is a trade association that Anthropic is a part of).
Does anyone know why Anthropic doesn't want the cyber component of the definition to remain? (I don't think they cover this in the comment).
---
More details– the original criteria for "dual-use f...
Recommended readings for people interested in evals work?
Someone recently asked: "Suppose someone wants to get into evals work. Is there a good reading list to send to them?" I spent ~5 minutes and put this list together. I'd be interested if people have additional suggestions or recommendations:
I would send them:
I'm interested in writing out somewhat detailed intelligence explosion scenarios. The goal would be to investigate what kinds of tools the US government would have to detect and intervene in the early stages of an intelligence explosion.
If you know anyone who has thought about these kinds of questions, whether from the AI community or from the US government perspective, please feel free to reach out via LessWrong.
Thanks for sharing! Why do you think the CA legislators were more OK pissing off Big Tech & Pelosi? (I mean, I guess Pelosi's statement didn't come until relatively late, but I believe there was still time for people in at least one chamber to change their votes.)
To me, the most obvious explanation is probably something like "Newsom cares more about a future in federal government than most CA politicians and therefore relies more heavily on support from Big Tech and approval from national Democratic leaders"– is this what's driving your model?