This is a special post for quick takes by davekasten. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Epistemic status: not a lawyer, but I've worked with a lot of them.

As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony).   Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...

A subpoena for what?

Okay, I spent much more time with the Anthropic RSP revisions today.  Overall, I think it has two big thematic shifts for me: 

1.  It's way more "professionally paranoid," but needs to be even more so on non-cyber risks.  A good start, but it needs more on being able to stop human intelligence (i.e., good old-fashioned spies)

2.  It really has an aggressively strong vibe of "we are actually using this policy, and We Have Many Line Edits As A Result."  You may not think that RSPs are sufficient -- I'm not sure I do, necessarily -- but I am heartened slightly that they genuinely seem to take the RSP seriously to the point of having mildly-frustrated-about-process-hiccup footnotes about it. (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it; it will be lovely and humanizing and great recruitment material, I bet).

I think one thing that is poorly-understood by many folks outside of DC is just how baseline the assumption is that China is a by-default faithless negotiating partner and that by-default China will want to pick a war with America in 2027 or later.  

(I am reporting, not endorsing.  For example, it is deeply unclear to me why we should take another country's statements about the year they're gonna do a war at surface level)

"want to pick a war with America" is really strange wording because China's strategic goals are not "win a war against nuclear-armed America", but things like "be able to control its claims in the South China Sea including invading Taiwan without American interference". Likewise Russia doesn't want to "pick a war with the EU" but rather annex Ukraine; if they were stupid enough to want the former they would have just bombed Paris. I don't know whether national security people relate to the phrasing the same way but they do understand this.

I totally understand your point, agree that many folks would use your phrasing, and nonetheless think there is something uniquely descriptively true about the phrasing I chose and I stand by it.

Has China made a statement about starting a war in 2027 or later? Who exactly holds the belief that "by-default China will want to pick a war with America in 2027 or later", and how confident are you that they hold it?

It is supposedly their goal for when they will have modernized their military.

Thanks for the link! The one mention of starting war was a quote from this 2006 white paper:

"by the middle of the twenty-first century, the strategic goal of building an informatized army and winning informatized wars will be basically achieved"

Is this what you're referring to or did I miss something?

The general belief in Washington is that Xi Jinping has ordered his military to be ready to invade Taiwan by then.  (See, e.g., https://www.reuters.com/world/china/logistics-war-how-washington-is-preparing-chinese-invasion-taiwan-2024-01-31/ )

Sufficient AI superiority will mean overwhelming military superiority. If we remain ahead in AI it won't matter what other countries do. I expect this effect will dominate the strategic landscape by 2027.

Say more ? 

No, the belief is that China isn’t going to start a war before it has a modernized military, and they plan to have a modernized military by 2027. Therefore they won’t start a war before 2027.

China has also been drooling over Taiwan for the past 100 years. Thus, if you don’t think diplomatic or economic ties mean much to them, and they’ll contend with the US’s military might before 2027, and neither party will use nukes in such a conflict, then you expect a war after 2027.

Ah, I misread your comment. Thanks for clarifying!

I don't think they have stated they'll go to war after 2027. 2027 is the year of their "military modernization" target.

It's a small but positive sign that Anthropic treats conducting a process 3 days beyond its RSP's specified timeframe, without a formal exception, as an issue.  Signals that at least some members of the team there are extremely attuned to normalization-of-deviance concerns.

At LessOnline, there was a big discussion one night around the picnic tables with @Eliezer_Yudkowsky, @habryka, and some interlocutors from the frontier labs (you'll momentarily see why I'm being vague on the latter names). 

One question was: "does DC actually listen to whistleblowers?" and I contributed that, in fact, DC does indeed have a script for this, and resigning in protest is a key part of it, especially ever since the Nixon years.

Here is a usefully publicly-shareable anecdote on how strongly this norm is embedded in national security decision-making, from the New Yorker article "The U.S. Spies Who Sound the Alarm About Election Interference" by David Kirkpatrick, Oct 21, 2024:
(https://archive.ph/8Nkx5)

The experts’ chair insisted that in this cycle the intelligence agencies had not withheld information “that met all five of the criteria”—and did not risk exposing sources and methods. Nor had the leaders’ group ever overruled a recommendation by the career experts. And if they did? It would be the job of the chair of the experts’ group to stand up or speak out, she told me: “That is why we pick a career civil servant who is retirement-eligible.” In other words, she can resign in protest.

gwern:

Also of relevance is the wave of resignations from the DC newspaper The Washington Post the past few days over Jeff Bezos suddenly exerting control.

Yup.  The fact that the profession that writes the news sees "I should resign in protest" as their own responsibility in this circumstance really reveals something. 

Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it" ?

For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people possible to that future... 

My guess is that we wouldn't actually know with high confidence before (and likely even some time after) things-will-definitely-be-fine.

E.g. 3 months after safe ASI people might still be publishing their alignment takes.  

Oh, to be clear I'm not sure this is at all actually likely, but I was curious if anyone had explored the possibility conditional on it being likely

Endgame strategies from who?

A lot of powerful people would focus on being the ones to control it when it happens, so they'd control the future - and not be subject to someone else's control of the future. OpenPhil is about the only org that would think first of the public benefit and not the dangers of other humans controlling it. And not a terribly powerful org, particularly relative to governments.

I was being intentionally broad, here.  For purposes of this particular post, I am probably less interested in "who controls the future" swerves and more in "what else would interested, agentic actors do" questions. 

It is not at all clear to me that OpenPhil is the only org who feels this way -- I can think of several non-EA-ish charities that if they genuinely 100% believed "none of the people you care for will die of the evils you fight if you can just keep them alive for the next 90 days" would plausibly do some interestingly agentic stuff.  

A random observation from a think tank event last night in DC -- the average person in those rooms is convinced there's a problem, but that it's the near-term harms, the AI ethics stuff, etc.  The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms. 

This is a very weird set of selection effects.  I'm not sure what to make of it, honestly.

Random psychologizing explanation that resonates most with me: Claiming to address big problems requires high status. A low-rank person is allowed to bring up minor issues, but they are not in a position to bring up big issues that might reflect on the status of many high-status people. 

This is a pretty common phenomenon that I've observed. Many people react with strong social slap-down motions if you (for example) call into question whether the net effect of a whole social community or economic sector is negative, where the underlying cognitive reality seems similar to "you are not high status enough to bring forward this grievance".

I think this is plausibly describing some folks!  

But I also think there's a separate piece -- I observe, with pretty high odds that it isn't just an act, that at least some people are trying to associate themselves with the near-term harms and AI ethics stuff because they think that is the higher-status stuff, despite direct obvious evidence that the highest-status people in the room disagree.  

There are (at least) two models which could partially explain this:
1) The high-status/high-rank people have that status because they're better at abstract and long-term thinking, and their role is more toward preventing catastrophe rather than nudging toward improvements.  They leave the lesser concerns to the underlings, with the (sometimes correct) belief that it'll come out OK without their involvement.

2) The high-status/high-rank people are rich and powerful enough to be somewhat insulated from most of the prosaic AI risks, while the average member can legitimately be hurt by such things.   So everyone is just focusing on the things most likely to impact themselves.

edit: to clarify, these are two models that do NOT imply the obvious "smarter/more powerful people are correctly worried about the REAL threats, and the average person's concerns are probably unimportant/uninformed".  It's quite possible that this division doesn't tell us much about the relative importance of those different risks.  

Yup!  I think those are potentially very plausible, and similar things were on my short list of possible explanations. I would be very not shocked if those are the true reasons.  I just don't think I have anywhere near enough evidence yet to actually conclude that, so I'm just reporting the random observation for now :)

ZY:

Does "highest status" here mean highest expertise in a domain generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or from more economically powerful countries etc? It is also good to note that sometimes the "status" is dynamic, and may or may not imply anything causal with their decision making or choice on priorities.

One scenario is "higher status" might correlates with better resources to achieve those statuses, and a possibility is as a result they haven't experienced or they are not subject to many near-term harms. In other words, it is not really about the difference between "average" and "high status"'s people's intelligence, but more about what kind of world they are exposed to. 

I do think it is good to hear all different perspectives to stay curious/open-minded. 

edit: I just saw that Dragon nicely listed two potential reasons, with scenario 2 mentioning something similar to my comment here. But something slightly more specific in my thinking is that these choices made by "average" and "high status" people may or may not be conscious, but rather come from the experience of their lives and the world they are exposed to.

Does "highest status" here mean highest expertise in a domain generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or from more economically powerful countries etc?

I mean, functionally all of those things.  (Well, minus the country dynamic.  Everyone at this event I talked to was US, UK, or Canadian, which is all sorta one team for purposes of status dynamics at that event)

I really dislike the term "warning shot," and I'm trying to get it out of my vocabulary.  I understand how it came to be a term people use.  But, if we think it might actually be something that happens, and when it happens, it plausibly and tragically results in the deaths of many folks, isn't the right term "mass casualty event" ? 

I think many mass casualty events would be warning shots, but not all warning shots would be mass casualty events. I think an agentic AI system getting most of the way towards escaping containment or a major fraud being perpetrated by an AI system would both be meaningful warning shots, but wouldn't involve mass casualties.

I do agree with what I think you are pointing at, which is that there is something Orwellian about the "warning shot" language. Like, in many of these scenarios we are talking about large negative consequences, and it seems good to have a word that owns that (in particular inasmuch as people are thinking about making warning shots more likely before an irrecoverable catastrophe occurs).

I totally think it's true that there are warning shots that would be non-mass-casualty events, to be clear, and I agree that the scenarios you note could maybe be those.

(I was trying to use "plausibly" to gesture at a wide range of scenarios, but I totally agree the comment as written doesn't clearly convey that).

I don't think folks intended anything Orwellian, just sort of something we stumbled into, and heck, if we can both be less Orwellian and be more compelling policy advocates at the same time, why not, I figure. 

robo:

I think a lot of people losing their jobs would probably do the trick, politics-wise.  For most people the crux is "will AIs be more capable than humans", not "might AIs more capable than humans be dangerous".

You know, you're not the first person to make that argument to me recently.  I admit that I find it more persuasive than I used to.

Put another way: "will AI take all the jobs" is another way of saying* "will I suddenly lose the ability to feed and protect those I love."  It's an apocalypse in microcosm, and it's one that doesn't require a lot of theory to grasp.  

*Yes, yes, you could imagine universal basic income or whatever.  Do you think the average person is Actually Expecting to Get That ? 

Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models? 

Here are at least 3 things I think they have as benefits:
1.  Just an independent 3rd-party perspective generally

2. The ability to draw insights across multiple labs' efforts, and identify patterns that others might not be able to 

3. The ability to draw on classified threat intelligence to inform its research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright line example: "can you design an accurate classified nuclear explosive lensing arrangement").

Are there others that come to mind? 

It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).  

Why do we think this is the case?
I can imagine at least 3 hypotheses:
1.  Just path-dependence; someone did it, it went well, others imitated

2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas

3. This is a return to the true original meaning of an essay, under Montaigne, that it's an attempt to write thinking down when it's still inchoate, in an effort to make it more comprehensible not only to others but also to oneself.  And AGI/ASI is deeply uncertain, so the essay format is particularly suited for this.

What do you think?

gwern:

Well, what's the alternative? If you think there is something weird enough and suboptimal about essay formats that you are reaching for 'random chance' or 'monkey see monkey do' level explanations, that implies you think there is some much superior format they ought to be using instead. But I can't see what. I think it might be helpful to try to make the case for doing these things via some of the alternatives:

  1. a peer-reviewed Nature paper which would be published 2 years from now, maybe, behind a paywall
  2. a published book, published 3 years from starting the first draft now, which some people might get around to reading a year or two after that, and dropping halfway through (assuming you finish and didn't burn out writing it)
  3. a 1 minute Tiktok video by an AI person with non-supermodel looks
  4. a 5-minute heavily-excerpted interview on CNN
  5. a 750-word WSJ or NYT op-ed
  6. a 10-page Arxiv paper in the standard LaTeX template
  7. a Twitter thread of 500 tweets (which can only be read by logged-in users)
  8. a Medium post (which can't be read because it is written in a light gray font illegible to anyone over the age of 20. Also, it's paywalled 90% of the time.)
  9. a 6 hour Lex Fridman podcast interview, about 4 hours in after Lex has finished his obligatory throatclearing questions (like asking you if aliens exist or the universe is made out of love)
  10. interpretive dance in front of the Lincoln Memorial livestreamed on Twitch
  11. ...

(I'd also add in Karnofsky's blog post series.)

I think those are the meta because they have just enough space to not only give opinions but to mention reasons for those opinions and expertise/background to support the many unstated judgment calls.

Note that the essays by Altman and Amodei are popular beyond the others because their positions are central: they have not only demonstrable backgrounds in AI but lots of name recognition (we're mostly assuming Altman has bothered learning a lot about how Transformers work even if we don't like him). And note that the Gladstone report got itself commissioned by at least a little piece of the government.

A Narrow Path just demonstrates in the text that you and your co-authors have thought deeply about the topic. Shorter essays leave more guesswork on the authors' expertise and depth of consideration.

I'm pretty sure that I think "infohazard" is a conceptual dead end that embeds some really false understandings of how secrets are used by humans.  It is an orphan of a concept -- it doesn't go anywhere.  Ok, the information's harmful.  You need humans to touch that info anyways to do responsible risk-mitigation.  So now what ? 

ABlue:

That "so now what" doesn't sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.

I agree that it's not terribly useful beyond identifying someone's fears.  Using almost any taxonomy to specify what the speaker is actually worried about lets you stop saying "infohazard" and start talking about "bad actor misuse of information" or "naive user tricked by partial (but true) information".  These ARE often useful, even though the aggregate term "infohazard" is limited.

Yeah, that's a useful taxonomy to be reminded of.  I think it's interesting how the "development hazard", item 8, with maybe a smidge of "adversary hazard", is the driver of people's thinking on AI.  I'm pretty unconvinced that good infohazard doctrine, even for AI, can be written based on thinking mainly about that!

I suggest there is a concept distinct enough to warrant the special term, but if it's expansive enough to include secrets (beneficial information that some people prefer others not know), that renders it worthless. 

"Infohazard" ought to be reserved for information that harms the mind that contains it, with spoilers as the most mild examples, SCP-style horrors as the extreme fictional examples.

I think within a bayesian framework where in general you assume information has positive value, it's useful to have an explicit term for when that is not the case. It's a relatively rare occurrence, and as such your usual ways of dealing with information will probably backfire. 

The obvious things to do are to not learn about that information in the first place (i.e. avoid dangerous research), to understand and address the causes for why this information is dangerous (because e.g. you can't coordinate on not building dangerous technology), or, as a last resort, to silo the information and limit its spread. 

I do think that it would be useful to have different words that distinguish between "infohazard to the average individual" and "societal infohazard". The first one is really exceedingly rare. The second one is still rare but more common because society has a huge distribution of beliefs and enough crazy people that if information can be used dangerously, there is a non-trivial chance it will. 

I still like the term "recipe for destruction" when limiting it to stuff similar to dangerous technology.

I think a lot of my underlying instinctive opposition to this concept boils down to thinking that we can and do coordinate on this stuff quite a lot.  Arguably, AI is the weird counterexample of a thought that wants to be thunk -- I think modern Western society is very nearly tailor-made to seek a thing that is abstract, maximizing, systematizing of knowledge, and useful, especially if it fills a hole left by the collapse of organized religion.  

I think for most other infohazards, the proper approach requires setting up an (often-government) team that handles them, which requires those employees to expose themselves to the infohazard to manage it.  And, yeah, sometimes they suffer real damage from it.  There's no way to analyze ISIS beheading videos to stop their perpetrators without seeing some beheading videos; I think that's the more-common varietal of infohazard I'm thinking of.

Ok, the information's harmful.  You need humans to touch that info anyways to do responsible risk-mitigation.  So now what ?

I think one of the points is that you should now focus on selective rather than corrective or structural means to figure out who is nonetheless allowed to work on the basis of this information. 

Calling something an infohazard, at least in my thinking, generally implies both that: 

  • any attempts to devise galaxy-brained incentive structures that try to get large groups of people to nonetheless react in socially beneficial ways when they access this information are totally doomed and should be scrapped from the beginning.
  • you absolutely should not give this information to anyone that you have doubts would handle it well; musings along the lines of "but maybe I can teach/convince them later on what the best way to go about this is" are generally wrong and should also be dismissed.

So what do you do if you nonetheless require that at least some people are keeping track of things? Well, as I said above, you use selective methods instead. More precisely, you carefully curate a very short list of human beings that are responsible people and likely also share your meta views on how dangerous truths ought to be handled, and you do your absolute best to make sure the group never expands beyond those you have already vetted as capable of handling the situation properly.

I think at the meta level I very much doubt that I am responsible enough to create and curate a list of human beings for the most dangerous hazards.  For example, I am very confident that I could not 100% successfully detect a foreign government spy inside my friend group, because even the US intelligence community can't do that...  you need other mitigating controls, instead.

Preregistering intent to write "Every Bay Area Walled Compound" (hat tip: Emma Liddell)

Unrelatedly, I am in Berkeley through Wednesday afternoon, let me know if you're around and would like to chat

I suspect this won't get published until November at the earliest, but I am already delightfully pleased with this bit:


Canada geese fly overhead, honking. Your inner northeast Ohioan notices that you are confused; it’s the wrong season for them to migrate this far south, and they’re flying westwards, anyways.

A quick Google discovers that some Canada geese have now established themselves non-migratorily in the Bay Area:

"The Migratory Bird Treaty Act of 1918 banned hunting or the taking of eggs without a permit. These protections, combined with an increase in desirable real estate—parks, golf course and the like—spurred a dramatic turnaround for the species. Canada geese began breeding in the Bay Area—the southern end of their range – in the late 1950s."

You nod, approvingly; this clearly is another part of the East Bay’s well-known, long-term philanthropic commitment to mitigating Acher-Risks.

I feel like one of the trivially most obvious signs that AI safety comms hasn't gone actually mainstream yet is that we don't say, "yeah, superintelligent AI is very risky.  No, I don't mean Terminator.  I'm thinking more Person of Interest, you know, that show with the guy from the Sound of Freedom and the other guy who was on Lost and Evil?"
 

I agree (minor spoilers below). 

In this context, it's actually kind of funny that (at least the latter half of) Person of Interest is explicitly about a misaligned superintelligent AI, which is misaligned because its creator did not take all the necessary safety precautions in building it (as opposed to one of the main characters, who did). Well, technically it's mostly intent-aligned; it's just not value-aligned. But still...  And although it's mostly just misuse risks, there still is a strong component of just how difficult it is to defend the world from such AGI-caused threats.

Root in Season 2 is also kind-of just a more cynical and misandrist version of Larry Page, talking about AIs as the "successor species" to humanity and that us "bad apples" should give way to something more intelligent and pure.

(This is not an endorsement of Jim Caviezel's beliefs, in case anyone somehow missed my point here.)

Why are we so much more worried about LLMs having CBRN risk than super-radicalization risk, precisely ?

(or is this just an expected-harm metric rather than a probability metric ?) 

I am (speaking personally) pleasantly surprised by Anthropic's letter.  https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf

I'll be at LessOnline this upcoming weekend -- would love to talk to folks about what things they wish someone would write about to explain how DC policy stuff and LessWrong-y themes could be better connected.

Hypothesis, super weakly held and based on anecdote:
One big difference between US national security policy people and AI safety people is that the "grieving the possibility that we might all die" moment happened, on average, more years ago for the national security policy person than the AI safety person. 

This is (even more weakly held) because the national security field has existed for longer, so many participants literally had the "oh, what happens if we get nuked by Russia" moment in their careers in the Literal 1980s...

Ok, so Anthropic's new policy post (explicitly NOT linkposting it properly since I assume @Zac Hatfield-Dodds or @Evan Hubinger or someone else from Anthropic will, and figure the main convo should happen there, and don't want to incentivize fragmenting of conversation) seems to have a very obvious implication.

Unrelated, I just slammed a big AGI-by-2028 order on Manifold Markets.
 
