Can anyone recommend a ~15 page introduction to AI existential risk that would be appropriate for a general audience with a non-technical background? Ideally, something with a degree of grounding in the current moment (i.e., with something to say about LLMs, chips, China, etc.) rather than a purely abstract take.
Following up to ask:
Thanks for the recommendation btw. I take it from your comment that you're involved in writing this?
I ended up using a version of this from the Center for AI Safety that I edited down for length (basically taking the intro, the front end of each section, and then most of the "Rogue AIs" section).
My context here is assigning this to undergraduates in a survey course where AI is being discussed among other serious future threats. Some things I didn't like about it for that purpose (that may or may not have anything to do with your own target audience):
For my purposes in particular, I'd like to see a bit more on the geopolitics.
Wait but Why has a two-part article series on the implications of advanced AI that, although it predates the current interest in LLMs, is really accessible and easy to read. If they're already familiar with the basics of AI, just the second article is probably enough.
Michael Nielsen's How to be a Wise Optimist is maybe a bit longer than you're looking for, but does a good job of framing safety vs capabilities in (imo) an intuitive way.
If AI executives really are as bullish as they say they are on progress, then why are they willing to raise money anywhere in the ballpark of current valuations?
Dario Amodei suggested the other day that AI will take over all or nearly all coding work within months. Given that software is a multi-trillion dollar industry, how can you possibly square that statement with agreeing to raise money at a valuation for Anthropic in the mere tens of billions? And that's setting aside any other value whatsoever for AI.
The whole thing sort of reminds me of the Nigerian prince scam (i.e., the Nigerian prince is coming into an inheritance of tens of millions of dollars but desperately needs a few thousand bucks to claim it, and will cut you in for incredible profit as a result) just scaled up a few orders of magnitude. Anthropic/OpenAI are on the cusp of technologies worth many trillions of dollars, but they're so desperate for a couple billion bucks to get there that they'll sell off big equity stakes at valuations that do not remotely reflect that supposedly certain future value.
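To make the arithmetic behind that suspicion explicit, here's a sketch with purely hypothetical numbers (none of these are actual deal terms; they're just in the ballpark of the figures people throw around):

```python
# Dilution arithmetic with hypothetical numbers, to illustrate the tension.
raise_amount = 2e9            # "a couple billion bucks" of cash raised
valuation = 60e9              # a valuation "in the mere tens of billions"
claimed_future_value = 3e12   # the "worth many trillions" story, taken at low-end face value

stake_sold = raise_amount / valuation                    # fraction of the company given up
stake_future_worth = stake_sold * claimed_future_value   # what that stake is worth if the story is true
print(f"Stake sold: {stake_sold:.1%}")                                                        # ~3.3%
print(f"Future value of that stake, on their own story: ~${stake_future_worth / 1e9:.0f}B")   # ~$100B
```

In other words, on the companies' own telling they're handing over something like $100B of future value to get $2B of cash today, which is exactly the shape of the prince's offer.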
If the companies need capital - and I believe that they do - what better option do they have?
I think you’re imagining cash-rich companies choosing to sell portions for dubious reasons, when they could just keep it all for themselves.
But in fact, the companies are burning cash, and to continue operating they need to raise at some valuation, or else not be able to afford the next big training run.
The valuations at which they are raising are, roughly, where supply and demand equilibrate for the amounts of cash that they need in order to continue operating. (Possibly they could raise at higher valuations by taking on less-scrupulous investors, but to date I believe some of the companies have tried to avoid this.)
I don't doubt they need capital. And the Nigerian prince who needs $5,000 to claim the $100 million inheritance does too. It's the fact that he/they can't get capital at something coming anywhere close to the claimed value that's suspicious.
Amodei is forecasting AI that writes 90% of code in three to six months according to his recent comments. Is Anthropic really burning cash so fast that they can't wait a quarter, demonstrate to investors that AI has essentially solved software, and then raise at 10x the valuation?
Is Amodei forecasting that, in 3 to 6 months, AI will produce 90% of the value derived from written code, or just that AI will produce 90% of code, by volume? It would not surprise me if 90% of new "art" (defined as non-photographic, non-graph images) by volume is currently AI-generated, and I would not be surprised to see the same thing happen with code.
And in the same way that "AI produces 90% of art-like images" is not the same thing as "AI has solved art", I expect "AI produces 90% of new lines of code" is not the same thing as "AI has solved software".
Yea, fair enough. His prediction was: "I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code"
The second one is more hedged ("we may be in a world"), but "essentially all the code" must translate to a very large fraction of all the value, even if that last 1% or whatever is of outsize economic significance.
Amodei is forecasting AI that writes 90% of code in three to six months according to his recent comments.
I vaguely recall hearing something like this, but with crucial qualifiers that disclaim the implied confidence you are gesturing at. I expect I would've noticed more vividly if this statement didn't come with clear qualifiers. Knowing the original statement would resolve this.
The original statement is:
"I think we will be there in three to six months, where AI is writing 90% of the code. And then, in 12 months, we may be in a world where AI is writing essentially all of the code"
So, as I read that he's not hedging on 90% in 3 to 6 months, but he is hedging on "essentially all" (99% or whatever that means) in a year.
Here's the place in the interview where he says this (at 16:16). So there were no crucial qualifiers for the 3-6 months figure, which in hindsight makes sense, since the timeframe is near enough that it likely reflects his impression of an AI already available internally at Anthropic[1]. Maybe also corroborated in his mind by some knowledge of the capabilities of a reasoning model based on GPT-4.5, which is almost certainly available internally at OpenAI.
Probably a reasoning model based on a larger pretrained model than Sonnet 3.7. He recently announced in another interview that a model larger than Sonnet 3.7 is due to come out in a "relatively small number of time units" (at 12:35). So probably the plan is to release in a few weeks, but something could go wrong and then it'll take longer. Possibly long reasoning won't be there immediately if there isn't enough compute to run it, and the 3-6 months figure refers to when he expects enough inference compute for long reasoning to be released.
I appreciate the question you’re asking, to be clear! I’m less familiar with Anthropic’s funding / Dario’s comments, but I don’t think the magnitudes of ask-vs-realizable-value are as far off for OpenAI as your comment suggests?
E.g., if you compare OpenAI's reported raise at a $157B valuation most recently vs. what its maximum profit cap likely was under the old (still current, afaik) structure.
The comparison gets a little confusing, because it’s been reported that this investment was contingent on for-profit conversion, which does away with the profit cap.
But I definitely don’t think OpenAI’s recent valuation and the prior profit-cap would be magnitudes apart.
(To be clear, I don’t know the specific cap value, but you can estimate it - for instance by analyzing MSFT’s initial funding amount, which is reported to have a 100x capped-profit return, and then adjust for what % of the company you think MSFT got.)
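For concreteness, here's the kind of back-of-envelope estimate I mean. The 100x cap multiple is the reported figure mentioned above, the ~$1B initial investment is the widely reported size of MSFT's 2019 round, and the stake percentages are pure guesses:

```python
# Back-of-envelope estimate of OpenAI's total profit cap from MSFT's first round.
# The 100x multiple is the reported cap; the $1B is the widely reported 2019 investment;
# the stake fractions are guesses, so treat the outputs as rough bounds only.
msft_initial_investment = 1e9
cap_multiple = 100
msft_capped_return = msft_initial_investment * cap_multiple   # ~$100B

for assumed_stake in (0.10, 0.25, 0.50):   # guesses at MSFT's effective share from that round
    implied_total_cap = msft_capped_return / assumed_stake
    print(f"assumed stake {assumed_stake:.0%} -> implied total cap ~${implied_total_cap / 1e9:,.0f}B")
```

Even the most generous of those outputs is within an order of magnitude of the $157B raise, which is the sense in which I don't think the numbers are magnitudes apart.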
(This also makes sense to me for a company in a very competitive industry, with high regulatory risk, and where companies are reported to still be burning lots and lots of cash.)
One potential angle: automating software won't be worth very much if multiple players can do it and profits are competed to zero. Look at compilers - almost no one is writing assembly or their own compilers, and yet the compiler writers haven't earned billions or trillions of dollars. With many technologies, the vast majority of value is often consumer surplus never captured by producers.
In general I agree with your point. If evidence of transformative AI were close, you'd strategically delay fundraising as late as possible. However, if you have uncertainty about your ability to deliver, about investors' ability to recognize transformative potential, or about competition, you might hedge and raise sooner than you need to. Raising too early never kills a business. But raising too late always does.
This is a key point for a different discussion: job loss and the effect on economies. Suppose writing software is almost all automated. Nobody is going to get the trillions currently spent on it. If just two companies, say Anthropic and OpenAI, have agents that automate it, they'll compete and drive prices down to near the compute costs (or collude until others make systems that can compete...).
Now those trillions aren't being spent on writing code. Where do they go? Anticipating how businesses will use their surplus as they pay less in wages is probably something someone should be doing. But I don't know of any economists taking seriously claims like AI doing all coding in a few years, let alone a few months.
I'm afraid we're going to get blindsided because economists aren't taking the possibility of unprecedented rapid job loss seriously.
So, I certainly wouldn't expect the AI companies to capture all the value; you're right that competition drives the profits down. But, I also don't think it's reasonable to expect profits to get competed down to zero. Innovations in IT are generally pretty easy to replicate, technically speaking, but tech companies operate at remarkably high margins. Even at the moment, your various LLMs are similar but are not exact substitutes for one another, which gives each some market power.
If AI executives really are as bullish as they say they are on progress, then why are they willing to raise money anywhere in the ballpark of current valuations?
The story is that they need the capital to build the models that they think will do that.
Commoditization / no moat? Part of the reason for rapid progress in the field is that there's plenty of fruit left and that fruit is often shared, and a lot of new models involve more fully exploiting research insights that are already out there at a smaller scale. If a company were able to monopolize it, progress wouldn't be as fast, and if a company can't monopolize it, prices are driven down over time.
AI has probably increased valuations for Big Tech (particularly Nvidia) by at least a few trillion over the past two years. So part of this is that investors think OpenAI/Anthropic will only capture around 10% of total AI profits.
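As a rough version of that arithmetic (the Big Tech uplift and the Anthropic figure below are illustrative assumptions, not reported numbers; the $157B OpenAI valuation is the one discussed elsewhere in this thread, and valuations are only a proxy for expected profit capture):

```python
# Very rough version of the "labs capture ~10%" intuition; treat every input as approximate.
big_tech_ai_uplift = 3e12      # assumed: "at least a few trillion" of AI-driven value for Big Tech
openai_valuation = 157e9       # reported recent raise valuation
anthropic_valuation = 60e9     # assumed: "tens of billions"

lab_value = openai_valuation + anthropic_valuation
total_ai_value = big_tech_ai_uplift + lab_value
print(f"Labs' share of AI-attributed value: ~{lab_value / total_ai_value:.0%}")   # ~7%
```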
I was very struck by the following claim in Dario Amodei's recent essay:
I don’t think it is too much of a stretch (if we get a “country of geniuses”) to imagine AI companies, semiconductor companies, and perhaps downstream application companies generating ~$3T in revenue per year, being valued at ~$30T, and leading to personal fortunes well into the trillions.
[Following is in a footnote]: The total value of labor across the economy is $60T/year, so $3T/year would correspond to 5% of this. That amount could be earned by a company that supplied labor for 20% of the cost of humans and had 25% market share, even if the demand for labor did not expand (which it almost certainly would due to the lower cost).
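Spelled out, the footnote's arithmetic checks out (every input below comes from the quoted footnote):

```python
# Reproducing the footnote's arithmetic; all numbers are taken from the quote above.
total_labor_value = 60e12   # $60T/year total value of labor across the economy
market_share = 0.25         # 25% market share for AI-supplied labor
price_vs_human = 0.20       # AI labor priced at 20% of the cost of humans

ai_revenue = total_labor_value * market_share * price_vs_human
print(f"Implied revenue: ${ai_revenue / 1e12:.0f}T/year")                        # $3T/year
print(f"As a share of total labor value: {ai_revenue / total_labor_value:.0%}")  # 5%
```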
This is really not much money. I read this as making a claim about the value of the sector as a whole, though it's possible he's saying that an individual company will achieve that kind of valuation, in which case the sector as a whole might be some multiple higher.
The current market cap of publicly traded tech companies is a bit over $40 trillion on revenue of about $7 trillion. So if the prediction is interpreted as being about the AI sector as a whole, then he's forecasting it will be smaller than the current tech sector. Depending on how expansively one defines the AI sector, we are already definitely within an order of magnitude of $30 trillion in valuation, and perhaps as much as halfway there.
If he's describing projections for a single company, then obviously that's more significant, but it's still pretty low and not something that could really be classed as "transformative" -- this would be a company with a market cap a little under 7x Nvidia's current value ($4.6 trillion) and revenue about 8x Alphabet's ($385 bn). It's a little sticky to compare this to a GDP number, but it suggests that a "country of geniuses" in a data center would be producing economic value somewhere in between current France and Italy.
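Spelling those comparisons out with the figures cited in this comment (rounded, so just a sanity check rather than precise market data):

```python
# Rough sanity check of the comparisons above, using the figures cited in this comment.
projected_valuation = 30e12   # Amodei's ~$30T figure
projected_revenue = 3e12      # ~$3T/year revenue
tech_sector_cap = 40e12       # "a bit over $40 trillion" for publicly traded tech
nvidia_cap = 4.6e12           # Nvidia's market cap as cited above
alphabet_revenue = 385e9      # Alphabet's revenue as cited above

print(f"Share of current tech sector cap: {projected_valuation / tech_sector_cap:.0%}")   # 75%
print(f"Valuation as a multiple of Nvidia: {projected_valuation / nvidia_cap:.1f}x")      # ~6.5x
print(f"Revenue as a multiple of Alphabet: {projected_revenue / alphabet_revenue:.1f}x")  # ~7.8x
```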
Perhaps this is just a throwaway line, or perhaps the Anthropic legal team made him scrub out another prediction that could be interpreted as promising Anthropic investors fantastical returns. Another possibility is that he thinks the returns on AI will overwhelmingly flow outside the sector itself. But, these figures just seem really, really low for a world-transforming technology and that makes me wonder about his beliefs.
It's worth thinking through what today's DeepSeek-induced, trillion dollar-plus drop in AI related stocks means.
There are two basic explanations for DeepSeek's success training models with a lot less compute:
1. They largely imitated existing frontier models (e.g., by training on those models' outputs), piggybacking on compute that others had already spent.
2. They found genuine algorithmic and engineering improvements that make it much cheaper to train a frontier-level model from scratch.
If #1 is true, then I think it implies that we're headed towards a big slowdown in AI progress. The whole economic value proposition for building models just changed. If your frontier model can be imitated at a tiny fraction of the cost after a few months, what good is it? Why would VCs invest money in your training runs?
If #2 is true, then we may be headed towards incredibly rapid AI progress, and the odds of recursively self-improving AI are much higher. If what you really need to build better models is tons and tons of compute, then AI can't speed itself up much. If what you need is just lots of cleverness, then it's much easier to imagine a fast takeoff.
#1 is likely better for alignment in that it will slow things down from the current frenetic pace (the possible downside is that if you can imitate a cutting-edge model cheaply and easily, then hostile actors may deliberately build misaligned models).
#1 also seems to have big implications for government/legal involvement in AI. If the private sector loses interest in funding models that can be easily imitated, then further progress will tend to rely on either: government investment (as in basic science) or aggressive IP law that allows commercialization of progress by preventing imitators (as we do in drug development). Either of those means a much bigger role for the public sector.
training on o1 outputs
Outputs of o1 don't include reasoning traces, so not particularly useful compared to outputs of chatbot models, and very expensive, so only a modest amount can be collected.
Imitation helps with post-training, but the compute-heavy part is pretraining, and obtaining good quality with little pretraining is a novel feat that isn't known to be explainable by good post-training, or by including a lot of outputs from good models in the pretraining/annealing mix.
Outputs of o1 don't include reasoning traces, so not particularly useful compared to outputs of chatbot models, and very expensive, so only a modest amount can be collected.
It would be more precise to say outputs of o1 aren't supposed to include the reasoning traces. But in addition to the reasoning traces OA voluntarily released, people have been observing what seem to be leaks, and given that the history of LLM robustness to jailbreaks can be summarized as 'nil', it is at least conceivable that someone used a jailbreak+API to exfiltrate a bunch of traces. (Remember that Chinese companies like ByteDance have definitely been willfully abusing the OA API for the purposes of knowledge distillation/cloning and evading bans etc, in addition to a history of extremely cutthroat tactics that FANG would blanch at, so it's a priori entirely plausible that they would do such things.)
I don't believe DeepSeek has done so, but it is technically possible. (Regardless of whether anyone has done so, it is now partially moot given that r1 traces, per the DS paper and third-party reports so far, work so well for distillation that everyone can kickstart their own r1-clone with r1 reasoning traces and work from there. There may be more reason to try to exfiltrate o3+ traces, but OA may also decide not to bother, as users are claiming to value and/or enjoy reading the raw traces, and since the secret & capability is out, maybe there's not much point in hiding them any longer.)
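Concretely, "kickstarting a clone from reasoning traces" at its simplest is just supervised fine-tuning on the teacher's full outputs, traces included. A minimal sketch (the model name and the data are placeholders, and this is not anyone's actual pipeline):

```python
# Minimal sketch of distillation-as-SFT on reasoning traces: fine-tune a small causal LM
# on (prompt, reasoning trace, answer) strings harvested from a teacher model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # placeholder small model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Each training example is the teacher's full output, reasoning trace included.
examples = [
    "Q: What is 17 * 24?\n<think>17*24 = 17*20 + 17*4 = 340 + 68 = 408</think>\nA: 408",
    # ...many more (prompt, trace, answer) strings from the teacher
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in examples:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Plain next-token prediction over the whole string; labels are the input ids themselves.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

This only touches the post-training side; the compute-heavy pretraining stage, which the comments above flag as the real puzzle, is a separate question.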
I think most of the imitation happens in the pretraining. I don't know about o1, but DeepSeek v3 is at minimum trained on a ton of 4o outputs, although they are slightly cagey about this. For the first thing I tried, I had ChatGPT write a sock poem:
4o's poem:
Socks, the keepers of toes so warm,
Guardians of feet through life's storm.
In stripes, in polka dots, or plain,
They dance with us through joy and pain.

A cozy pair on winter’s chill,
Or mismatched mates—yet charming still.
From drawer to floor, they roam and stray,
But where one goes, the other may.

Oh, humble socks, so oft unseen,
Yet vital threads of life's routine.
A quiet comfort, soft and neat,
The unsung heroes of weary feet.
Then I gave v3 just the first two stanzas, and asked it to continue the poem:
v3's continuation:
They hide in shoes, they lounge in beds,
Sometimes they’re lost, or so it’s said.
Yet always there, a faithful friend,
From morning’s start to evening’s end.
They cushion steps, they soften blows,
Through every stride, their comfort shows.
Oh, humble socks, so oft unseen,
The quiet heroes of routine.

So here’s to socks, both bold and meek,
For every day, for every week.
A simple joy, a small delight,
They keep us snug, from day to night.
The shot-in-the-dark guess of the "humble socks, so oft unseen... routine" couplet is a fucking soul read. v3 knows 4o, in a way that I kind of hope no model ever knows a person.