Whew, this heroic length post translates to 2 hours and 23 minutes of top-quality narration :D
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-75-math-is-easier?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
I want to share these links in the comments because I hope people who, like me, find audio more accessible, are able to learn of the existence of this podcast feed. I put a fair bit of work into ensuring a high-quality audio feed. I'm not trying to just spam, but if this is outside the norms of lesswrong, please let me know.
Thank you for your service. This is a good thing, and everyone who downvoted you is personally responsible for this site having gotten worse.
Autonomy/boundaries is an important value. Hence even actual help that makes everything substantially better can be unwanted. (In this particular case there is already a similar existing feature, so it's more ambiguous.)
everyone who downvoted you is personally responsible for this site having gotten worse
So even in the less convenient possible world where this is clearly true, such unwanting is a part of what matters, its expression lets it be heard. It can sometimes be better for things to be worse. Value is path-dependent, it's not just about the outcome.
It's not my own position in this case, but it's a sensible position even in this case. And taking up the spot of the top comment might be going too far. Minor costs add up but get ignored as a result of sacredness norms, such as linking downvoting to clearly personally making things worse.
"It can sometimes be better for things to be worse."
I will just leave that there with no further comment.
The point is distinction between the value of the greater good and the means of achieving it. Values about spacetime rather than values about some future that forgets the past leading to it. Applicability of this general point to this incident is dubious, but the general point seems valid (and relevant to reasoning behind condemnation of individual instances of downvoting in such cases).
You may wish to know that there is built-in audio narration. You can click the little speaker icon underneath the title to pop up the audio player. Also, if you search “Lesswrong Curated & Popular” on a podcast service, you should be able to find a podcast feed of those posts.
Thanks! Yes, I am aware, but I would encourage you to listen to the difference. I am running this through ElevenLabs, which is currently (IMO) at the forefront of humanlike voices; it produces lifelike tone and cadence based on cues. I also go through and assign every unique quoted person a unique voice, to ensure clear differentiation when listening, alongside extracting text from images and providing a description when appropriate.
I really do implore you: please have a listen to a section and reply back with how you find it in comparison.
I do think the eleven labs voice is a bunch better for a lot of the text. My understanding is that stuff like LaTeX is very hard to do with it, and various other things like image captions and other fine-tuning that T3Audio has done make the floor of its quality a bunch worse than the current audio, but I think I agree that for posts like this, it does seem straightforwardly better than the T3Audio narration.
Yeah, just plugging any old unadjusted text straight into ElevenLabs would get you funky results. I really like Zvi's posts though, I think they are high quality, combining both great original content and a fantastic distillation of quoted views from across the sphere. I do a fair amount of work on the text of each post to ensure a good podcast episode gets made out of each one.
The bar for Nature papers is in many ways not so high. Latest says that if you train indiscriminately on recursively generated data, your model will probably exhibit what they call model collapse. They purport to show that the amount of such content on the Web is enough to make this a real worry, rather than something that happens only if you employ some obviously stupid intentional recursive loops.
According to Rylan Schaeffer and coauthors, this doesn't happen if you append the generated data to the rest of your training data and train on this (larger) dataset. That is, collapse shows up when each generation's synthetic data replaces the previous training data, not when it accumulates on top of it.
As a simplified model of what will happen in future Web scrapes, accumulation seems obviously more appropriate than replacement.
I found this pretty convincing.
(See tweet thread and paper.)
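To make the replace-versus-accumulate contrast concrete, here is a toy, runnable sketch of my own (not code from either paper), using the classic iterated-Gaussian-fitting picture of model collapse:

```python
# Toy illustration of recursive training on generated data: fit a Gaussian to the
# dataset, sample new "synthetic" data from the fit, repeat. In "replace" mode each
# generation trains only on the fresh synthetic data and the fitted spread collapses;
# in "accumulate" mode synthetic data is appended to everything seen so far and the
# fit stays close to the original distribution.
import numpy as np

def run(mode, generations=2000, n=100, seed=0):
    rng = np.random.default_rng(seed)
    dataset = rng.normal(loc=0.0, scale=1.0, size=n)  # original "human" data
    for _ in range(generations):
        mu, sigma = dataset.mean(), dataset.std()     # "train" the model
        synthetic = rng.normal(mu, sigma, size=n)     # model-generated data
        if mode == "replace":
            dataset = synthetic                       # discard everything else
        else:                                         # "accumulate"
            dataset = np.concatenate([dataset, synthetic])
    return dataset.std()

print("replace:    final std", round(run("replace"), 4))     # collapses toward 0
print("accumulate: final std", round(run("accumulate"), 4))  # stays near 1
```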
This is cool, but the name sounds like something a high schooler would put on their resume as their first solo project.
Actually, I think that OpenAI's naming strategy is excellent from the marketing perspective. If you don't pay close attention to these things, then GPT and ChatGPT are practically synonyms. Just like for most normies "windows" means "Microsoft Windows", so does "GPT" mean "ChatGPT".
To compare, I'd like to tell people around me about Claude, but I am not even sure how to pronounce it, and they wouldn't be sure how to write it anyway. Marketing is not about trying to impress a few intellectuals with a clever pun, but about making yourself the default option in the minds of the uneducated masses.
It worries me sometimes that when I say "Claude" it sounds basically indistinguishable from "clod" (aka an insulting term for a dumb and/or uneducated person). I like that they gave it a human name, but why that one?
Everyone assumes that it was named after Claude Shannon, but it appears they've never actually confirmed that.
We continuously have people saying ‘AI progress is stalling, it’s all a bubble’ and things like that, and I always find remarkable how little curiosity or patience such people are willing to exhibit. Meanwhile GPT-4o-Mini seems excellent,
This is still consistent with AI stalling. I have been using GPT-4 for ~18 months and it hasn't got better. 4o is clearly worse for complex programming tasks. Open source models catching up is also consistent with it. I have a lot of curiosity and test the latest models when they come out where applicable to my work.
Claude 3 Opus, Llama 3 405B, and Claude 3.5 Sonnet are clearly somewhat better than original GPT 4, with maybe only 10x in FLOPs scaling at most since then. And there is at least 100x more to come within a few years. Planned datacenter buildup is 3 OOMs above largest training runs for currently deployed models, Llama 3 405B was trained using 16K H100s, while Nvidia shipped 3.7M GPUs in 2023. When a major part of progress is waiting for the training clusters to get built, it's too early to call when they haven't been built yet.
What do you mean by clearly somewhat better? I found Claude 3 Opus clearly worse for my coding tasks. GPT-4 went down for a while, and I was forced to swap, and found it really disappointing. Maximum data center size is more like 300K GPUs because of power, bandwidth constraints, etc. These people are optimistic, but I don't believe we will meaningfully get above 300K: https://www.nextbigfuture.com/2024/07/100-petaflop-ai-chip-and-100-zettaflop-ai-training-data-centers-in-2027.html
xAI and Tesla Autopilot are already running the equivalent of more than 15K GPUs, I expect, so I don't expect 3 OOMs more to happen.
The key point is that a training run is not fundamentally constrained to a single datacenter or a single campus. Doing otherwise is more complicated, likely less efficient, and at the scale of currently deployed models unnecessary. Another popular concern is the data wall. But it seems that there is still enough data even for future training runs that run across multiple datacenters, it just won't allow making inference-efficient overtrained models using all that compute. Both points are based on conservative estimates that don't assume algorithmic breakthroughs. Also, the current models are still trained quickly, while at the boundaries of technical feasibility it would make more sense to perform very long training runs.
For training across multiple datacenters, one way is continuing to push data parallelism with minibatching. Many instances of a model are separately observing multiple samples, and once they are done the updates are collected from all instances, the optimizer makes the next step, and the updated state is communicated back to all instances, starting the process again. In Llama 3 405B, this seems to take about 6 seconds per minibatch, and there are 1M minibatches overall. Gemini 1.0 report states that
Training Gemini Ultra used a large fleet of TPUv4 accelerators owned by Google across multiple datacenters. ... we combine SuperPods in multiple datacenters using Google’s intra-cluster and inter-cluster network. Google’s network latencies and bandwidths are sufficient to support the commonly used synchronous training paradigm, exploiting model parallelism within superpods and data-parallelism across superpods.
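As a minimal sketch of that synchronous data-parallel pattern (a toy in-process simulation, not anyone's actual training code):

```python
# Synchronous data parallelism in miniature: several "clusters" each compute a gradient
# on their own minibatch shard, the gradients are averaged (this is the cross-datacenter
# communication step), the optimizer takes one step on the shared state, and the updated
# weights are broadcast back to every cluster before the next minibatch.
import numpy as np

rng = np.random.default_rng(0)
n_clusters, dim, lr = 4, 8, 0.1
true_w = rng.normal(size=dim)    # target of a toy linear regression
weights = np.zeros(dim)          # shared model state

def local_gradient(w, rng):
    """One cluster's gradient on its own minibatch (toy least-squares loss)."""
    x = rng.normal(size=(32, dim))
    y = x @ true_w
    return x.T @ (x @ w - y) / len(y)

for step in range(500):
    grads = [local_gradient(weights, rng) for _ in range(n_clusters)]  # local work
    avg_grad = np.mean(grads, axis=0)   # collect and average across clusters
    weights -= lr * avg_grad            # optimizer step, then broadcast back

print("final error:", float(np.linalg.norm(weights - true_w)))  # should be tiny
```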
An update would need to communicate the weights, gradients, and some optimizer state, collecting from all clusters involved in training. This could be 3-4 times more than just the weights. But modern fiber optics can carry 30 Tbps per fiber pair (with about 100 links in different wavelength bands within a single fiber, each at about 400 Gbps), and a cable has many fibers. The total capacity of underwater cables one often hears about is on the order of 100 Tbps, but they typically only carry a few fiber pairs, while with 48 fiber pairs we can do 1.3 Pbps. Overland inter-datacenter cables can have even more fibers. So an inter-datacenter network with 100 Tbps dedicated for model training seems feasible to set up with some work. For a 10T parameter model in FP8, communicating 40TB of relevant data would then only take 3 seconds, or 6 for a round trip, comparable to the time for processing a minibatch.
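A quick back-of-the-envelope check of those numbers (my arithmetic, under the stated assumptions about payload size and link capacity):

```python
# How long one cross-datacenter synchronization takes over a dedicated training link,
# using the figures assumed above: a 10T-parameter model in FP8, with weights, gradients
# and optimizer state together roughly 4x the size of the weights alone.
params = 10e12               # 10T parameters
bytes_per_param = 1          # FP8
overhead = 4                 # weights + gradients + optimizer state
payload_bytes = params * bytes_per_param * overhead  # ~40 TB

link_bps = 100e12            # 100 Tbps dedicated inter-datacenter bandwidth
one_way = payload_bytes * 8 / link_bps               # seconds

print(f"payload ~{payload_bytes / 1e12:.0f} TB")
print(f"one-way transfer ~{one_way:.1f} s, round trip ~{2 * one_way:.1f} s")
```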
And clusters don't necessarily have to sit doing nothing between minibatches. They could be multiplexing training of more than one model at a time (if all fit in memory at the same time), or using some asynchronous training black magic that makes downtime unnecessary. So it's unclear if there are speed or efficiency losses at all, but even conservatively it seems that training can remain only 2 times slower and more expensive in cost of time.
For the data wall, key points are ability to repeat data and optimal tokens per parameter (Chinchilla scaling laws). Recent measurements for tokens per parameter give about 40 (Llama 3, CARBS), up from 20 for Chinchilla, and for repeated data it goes to 50-60. The extrapolated value of tokens per parameter increases with FLOPs and might go up 50% over 3 OOMs, so I guess it could go as far as 80 tokens per parameter at 15 repetitions of data around 1e28 FLOPs. With a 50T token dataset, that's enough to train with 9T active parameters, or use 4e28 FLOPs in training (Llama 3 405B uses 4e25 FLOPs). With 20% utilization of FP8 on an Nvidia Blackwell, that's 2M GPUs running for 200 days, a $25 billion training run.
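Roughly reconstructing that arithmetic (my own back-of-the-envelope using the estimates quoted above; the per-GPU throughput is a ballpark assumption, so treat the outputs as order-of-magnitude):

```python
# Data-wall estimate: how large a model and training run a 50T-token dataset supports at
# ~80 tokens per parameter with ~15 repetitions, and roughly how long that run takes on
# 2M Blackwell-class GPUs at 20% utilization of an assumed FP8 peak.
unique_tokens = 50e12
repetitions = 15
tokens_seen = unique_tokens * repetitions   # ~750T tokens processed in training
tokens_per_param = 80
params = tokens_seen / tokens_per_param     # ~9T active parameters
flops = 6 * params * tokens_seen            # standard 6*N*D estimate, ~4e28 FLOPs

gpu_fp8_peak = 5e15                         # ~5 PFLOP/s per GPU (assumed ballpark)
utilization = 0.20
gpus = 2e6
days = flops / (gpus * gpu_fp8_peak * utilization) / 86400
# With these ballpark inputs this lands in the same few-hundred-day range as the
# estimate above; the exact figure is sensitive to the assumed per-GPU throughput.

print(f"params ~{params:.1e}, training FLOPs ~{flops:.1e}, ~{days:.0f} days on {gpus:.0e} GPUs")
```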
Get ChatGPT (ideally Claude of course, but the normies only know ChatGPT) to analyze your text messages, tell you that he’s avoidant and you’re totes mature, or that you’re not crazy, or that he’s just not that into you.
"Mundane utility"?
Faced with that task, the best ChatGPT or Claude is going to do is to sound like the glib, performative, unthinking, off-the-top "advice" people throw around in social media threads when asked similar questions, probably mixed with equally unhelpful corporate-approved boilerplate biased toward the "right" way to interpret things and the "right" way to handle relationships. Maybe worded to sound wise, though.
I would fully expect any currently available LLM to miss hints, signs that private language is in play, signs of allusions to unmentioned context, indirect implications, any but the most blatant humor, and signs of intentional deception. I'd expect it to come up with random assignments of blame (and possibly to emphasize blame while claiming not to); to see problems where none existed; to throw out nice-sounding truisms about nothing relevant; to give dumb, unhelpful advice about resolving real and imagined problems... and still probably to miss obvious warning signs of really dangerous situations.
I have a "benchmark" that I use on new LLMs from time to time. I paste the lyrics of Kate Bush's "The Kick Inside" into a chat, and ask the LLM to tell me what's going on. I picked the song because it's oblique, but not the least bit unclear to a human paying even a little attention.
The LLMs say all kinds of often correct, and even slightly profoundish-sounding, things about tone and symbolism and structure and whatnot. They say things that might get you a good grade on a student essay. They always, utterly fail to get any of the simple story behind the song, or the speaking character's motivations, or what's obviously going to happen next. If something isn't said straight out, it doesn't exist, and even if it is, its implications get ignored. Leading questions don't help.
I think it's partly that the song's about incest and suicide, and the models have been "safetied" into being blind on those topics... as they've likely also been "safetied" into blindness to all kinds of realities about human relationships, and to any kind of not-totally-direct-or-honest communication. Also, partly I think they Just Don't Get It. It's not like anybody's even particularly tried to make them good at that sort of thing.
That song is no harder to interpret than a bunch of contextless text messages between strangers. In fact it's easier; she's trying hard to pack a lot of message into it, even if she's not saying it outright. When LLMs can get the basic point of poetry written by a teenager (incandescently talented, but still a teenager), maybe their advice about complicated, emotion-carrying conversations will be a good source of mundane utility...
So, as soon as I saw the song name I looked it up, and I had no idea what the heck it was about until I returned and kept reading your comment. I tried getting claude to expand on it. every single time it recognized the incest themes. None of the first messages recognized suicide, but many of the second messages did, when I asked what the character singing this is thinking/intending. But I haven't found leading questions sufficient to get it to bring up suicide first. wait, nope! found a prompt where it's now consistently bringing up suicide. I had to lead it pretty hard, but I think this prompt won't make it bring up suicide for songs that don't imply it... yup, tried it on a bunch of different songs, the interpretations all match mine closely, now including negative parts. Just gotta explain why you wanna know so bad, so I think people having relationship issues won't get totally useless advice from claude. Definitely hard to get the rose colored glasses off, though, yeah.
It's been some time since models have become better than the average human at understanding language.
You should take them exactly the right amount of seriously, as useful ways to discuss questions that are highly imprecise. I mostly only have people in about five buckets, which are something like:
[...]
The Leike Zone (10%-90%), where mostly the wise responses don’t much change.
[...]
In this case, a lot of people have put a lot of thought into the question.
I feel that when people put a lot of thought into the question of P(doom) associated with AI, they should at least provide pairs of conditional estimates, P(Doom|X) and P(Doom|not X).
Especially, when X is a really significant factor which can both increase and decrease chances of various scenarios leading to doom.
A single number can't really be well calibrated, especially when the situation is so complex. When one has a pair of numbers, one can at least try to compare P(Doom|X) and P(Doom|not X), see which one is larger and why, do some extra effort making sure that important parts of the overall picture are not missed.
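For reference, the single headline number relates to any such pair through the law of total probability, P(Doom) = P(Doom|X) * P(X) + P(Doom|not X) * P(not X), which is part of why the pair is more informative than the aggregate alone.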
(Of course, Leike's choice, 10%-90%, which is saying, effectively, "I don't know, but it's significant", is also entirely valid. In this case, the actual answer in terms of a pair of numbers is likely to be, "P(Doom|X) and P(Doom|not X) are both significant, and we don't know which one is higher". To start saying "P(Doom|X) > P(Doom|not X)" or "P(Doom|X) < P(Doom|not X)" would take a lot more effort.)
Google DeepMind got a silver medal at the IMO, only one point short of the gold. That’s really exciting.
We continuously have people saying ‘AI progress is stalling, it’s all a bubble’ and things like that, and I always find remarkable how little curiosity or patience such people are willing to exhibit. Meanwhile GPT-4o-Mini seems excellent, OpenAI is launching proper search integration, by far the best open weights model got released, we got an improved MidJourney 6.1, and that’s all in the last two weeks. Whether or not GPT-5-level models get here in 2024, and whether or not it arrives on a given schedule, make no mistake. It’s happening.
This week also had a lot of discourse and events around SB 1047 that I failed to avoid, resulting in not one but four sections devoted to it.
Dan Hendrycks was baselessly attacked – by billionaires with massive conflicts of interest that they admit are driving their actions – as having a conflict of interest because he had advisor shares in an evals startup rather than having earned the millions he could have easily earned building AI capabilities. So Dan gave up those advisor shares, for no compensation, to remove all doubt. Timothy Lee gave us what is clearly the best skeptical take on SB 1047 so far. And Anthropic sent a ‘support if amended’ letter on the bill, with some curious details. This was all while we are on the cusp of the final opportunity for the bill to be revised – so my guess is I will soon have a post going over whatever the final version turns out to be and presenting closing arguments.
Meanwhile Sam Altman tried to reframe broken promises while writing a jingoistic op-ed in the Washington Post, but says he is going to do some good things too. And much more.
Oh, and also AB 3211 unanimously passed the California assembly, and would effectively among other things ban all existing LLMs. I presume we’re not crazy enough to let it pass, but I made a detailed analysis to help make sure of it.
Table of Contents
Language Models Offer Mundane Utility
Get ChatGPT (ideally Claude of course, but the normies only know ChatGPT) to analyze your text messages, tell you that he’s avoidant and you’re totes mature, or that you’re not crazy, or that he’s just not that into you. But if you do so, beware the guy who uses ChatGPT to figure out how to text you back. Also remember that prompting matters, and if you make it clear you want it to be a sycophant, or you want it to tell you how awful your boyfriend is, then that is often what you will get.
On the differences between Claude Opus 3.0 and Claude Sonnet 3.5, Janus department.
Here’s a benchmark: The STEP 3 examination for medical students. GPT-4o gets 96%, Claude 3.5 gets 90%, both well above passing.
Language Models Don’t Offer Mundane Utility
Here’s a fun and potentially insightful new benchmark: Baba is AI.
When the rules of the game must be manipulated and controlled in order to win, GPT-4o and Gemini 1.5 Pro (and Flash) failed dramatically. Perhaps that is for the best. This seems like a cool place to look for practical benchmarks that can serve as warnings.
Figuring out how this happened is left as an exercise for the reader.
A similar phenomenon that has existed for a long time: Pandora stations, in my experience, reliably collapse in usefulness if you rate too many songs. You want to offer a little guidance, and then stop.
I get exactly how all this is happening, you probably do too. Yet they keep doing it.
Math is Easier
Two hours after my last post, which mentioned how hard IMO problems were to solve, Google DeepMind announced it had gotten a silver medal at the International Math Olympiad (IMO), one point (out of, of course, 42) short of gold.
They are solving IMO problems one problem type at a time. AlphaGeometry figured out how to do geometry problems. Now we have AlphaProof to work alongside it. The missing ingredient is now combinatorics; the two problems that couldn’t be solved this year were both combinatorics problems. In most years they’d have likely gotten a different mix and hit gold.
This means Google DeepMind is plausibly close to not only gold medal performance, but essentially saturating the IMO benchmark, once it gets its AlphaCombo branch running.
The obvious response is ‘well, sure, the IMO is getting solved, but actually IMO problems are drawn from a remarkably fixed distribution and follow many principles. This doesn’t mean you can do real math.’
Yes and no. IMO problems are simultaneously:
So yes, you can now write off whatever the AI can now do and say it won’t get to the next level, if you want to do that, or you can make a better prediction that it is damn likely to reach the next level, then the one after that.
Timothy Gowers notes some caveats. Humans had to translate the problems into symbolic form, although the AI did the ‘real work.’ The AI spent more time than humans were given, although that will doubtless rapidly improve. He notes that a key question will be how this scales to more difficult problems, and whether the compute costs go exponentially higher.
Llama Llama Any Good
Arena results are in for Llama-3.1-405B, about where I expected. Not bad at all.
All the Elo rankings are increasingly bunching up. Llama 405B is about halfway from Llama 70B to GPT-4o, and everyone including Sonnet is behind GPT-4o-mini, but all of it is close, any model here will often ‘beat’ any other model here on any given question head-to-head.
Unfortunately, saturation of benchmarks and Goodhart’s Law come for all good evaluations and rankings. It is clear Arena, while still useful, is declining in usefulness. I would no longer want to use its rankings for a prediction market a year from now, if I wanted to judge whose model is best. No one seriously thinks Sonnet is only 5 Elo points better than Gemini Advanced; whatever that measure is telling us is increasingly distinct from what I most care about.
Another benchmark.
Remarkable how badly Gemini does here, and that Gemini 1.5 Flash is ahead of Gemini 1.5 Pro.
Note the big gap between tier 1, from Sonnet to Opus, and then tier 2. Arguably Claude 3.5 Sonnet and Llama 3.1 are now alone in tier 1, then GPT-4, GPT-4o and Claude Opus are tier 2, and the rest are tier 3.
This does seem to be measuring something real and important. I certainly wouldn’t use Gemini for anything requiring high quality logic. It has other ways in which it is competitive, but it’s never sufficiently better to justify thinking about whether to context switch over, so I only use Claude Sonnet 3.5, and occasionally GPT-4o as a backup for second opinions.
Shubham Saboo suggests three ways to run Llama 3.1 locally: Ollama + Open WebUI, LM Studio or GPT4All. On your local machine, you are likely limited to 8B.
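For the Ollama route, here is a minimal sketch of querying the 8B model from Python (this assumes the `ollama` client package and the `llama3.1:8b` model tag; it is illustrative, so check the current docs for exact names):

```python
# Minimal sketch: chat with a locally served Llama 3.1 8B through Ollama's Python client.
# Assumes `ollama serve` is running and the model has already been pulled; adjust the
# response access if your client version returns an object rather than a dict.
import ollama

response = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Give me three quick prompts for testing a local model."}],
)
print(response["message"]["content"])
```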
Different models for different purposes, even within the same weight class?
With big models you can use mixture of experts strategies at low marginal cost. If you’re already trying to use 8B models, then each additional query is relatively expensive. You’ll need to already know your context.
Search for the GPT
OpenAI is rolling ‘advanced Voice Mode’ out to a small Alpha group of users. No video yet. Only the four fixed voices, they say this part is for safety reasons, and there are additional guardrails to block violent or copyrighted material. Not sure why voice cares more about those issues than text.
Altman here says it is cool when it counts to 10 then to 50, perhaps because it ‘pauses to catch its breath.’ Okey dokey.
GPT for me and also for thee, everything to be named GPT.
I think AI has already replaced many Google searches. I think that some version of AI search will indeed replace many more, but not (any time soon) all, Google searches.
I also think that it is to their great credit that they did not cherry pick their example.
I presume it usually does better than that, and I thank them for their openness.
Well, we do mind the fake stuff. We don’t mind at the level Google expected us to mind. If the thing is useful despite the fake stuff, we will find a way. One can look and verify if the answers are real. In most cases, a substantial false positive rate is not a big deal in search, if the false positives are easy for humans to identify.
Let’s say that #5 above was actually in August and was the festival I was looking for. Now I have to check five things. Not ideal, but entirely workable.
The Obvious Nonsense? That’s mostly harmless. The scary scenario is it gives you false positives that you can’t identify.
Tech Company Will Use Your Data to Train Its AIs
Remember when Meta decided that all public posts were by default fair game?
Twitter is now pulling the same trick for Grok and xAI.
You can turn off the setting here, on desktop only.
My view is:
Fun with Image Generation
MidJourney 6.1 is live. More personalization, more coherent images, better image quality, new upscalers, default 25% faster, more accurate text and all that. Image model improvements are incremental and getting harder to notice, but they’re still there.
Deepfaketown and Botpocalypse Soon
(Editorial policy note: We are not covering the election otherwise, but this one is AI.)
We have our first actual political deepfake with distribution at scale. We have had AI-generated political ads that got a bunch of play before, most notably Trump working DeSantis into an episode of The Office as Michael Scott, but that had a sense of humor and was very clearly what it was. We’ve had clips of fake speeches a few times, but mostly those got ignored.
This time, Elon Musk shared the deepfake of Kamala Harris, with the statement ‘This is amazing,’ as opposed to the original post, which was clearly marked as a parody. By the time I woke up the Musk version had already been viewed 110 million times from that post alone.
In terms of actually fooling anyone I would hope this is not a big deal. Even if you don’t know that AI can fake people’s voices, you really really should know this is fake with 99%+ probability within six seconds when she supposedly talks about Biden being exposed as senile. (I was almost positive within two seconds when the voice says ‘democrat’ rather than ‘democratic’ but it’s not fair to expect people to pick that up).
Mostly my read is that this is pretty tone deaf and mean. ‘Bad use of AI.’ There are some good bits in the middle that are actually funny and might be effective, exactly because those bits hit on real patterns and involve (what I think are) real clips.
Balaji calls this ‘the first good AI political parody.’ I believe that this very much did not clear that bar, and anyone saying otherwise is teaching us about themselves.
The Harris campaign criticized Musk for it. Normally I would think it unwise to respond due to the Streisand Effect but here I don’t think that is a worry. I saw calls to ‘sue for libel’ or whatever, but until we pass a particular law about disclosure of AI in politics I think this is pretty clearly protected speech even without a warning. It did rather clearly violate Twitter’s policy on such matters as I understand it, but it’s Musk.
Greatly accelerating or refining existing things can change them in kind. We do not quite yet have AIs that can do ‘ideological innovation’ and come up with genuinely new and improved (in effectiveness) rhetoric and ideological arguments and attacks, but this is clearly under ‘things the AI will definitely be able to do reasonably soon.’
Our defenses against dangerous and harmful ideologies have historically been of the form ‘they cause a big enough disaster to cause people to fight back’ often involving a local (or regional or national) takeover. That is not a great solution historically, with some pretty big narrow escapes and a world still greatly harmed by many surviving destructive ideologies. It’s going to be a problem.
And of course, one or more of these newly powerful ideologies is going to be some form of ‘let the AIs run things and make decisions, they are smarter and objective and fair.’ Remember when Alex Tabarrok said ‘Claude for President’?
AI boyfriend market claimed to be booming, but no hard data is provided.
AI girlfriend market is of course mostly scams, or at least super low quality services that flatter the user and then rapidly badger you for money. That is what you would expect, this is an obviously whale-dominated economic system where the few suckers you can money pump are most of the value. This cycle feeds back upon itself, and those who would pay a reasonable amount for an aligned version quickly realize that product is unavailable. And those low-quality hostile services are of course all over every social network and messaging service.
Meanwhile those who could help provide high quality options, like OpenAI, Anthropic and Google, try to stop anyone from offering such services, partly because they don’t know how to ensure the end product is indeed wholesome and not hostile.
Thus the ‘if we don’t find a way to provide this they’ll get it on the street’ issue…
This is super doable, if you can make the business model work. It would help if the responsible AI companies would play ball rather than shutting such things out.
The ‘good’ news is that even if the good actors won’t play ball, we can at least use Llama-3.1-405B and Llama-3.1-70B, which definitely will play ball and offer us the base model. Someone would have to found the ‘wholesome’ AI companion company, knowing the obvious pressures to change the business model, and build up a trustworthy reputation over time. Ideally you’d pay a fixed subscription, it would then never do anything predatory, and you’d get settings to control other aspects.
Do continue to watch out for deepfake scams on an individual level, here’s a Ferrari executive noticing one, and Daniel Eth’s mom playing it safe. It seems fine to improvise the ‘security questions’ as needed in most spots.
Also, the thing where phones increasingly try to automatically ‘fix’ photos is pretty bad. There’s no ill intent but all such modifications should require a human explicitly asking for them. Else you get this:
Indeed I am happy that the faces are messed up. The version even a year from now might be essentially impossible for humans to notice.
If you want to enhance the image of the moon, sure, go nuts. But there needs to be a human who makes a conscious choice to do that, or at least opt into the feature.
The Art of the Jailbreak
In case anyone was wondering, yes, Pliny broke GPT-4o voice mode, and in this case you can play the video for GPT-4o to do it yourself if you’d like (until OpenAI moves to block that particular tactic).
This seems totally awesome:
If I was looking over charity applications for funding, I would totally fund this (barring seeing tons of even better things). This is the True Red Teaming and a key part of your safety evaluation department.
Also, honestly, kind of embarrassing for every model this trick works on.
Janus on the 405
Things continue to get weirder. I’ll provide a sampling, for full rabbit hole exploration you can follow or look at the full Twitter account.
Also this:
This last one is not primarily Janus, and is framed by someone trying to raise alarm rather than thinking all of this is super cool, but is still the same department.
They Took Our Jobs
Ethan Mollick with a variation of the Samo Burja theory of AI and employment. Samo’s thesis is that you cannot automate away that which is already bullshit.
Get Involved
Arkose has listings for jobs, funding and compute opportunities, and for AI safety programs, fellowships and residencies, with their job board a filter from the 80k hours job board (which badly needs a filter, given it will still list jobs at OpenAI).
In particular, US AI Safety Institute is hiring.
OpenPhil request for proposal on AI governance.
2025 Horizon Fellowship applications are open, for people looking to go full time in Washington. Deadline is August 30.
Introducing
Moonglow.ai, which claims they allow you to seamlessly move compute usage from your computer to a cloud provider when you need that.
Friend.com, oh no (or perhaps oh yeah?). You carry it around, talk to it, read its outputs on your phone. It is ‘always listening’ and has ‘free will.’ Why? Dunno.
That is always the default result of new consumer hardware: Nothing.
And if you say there might be a bit of a bubble, well, maybe. Hard to say.
Odds say they’ll probably (68%) sell 10k units, and probably won’t sell 100k (14% chance they do).
My presumption is the product is terrible, and we will never hear from them again.
In Other AI News
GPT-5 in 2024 at 60% on Polymarket. Must be called GPT-5 to count.
Over 40 tech organizations, including IBM, Amazon, Microsoft and OpenAI, call for the authorization of NIST’s AI Safety Institute (AISI). Anthropic did not sign. Jack Clark says this was an issue of prioritization and they came very close to signing.
Good to hear. I don’t know why they don’t just… sign it now, then? Seems like a good letter. Note the ‘philosophically supportive’ – this seems like part of a pattern where Anthropic might be supportive of various things philosophically or in theory, but it seems to often not translate into practice in any way visible to the public.
Microsoft stock briefly down 7%, then recovers to down 3% during quarterly call, after warning AI investments would take longer to pay off than first thought, then said Azure growth would accelerate later this year. Investors have no patience, and the usual AI skeptics declared victory on very little. The next day it was ~1% down, but Nasdaq was up 2.5% and Nvidia up 12%. Shrug.
Gemini got an update to 1.5 Flash.
xAI and OpenAI on track to have training runs of ~3×10^27 flops by end of 2025, two orders of magnitude bigger than GPT-4 (or Llama-3.1-405B). As noted here, GPT-4 was ~100x of GPT-3, which was ~100x of GPT-2. Doubtless others will follow.
The bar for Nature papers is in many ways not so high. Latest says that if you train indiscriminately on recursively generated data, your model will probably exhibit what they call model collapse. They purport to show that the amount of such content on the Web is enough to make this a real worry, rather than something that happens only if you employ some obviously stupid intentional recursive loops.
File this under ‘you should know this already,’ yes future models that use post-2023 data are going to have to filter their data more carefully to get good results.
Quiet Speculations
Yeah, uh huh:
Even if OpenAI does need to raise ‘more money than has ever been raised in the Valley,’ my bold prediction is they would then… do that. There are only two reasons OpenAI is not a screaming buy at $80 billion:
I mean, they do say ‘consider your investment in the spirit of a donation.’ If you invest in Sam Altman with that disclaimer at the top, how surprised would you be if the company did great and you never saw a penny? Or to learn that you later decided you’d done a rather ethically bad thing?
Yeah, me neither. But I expect plenty of people who are willing to take those risks.
The rest of the objections here seem sillier.
The funniest part is when he says ‘I hope I’m wrong.’
I really, really doubt he’s hoping that.
Burning through this much cash isn’t even obviously bearish.
And yet people really, really want generative AI to all be a bust somehow.
I don’t use LLMs hundreds of times a day, but I use them most days, and I will keep being baffled that people think it’s a ‘grift.’
Similarly, here’s Zapier co-founder Mike Knoop saying AI progress towards AGI has ‘stalled’ because 2024 in particular hasn’t had enough innovation in model capabilities, all it did so far was give us substantially better models that run faster and cheaper. I knew already people could not understand an exponential. Now it turns out they can’t understand a step function, either.
Think about what it means that a year of only speed boosts and price drops alongside substantial capability and modality improvements and several competitors passing previous state of the art, when previous generational leaps took several years each, makes people think ‘oh there was so little progress.’
The Quest for Sane Regulations
Gated op-ed in The Information favors proposed California AI regulation, says it would actively help AI.
Nick Whitaker offers A Playbook for AI Policy at the Manhattan Institute, which was written in consultation with Leopold Aschenbrenner.
Its core principles, consistent with Leopold’s perspective, emphasize things differently than I would have, and present them differently, but are remarkably good:
It is remarkable how much framing and justifications change perception, even when the underlying proposals are similar.
Tyler Cowen linked to this report, despite it calling for government oversight of the training of top frontier models, and other policies he otherwise strongly opposes.
Whitaker calls for a variety of actions to invest in America’s success, and to guard that success against expropriation by our enemies. I mostly agree.
There are common sense suggestions throughout, like requiring DNA synthesis companies to do KYC. I agree, although I would also suggest other protocols there.
Whitaker calls for narrow AI systems to remain largely unregulated. I agree.
Whitaker calls for retaining the 10^26 FLOPS threshold in the executive order (and in the proposed SB 1047 I would add) for which models should be evaluated by the US AISI. If the tests find sufficiently dangerous capabilities, export (and by implication the release of the weights, see below) should be restricted, the same as similar other military technologies. Sounds reasonable to me.
Note that this proposal implies some amount of prior restraint, before making a deployment that could not be undone. Contrast SB 1047, a remarkably unrestrictive proposal requiring only internal testing and with no prior restraint.
He even says this, about open weights and compute in the context of export controls.
I strongly agree.
If we allow countries with export controls to rent our chips, that is effectively evading the export restrictions.
If a model is released with open weights, you are effectively exporting and giving away the model, for free, to foreign corporations and governments. What rules you claim to be imposing to prevent this do not matter, any more than your safety protocols will survive a bit of fine tuning. China’s government and corporations will doubtless ignore any terms of service you claim to be imposing.
Thus, if and when the time comes that we need to restrict exports of sufficiently advanced models, if you can’t fully export them then you also can’t open their weights.
We need to be talking price. When would such restrictions need to happen, under what circumstances? Zuckerberg’s answer was very clear, it is the same as Andreessen’s, and it is never, come and take it, uber alles, somebody stop me.
My concern is that this report, although not to the extreme extent of Sam Altman’s editorial that I discuss later, frames the issue of AI policy entirely in nationalistic terms. America must ‘maintain its lead’ in AI and protect against its human adversaries. That is the key thing.
The report calls instead for scrutiny of broadly-capable AIs, especially those with military and military-adjacent applications. The emphasis on potential military applications reveals the threat model, which is entirely other humans, the bad guy with the wrong AI, using it conventionally to try and defeat the good guy with the AI, so the good AI needs to be better sooner. The report extends this to humans seeking to get their hands on CBRN threats or to do cybercrime.
Which is all certainly an important potential threat vector. But I do not think they are ultimately the most important ones, except insofar as such fears drive capabilities and thus the other threat vectors forward, including via jingoistic reactions.
Worrying about weapons capabilities, rather than (among other things) about the ability to accelerate further AI research and scientific progress that leads into potential forms of recursive self-improvement, or competitive pressures to hand over effective control, is failing to ask the most important questions.
Part 1 discusses the possibility of ‘high level machine intelligence’ (HLMI) or AGI arriving soon. And Leopold of course predicts its arrival quite soon. Yet this policy framework is framed and detailed for a non-AGI, non-HLMI world, where AI is strategically vital but remains a ‘mere tool’ and typical technology, and existential threats or loss of control are not concerns.
I appreciated the careful presentation of the AI landscape.
For example, he notes that RLHF is expected to fail as capabilities improve, and presents ‘scalable oversight’ and constitutional AI as ‘potential solutions’ but is clear that we do not have the answers. His statements about interpretability are similarly cautious and precise. His statements on potential future AI agents are strong as well.
What is missing is a clear statement of what could go wrong, if things did go wrong. In the section ‘Beyond Human Intelligence’ he says superhuman AIs would pose ‘qualitatively new national security risks.’ And that there are ‘novel challenges for controlling superhuman AI systems.’ True enough.
But reading this, would someone who was not doing their own thinking about the implications understand that the permanent disempowerment of humanity, or outright existential or extinction risks from AI, were on the table here? Would they understand the stakes, or that the threat might not come from malicious use? That this might be about something bigger than simply ‘national security’ that must also be considered?
Would they form a model of AI that would then make future decisions that took those considerations into account the way they need to be taken into account, even if they are far more tractable issues than I expect?
No. The implication is there for those with eyes to see it. But the report dare not speak its name.
The ‘good news’ is that the proposed interventions here, versus the interventions I would suggest, are for now highly convergent.
For a central example: Does it matter if you restrict chip and data and model exports in the name of ‘national security’ instead of existential risk? Is it not the same policy?
If we invest in ‘neglected research areas’ and that means the AI safety research, and the same amount gets invested, is the work not the same? Do we need to name the control or alignment problem in order to get it solved?
In these examples, these could well be effectively the same policies. At least for now. But if we are going to get through this, we must also navigate other situations, where differences will be crucial.
The biggest danger is that if you sell National Security types on a framework like this, or follow rhetoric like that now used by Sam Altman, then it is very easy for them to collapse into their default mode of jingoism, and to treat safety and power of AI the way they treated the safety and power of nuclear weapons – see The Doomsday Machine.
It also seems very easy for such a proposal to get adopted without the National Security types who implement it understanding why the precautions are there. And then a plausible thing that happens is that they strip away or cripple (or simply execute poorly) the parts that are necessary to keep us safe from any threat other than a rival having the strong AI first, while throwing the accelerationist parts into overdrive.
These problems are devilishly hard and complicated. If you don’t have good epistemics and work to understand the whole picture, you’ll get it wrong.
For the moment, it is clear that in Washington there has been a successful campaign by certain people to create in many places allergic reactions to anyone even mentioning the actual most important problems we face. For now, it turns out the right moves are sufficiently overdetermined that you can make an overwhelming case for the right moves anyway.
But that is not a long term solution. And I worry that abiding by such restrictions is playing into the hands of those who are working hard to reliably get us all killed.
Death and or Taxes
In addition to issues like an industry-and-also-entire-economy destroying 25% unrealized capital gains tax, there is also another big tax issue for software companies.
A key difference is that this other problem is already on the books, and is already wreaking havoc in various ways, although on a vastly smaller scale than the capital gains tax would have.
Why aren’t more people being louder about this?
Partly because there is no clear partisan angle here. Both parties agree that this needs to be fixed, and both are unwilling to make a deal acceptable to the other in terms of what other things to do while also fixing this. I’m not going to get into here who is playing fair in those negotiations and who isn’t.
SB 1047 (1)
A public service announcement, and quite a large sacrifice if you include xAI:
Once again, if he wanted to work on AI capabilities, Dan Hendrycks could be quite wealthy and have much less trouble.
Even simply taking advisor shares in xAI would have been quite the bounty, but he refused, to avoid bad incentives.
He has instead chosen to try to increase the chances that we do not all die. And he shows that once again, in a situation where his opponents like Marc Andreessen sometimes say openly that they only care about what is good for their bottom lines, and spend large sums on lobbying accordingly. One could argue these VCs (to be clear: #NotAllVCs!) do not have a conflict of interest. But the argument would be that they have no interest other than their own profits, so there is no conflict.
SB 1047 (2)
I do not agree, but much better than most criticism I have seen: Timothy Lee portrays SB 1047 as likely to discourage (covered) open weight models in its current form.
Note that exactly because this is a good piece that takes the bill’s details seriously, a lot of it is likely going to be obsolete a week from now – the details being analyzed will have changed. For now, I’m going to respond on the merits, based solely on the current version of the bill.
I was interviewed for this, and he was clearly trying to actually understand what the bill does during our talk, which was highly refreshing, and he quoted me fairly. The article reflects this as well, including noting that many criticisms of the bill do not reflect the bill’s contents.
When discussing what decision Meta might make with a future model, Timothy correctly states what the bill requires.
On these precautions Meta would be required to take during training, I think that’s actively great. If you disagree please speak directly into this microphone. If Meta chooses not to train a big model because they didn’t want to provide proper cybersecurity or be able to shut down their copies, then I am very happy Meta did not train that model, whether or not it was going to be open. And if they decide to comply and do counterfactual cybersecurity, then the bill is working.
As noted below, ‘violate this policy’ does not mean ‘there is such an event.’ If you correctly provided reasonable assurance – a standard under which something will still happen sometimes – and the event still happens, you’re not liable.
On the flip side, if you do enable harm, you can violate the policy without an actual critical harm happening.
‘Provide reasonable assurance’ is a somewhat stricter standard than the default common law principle of ‘take reasonable care’ that would apply even without SB 1047, but it is not foundationally so different. I would prefer to keep ‘provide reasonable assurance’ but I understand that the difference is (and especially, can and often is made to sound) far scarier than it actually is.
Timothy also correctly notes that the bill was substantially narrowed by including the $100 million threshold, and that this could easily render the bill mostly toothless, since it will only apply to the biggest companies – it seems quite likely that the number of companies seriously contemplating a $100 million training run for an open weight model under any circumstances is going to be either zero or exactly one: Meta.
There is an asterisk on ‘any derivative models,’ since there is a compute threshold where it would no longer be Meta’s problem, but this is essentially correct.
Timothy understands that yes, the safety guardrails can be easily removed, and Meta could not prevent this. I think he gets, here, that there is little practical difference, in terms of these risks, between Meta releasing an open weights model whose safeguards can be easily removed, or Meta releasing the version where the safeguards were never there in the first place, or OpenAI releasing a model with no safeguards and allowing unlimited use and fine tuning.
The question is price, and whether the wording here covers cases it shouldn’t.
Well, maybe, but not so fast. There are some important qualifiers to that. Using the model to carry out cyberattacks is insufficient to qualify. See 22602(g), both (1) and (2).
Even a relevant critical harm actually happening does not automatically mean Meta is liable. The Attorney General would have to choose to bring an action, and a court would have to find Meta did something unlawful under 22606(a).
Which here would mean a violation under 22603, presumably 22603(c), meaning that Meta made the model available despite an ‘unreasonable risk’ of causing or enabling a critical harm by doing so.
That critical harm cannot be one enabled by knowledge that was publicly available without a covered model (note that it is likely no currently available model is covered). So in Timothy’s fertilizer truck bomb example, that would be holding the truck manufacturer responsible only if the bomb would not have worked using a different truck. Quite a different standard.
And the common law has provisions that would automatically attach in a court case, if (without loss of generality) Meta did not indeed create genuinely new risk, given existing alternatives. This is a very common legal situation, and courts are well equipped to handle it.
That still does not mean Meta is required to ensure such a critical harm never happens. That is not what reasonable assurance (or reasonable care) means. Contrast this for example with the proposed AB 3211, which requires ‘a watermark that is designed to be as difficult to remove as possible using state of the art techniques,’ a much higher standard (of ‘as difficult as possible’) that would clearly be unreasonable here and probably is there as well (but I haven’t done the research to be sure).
Nor do I think, if you sincerely could not give reasonable assurance that your product would not counterfactually enable a cyberattack, that your lawyers would want you releasing that product under current common law.
As I understand the common law, the default here is that everyone is required to take ‘reasonable care.’ If you were found to have taken unreasonable care, then you would be liable. And again, my understanding is that there is some daylight between reasonable care and reasonable assurance, but not all that much. In most cases that Meta was unable to ‘in good faith provide reasonable assurance’ it would be found, I predict, to also not have taken ‘reasonable care.’
And indeed, the ‘good faith’ qualifier makes it unclear whether reasonable assurance is even a higher standard here. So perhaps it would be better for everyone to switch purely to the existing ‘reasonable care.’
(This provision used to offer yet more protection – it used to be that you were only responsible if the model did something that could not be done without a covered model that was ineligible for a limited duty exception. That meant that unless you were right at the frontier, you would be fine. Alas, thanks to aggressive lobbying by people who did not understand what the limited duty exception was (or who were acting against their own interests for other unclear reasons), the limited duty exception was removed, altering this provision as well. It was very much the tug of war meme, but what’s done is done and it’s too late to go back now.)
So this is indeed the situation that might happen in the future, whether or not SB 1047 passes. Meta (or another company, realistically it’s probably Meta) may have a choice to make. Do they want to release the weights of their new Llama-4-1T model, while knowing this is dangerous, and that this prevents them from being able to offer reasonable assurance that it will not cause a critical harm, or might be found not to have taken reasonable care – whether or not there is an SB 1047 in effect?
Or do we think that this would be a deeply irresponsible thing to do, on many levels?
(And as Timothy understands, yes, the fine-tune is in every sense, including ethical and causal and logical, Meta’s responsibility here, whoever else is also responsible.)
I would hope that the answers in both legal scenarios are the same.
I would even hope you would not need legal incentive to figure this one out?
This does not seem like a wise place to not take reasonable care.
In a saner world, we would have more criticisms and discussions like this. We would explore the law and what things mean, talk price, and negotiate on what standard for harm and assurance or care are appropriate, and what the damages threshold should be, and what counterfactual should be used. This is exactly the place we should be, in various ways, talking price.
But fundamentally, what is going on with most objections of this type is:
Or:
In particular, the differentially unsafe products would allow users to disable their safety features, enable new unintended and unanticipated or unimagined uses, some of which would be unsafe to third party non-users, at scale, and once shipped the product would be freely and permanently available to everyone, with no ability to recall it, fix the issue or shut it down.
Timothy is making the case that the bar for safety is set too high in some ways (or the threshold of harm or risk of harm too low). One can reasonably think this, and that SB 1047 should move to instead require reasonable care or that $500 million is the wrong threshold, or its opposite – that the bar is set too low, that we already are making too many exceptions, or that this clarification of liability for adverse events shouldn’t only apply when they are this large.
It is a refreshing change from people hallucinating or asserting things not in the bill.
SB 1047 (3): Oh Anthropic
Here is Anthropic’s letter, which all involved knew would quickly become public.
Those worried about existential risk are rather unhappy.
Adam Gleave of FAR AI Research offers his analysis of the letter.
There is that.
Allow me to choose my words carefully. This is politics.
In such situations, there are usually things going down in private that would provide context to the things we see in public. Sometimes that would make you more understanding and sympathetic, at other times less. There are often damn good reasons for players, whatever their motives and intentions, to keep their moves private. Messages you see are often primarily not meant for you, or primarily issued for the reasons you might think. Those speaking often know things they cannot reveal that they know.
Other times, players make stupid mistakes.
Sometimes you learn afterwards what happened. Sometimes you do not.
Based on my conversations with sources, I can share that I believe that:
I want to be crystal clear: The rest of this section is, except when otherwise stated, analyzing only the exact contents written down in the letter.
Until I have sources I can use regarding Anthropic’s detailed proposals, I can only extrapolate from the letter’s language to implied bill changes.
What Anthropic’s Letter Actually Proposes
I will analyze, to the best of my understanding, what the letter actually proposes.
(Standard disclaimer, I am not a lawyer, the letter is ambiguous and contradictory and not meant as a legal document, this stuff is hard, and I could be mistaken in places.)
As a reminder, my sources tell me this is not Anthropic’s actual proposal, or what it would take to earn Anthropic’s support.
What the letter says is, in and of itself, a political act and statement.
The framing of this ‘support if amended’ statement is highly disingenuous.
It suggests isolating the ‘safety core’ of the bill by… getting rid of most of the bill.
Instead, as written they effectively propose a different bill, with different principles.
Coincidentally, the new bill Anthropic proposes would require Anthropic and other labs to do the things Anthropic is already doing, but not require Anthropic to alter its actions. It would if anything net reduce the extent to which Anthropic was legally liable if something went wrong.
Anthropic also offer a wide array of different detail revisions and provision deletions. Many (not all) of the detail suggestions are clear improvements, although they would have been far more helpful if not offered so late in the game, with so many simultaneous requests.
Here is my understanding of Anthropic’s proposed new bill’s core effects as reflected in the letter.
So #3 is where it gets confusing. They tell two different stories. I presume it’s #2?
Possibility #1 is based partly on this, from the first half of the letter:
That would be a clear weakening relative to existing law, and seems pretty insane.
It also does not match the later proposal description (bold theirs, caps mine):
That second one is mostly the existing common law. Of course a company’s stated safety policy will under common law be a factor in a court’s determination of reasonable care, along with what the company did, and what would have been reasonable under the circumstances to do.
This would still be somewhat helpful in practice, because it would increase the probable salience of SSPs, both in advance and during things like plaintiff arguments, motions and emphasis in jury instructions. Which all feeds back into the deterrence effects and the decisions companies make now.
These two are completely different. That difference is rather important. The first version would be actively terrible. The second merely doesn’t change things much, depending on detailed wording.
Whichever way that part goes, this is a rather different bill proposal.
It does not ‘preserve the safety core.’
Another key change is the total elimination of the Frontier Model Division (FMD).
Under Anthropic’s proposal, no one in California would be tasked with ensuring the government understands the SSPs or safety actions or risks of frontier models. No one would be tasked with identifying companies whose SSPs clearly do not take reasonable care or meet standards (although under the letter’s proposals, they wouldn’t be able to do anything about that anyway), with figuring out what reasonable care or standards would be, or even with asking whether companies are doing what they promised to do.
The responsibility for all that would shift onto the public.
There is a big upside. This would, in exchange, eliminate the main credible source of downside risk of eventual overregulation. Many, including Dean Ball and Tyler Cowen, have claimed that the political economy of having such a division, however initially well-intentioned were the division and the law’s rules, would inevitably cause the new division to go looking to expand the scope of their power, and they would find ways to push new stupid rules. It certainly has happened before in other contexts.
Without the FMD, the political economy could well point in the other direction. Passing a well-crafted bill with limited scope means you have now Done Something that one can point towards, and there will be SSPs, relieving pressure to do other things if the additional transparency does not highlight an urgent need to do more.
Those transparency provisions remain. That is good. When the public gets this extra visibility into the actions of various frontier AI developers, that will hopefully inform policymakers and the public about what is going on and what we might need to do.
The transparency provisions would be crippled by the total lack of pre-harm enforcement. It is one thing to request a compromise here to avoid overreach, but the letter’s position on this point is extreme. One hopes it does not reflect Anthropic’s detailed position.
A company could (fully and explicitly intentionally, with pressure and a wink, or any other way) rather brazenly lie about what they are doing, or turn out not to follow their announced plans, and at most face civil penalties (except insofar as lying sufficiently baldly in such spots is already criminal; it could, for example, potentially be securities fraud or falsification of business records), and only if they are caught.
Or they could skip all that, and simply file a highly flimsy safety plan that is not too effectively dissimilar from ‘lol we’re Meta.’ For examples of what that (subtly?!) looks like, see those submitted by several companies at the UK Safety Summit. Here’s Meta’s.
Anthropic would also explicitly weaken the whistleblower provisions to only apply to a direct violation of the plan filed (and to not apply to contractors, which would open up some issues, although there are reasons why that part might currently be a huge mess as written in the latest draft).
There would be no protection for someone saying ‘the model or situation is obviously dangerous’ or ‘the plan obviously does not take reasonable care’ if the letter of the plan was followed.
This substantially updates me towards ‘Anthropic’s RSP is intended to be followed technically rather than spiritually, and thus is much less valuable.’
The enforcement even after a catastrophic harm (as implied by the letter, but legal wording might change this, you can’t RTFB without a B) cannot be done by the attorney general, only by those with ordinary standing to bring a lawsuit, and only after the catastrophic harm actually took place. Those plaintiffs would go through ordinary discovery, at best a process that would take years in a world where many of these companies have short timelines, and there is only so much such companies can ultimately pay, even if the company survived the incident otherwise intact. The incentives cap out exactly where we care most about reducing risk.
Anthropic knows as well as anyone that pre-harm injunctive enforcement, at minimum, is not ‘outside the safety core.’ The whole point of treating catastrophic and existential risks differently from ordinary liability law is that a limited liability corporation often does not have the resources to make us whole in such a scenario, and thus that the incentives and remedy are insufficient. You cannot be usefully held liable for more than you can pay, and you cannot pay anything if you are already dead.
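To make that incentive cap concrete, here is a minimal arithmetic sketch of the judgment-proof problem. The firm size, suit success rate and harm amounts below are entirely hypothetical numbers of my own, not anything from the bill or the letter.

```python
# Hypothetical illustration of the judgment-proof problem: a firm cannot
# pay out more than it is worth, so its expected liability stops growing
# with the size of the harm. All numbers are made up for illustration.

def expected_liability(harm, firm_assets, p_successful_suit):
    """Expected payout the firm anticipates if it causes the harm."""
    return p_successful_suit * min(harm, firm_assets)

# A hypothetical $30 billion firm facing suits that succeed half the time.
for harm in (1e9, 30e9, 300e9, 3e12):
    cost = expected_liability(harm, firm_assets=30e9, p_successful_suit=0.5)
    print(f"harm ${harm:,.0f} -> expected liability ${cost:,.0f}")

# Once the harm exceeds what the firm can pay, the deterrent flatlines,
# which is exactly the range of harms we most want to deter.
```

However you set the made-up numbers, ex-post damages alone stop providing marginal deterrence exactly where the harms become catastrophic, which is the case for some form of pre-harm enforcement.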
But let us suppose that, for whatever reason, it is 2025’s session, SB 1047 did not pass, and this new bill, SB 2025, is the only bill on the table, written to these specifications. The Federal government is busy with other things. Perhaps the executive order is repealed, perhaps it isn’t, depending on who the President is. But nothing new is happening that matters. The alternative on the table is nothing.
Is it a good bill, sir?
Would I support it, as I understand the proposal?
I would say there are effectively two distinct bills here, combined into one.
The first bill is purely a transparency bill, SB 2025.1.
It says that every company training a $100m+ model must notify us of this fact, and must file an SSP of its policies, which could be anything including ‘lol we’re Meta.’
That is not going to ‘get it done’ without enforcement, but is better than nothing. It provides some transparency, allowing us to react better if something crazy is about to happen or happens, and provides help for any liability lawsuits.
Then the question is, which version of SB 2025.2, the second bill, are we getting?
If it’s possible interpretation #1, Claude confirmed my suspicions that this would do the opposite of its stated intent. Rather than hold companies more accountable, it would effectively reduce their liability, raise the bar for a successful lawsuit, and potentially even provide a safe harbor.
That is because there already exists the common law.
As in, if a company:
Then the victims can and will (if anyone was still alive to do so) sue the bastards, and will probably win, and often win very large punitive damages.
Why might that lawsuit not succeed here?
Claude pointed to two potential defenses, either a first amendment defense or a Section 230 defense, both unlikely to work. I am unable to think of any other plausible defenses, and I agree (although of course I am not a lawyer and never give legal advice) that those two defenses would almost certainly fail. But if they did work, those would be federal defenses, and they would override any California lawsuit or legal action, including those based upon SB 1047.
Whereas under the hypothetical SB 2025.2, first version, if you go by the statements earlier in the letter, and their clear intent, the lawsuit would now shift to the SSP, with a higher threshold for liability and a lower amount of damages than before.
And Anthropic’s existing actions are exactly what would provide that measure of safe harbor. There is also some risk this could implicitly weaken liability for under $500 million in damages, although I’ve been told this is unlikely.
So in my judgment, as I extrapolate what the letter is implying, the proposed SB 2025.2, under that first possibility, would be actively harmful.
Details matter, but that probably means one should oppose the full bill, on the grounds that it plausibly makes us on net less safe, even if the alternative was nothing.
If it’s possibility two, then my understanding is that 2025.2 becomes a clarification of the way existing law works. That could still be substantially helpful insofar as it ‘increases awareness’ or decreases chance of misinterpretation. We are counting on the deterrence effect here.
So the second version of the bill would be, if worded well, clearly better than nothing. If the only alternative was nothing (or worse), especially with no transparency or other help on the Federal level, I’d support a well crafted version of that bill.
You take what you can get. I wouldn’t fool myself that it was doing the job.
Anthropic’s employees and leadership are robustly aware of the stakes and dangers. If the details, rhetoric or broad principles here were a mistake, they were a wilful one, by a public relations and policy department or authorized representative that either does not wish to understand, or understands damn well and has very different priorities than mine.
Any policy arm worth its salt would also understand the ways in which their choices in the construction of this letter were actively unhelpful in passing the bill.
I presume their legal team understands what their proposals would likely do, and not do, if implemented, and would say so if asked.
Thus taken on its own, the letter could only be read as an attempt to superficially sound supportive of regulatory and safety efforts and look like the ‘voice of reason’ for public relations, while instead working to defang or sabotage the bill, and give a technical excuse for Anthropic to fail to support the final bill, since some requests here are rather absurd as presented.
The actual proposal from Anthropic is somewhat different.
If Anthropic does end up endorsing a reasonable compromise that also helps gain other support and helps the bill become law, then they will have been extremely helpful, albeit at a price. We do not yet know.
It is now important that Anthropic support the final version of the bill.
Until we know the final bill version or Anthropic’s proposed changes, and have the necessary context, I would advise caution. Do not jump to conclusions.
Do offer your feedback on what is proposed here, and how it is presented, what rhetorical ammunition this offers, and what specific changes would be wise or unwise, and emphasize the importance of endorsing a bill that presumably will incorporate many, but not all, of Anthropic’s requests.
Do let Anthropic know what you think of their actions here and their proposals, and encourage Anthropic employees to learn what is going on and to discuss this with leadership and their policy department.
Definitely do update your opinion of Anthropic based on what has happened, then update again as things play out further, we learn the final changes, and Anthropic (among others) either supports, does not support or opposes the final bill.
Anthropic benefits from the fact that ‘the other guy’ is either ‘lol we’re Meta,’ or is OpenAI, which has taken to acting openly evil, this week with a jingoistic editorial in the Washington Post. Whatever happens, Anthropic do clear the bar of being far better actors than that. Alas, reality does not grade on a curve.
I consider what happens next a key test of Anthropic.
Prove me wrong, kids. Prove me wrong.
Open Weights Are Unsafe and Nothing Can Fix This
There are sometimes false claims that SB 1047 would effectively ‘ban open source.’
It seems worth pointing out that many open source advocates talk a good game about freedom, but if given half a chance they would in many contexts… ban closed source.
For example, here’s Eric Raymond.
Why do they want to ban closed source? Because they believe closed source is inherently less safe. Because they believe open source allows for better democratic control and accountability.
And in some contexts, you know what? They have a pretty damn strong argument. What would happen if, in a different context, the implications were reversed, and potentially catastrophic or existential?
Another place I get whiplash or confusion starts when people correctly point out that LLMs can be jailbroken (see The Art of the Jailbreak, this week and otherwise) or are otherwise rendered unsafe or out of control, in accordance with the latest demonstration thereof. So far, so good.
But then those who oppose SB 1047 or other regulations will often think this is a reason why regulation, or safety requirements, would be bad or unreasonable. Look at all the things that would fail your safety tests and pose risks, they say. Therefore… don’t do that?
Except, isn’t that the whole point? The point is to protect against catastrophic and existential risks, and that we are not by default going to do enough to accomplish that.
Pointing out that our safeguards are reliably failing is not, to me, a very good argument against requiring safeguards that work. I see why others would think it is: they want to be able to build more capable AIs and use or release them in various ways with a minimum of interference, and not have to make them robust or safe, because they aren’t worried about the risks, don’t care about them, or plan to socialize them.
It feels like they think that jailbreaks are not a fair thing to hold people accountable for, the same way they don’t think a one-day fine-tune of your open weights model should be your responsibility, so the jailbreak is evidence the proposed laws are flawed. And they say ‘CC: Scott Wiener.’
Whereas to me, when someone says ‘CC: Scott Wiener’ in this spot, that is indeed a helpful thing to do, but I would not update in the direction they expect.
If closed weight models are, by virtue of jailbreaks, less safe, that does not mean we should put fewer requirements on open models.
It means we need to worry more about the closed ones, too!
The Week in Audio
Patrick McKenzie interviews Kelsey Piper.
US-China AI competition on the 80,000 Hours podcast with Sihao Huang.
Vitalik Buterin on defensive acceleration and regulating AI, also on 80,000 Hours. This includes saying the obvious: the AI that ‘gets there first’ to certain goals might prove decisive, but it might not; it depends on how the tech and progress fall.
Rhetorical Innovation
Tyler Cowen unrelatedly reports he is dismayed by the degree of misinformation in the election and the degree to which people who should know better are playing along. I would respond that he should buckle up because it is going to get worse, including via AI; that he should set the best example he can; and that he should reflect on what he is willing to amplify on AI, and consider that others care about their side winning political fights the same way he cares about his side ‘winning’ the AI fight.
There are other reasons too, but it is good to be reminded of this one:
I would extend this to ‘and if you think you are an exception, you are probably wrong.’
Richard Ngo: Thoughts on the politics of AI safety:
I agree with a lot of this. I agree that setting up good ‘governance with AI’ is important. I agree that being able to respond flexibly and sensibly is very important. I agree that better AI tools would be very helpful.
I agree that building institutional knowledge and freedom of action is crucial. We need to get people who understand into the right places. And we desperately need visibility into what is happening, a key provision and aspect of the Executive Order and SB 1047.
We also need to be empowered to do the thing if and when we need to do it. It is no use to ‘empower decision making’ if all your good options are gone. Thus the value of potential hardware governance, and the danger of deployments and other actions of the kinds that cannot be taken back.
I also agree that better visibility into and communication with, and education of, NatSec types is crucial. My model of such types is that they are by their nature and worldview de facto unable to understand that threats could take any form other than a (foreign) (human) enemy. That needs to be fixed, or else we need to be ready to override it somehow.
I agree that the central failure mode here, where we get locked into a partisan battle or other form of enforced consensus, is an important failure mode to look out for. However I think it is often overemphasized, to the extent of creating paralysis or a willingness to twiddle thumbs and not prepare. As Tolkien puts it, sometimes open war is upon you whether you would risk it or not. If those who are actively against safety focus their efforts on one party, that does not mean you let them win to avoid ‘polarization.’ But I hear that suggestion sometimes.
Where I most strongly disagree are the emphasis on common sense and the assumption of ‘holy shit’ moments. These are not safe assumptions.
We have already had a number of what should have been ‘holy shit’ moments. Common sense should already apply, and indeed if you poll people it largely does, but salience of the issue remains low, and politicians so far ignore the will of the people.
The frogs are boiling remarkably rapidly in many ways. We suddenly live among wonders. ‘Very serious people’ still think AI will provide only minimal economic benefits, that it’s mostly hype, that it can all be ignored.
There are many highly plausible scenarios where, by the time AI has its truly ‘holy shit’ moment, it is too late. Perhaps the weights of a sufficiently advanced model have been irreversibly released. Perhaps we are already locked in a desperate race, with the NatSec types in charge, who consider likely doomsdays acceptable risks or feel they have no choice. Perhaps we create a superintelligence without realizing we’ve done so. Perhaps we get a sharp left turn of some kind, or true RSI or intelligence explosion, where it escalates quickly and by the time we know what is happening we are already disempowered. Perhaps the first catastrophic event is really quite bad, and cascades horribly even if it isn’t strictly existential, or perhaps we get the diamond nanomachines. Who knows.
Or we have our ‘holy shit’ moments, but the ‘common sense’ reaction is either to accelerate further, or (this is the baseline scenario) to clamp down on mundane utility of AI while accelerating frontier model development in the name of national security and innovation and so on. To get the worst of both worlds. And on top of that you have the scrambles for power.
What about common sense? I do not think we should expect ‘common sense’ decisions to get us through this except by coincidence. The ‘common sense’ reaction is likely going to be ‘shut it all down, NOW’ at some point (and the average American essentially already low-level thinks this) including the existing harmless stuff, and presumably that is not the policy response Richard has in mind. What Richard or I think is the ‘common sense’ reaction is going to go over most people’s heads, even most with power, if it has any complexity to it. When the scramble does come, if it comes in time to matter, I expect any new reactions to be blunt, and dumb, and blind. Think Covid response.
On polarization in particular, I do think it’s a miracle that we’ve managed to avoid almost all polarization for this long in the age of so much other polarization. It is pretty great. We should fight to preserve that, if we can. Up to a point.
But we can’t and shouldn’t do that to the point of paralysis, or accepting disastrous policy decisions. The rank and file of both parties remain helpful, but if JD Vance and Marc Andreessen are empowered by the Trump administration and enact what they say, they will be actively trying to get us killed, and open war will be upon us whether we would risk it or not.
It would suck, but I would not despair, aside from the actual policy impacts.
That is because if the polarization does happen, it will not be a fair fight.
If AI continues to improve and AI becomes polarized, then I expect AI to be a key issue, if not the key issue, in the 2028 elections and beyond. Salience will rise rapidly.
If that happens, here is a very clear prediction: The ‘pro-AI’ side will lose bigly.
That does not depend on whether or not the ‘pro-AI’ side is right under the circumstances, or what common sense would say. People will demand that we Do Something, both out of existential style fears and various mundane concerns. I would be stunned if a lot of the proposed actions of a possible ‘anti-AI’ platform are not deeply stupid, and do not make me wince.
One risk, if we try too hard to avoid polarization and regulatory actions are all postponed, is we create a void, which those who do not understand the situation would inevitably fill with exactly the wrong ‘anti-AI’ policies – clamping down on the good things, while failing to stop or even accelerating the real risks.
None of that takes away from the importance of figuring out how to wisely incorporate AI into human government.
Businessman Waves Flag
OpenAI CEO Sam Altman has written an op-ed in The Washington Post, “Who will control the future of AI?”
He wraps himself in the flag, says the options are Us or Them. Us are Good, you see, and Them are Bad.
So make it Us. Invest enough and I’ll make Us win. Fund ‘innovation’ and invest in our infrastructure to ensure the future belongs to ‘democracy.’
He ignores the most likely answer, which of course is: AI.
He also ignores the possibility we could work together, and not race to the finish line as quickly as possible.
Yes, there are various fig leafs thrown in, but do not kid yourself.
If the mask was ever on, it is now off.
Altman presents AI as a race between the West and Authoritarians, with the future depending on who wins, so we must win.
That is the classic politician’s trick, the Hegelian dialectic at work. No third options.
You should see the other guy.
His failure mode if AI goes badly? Authoritarian humans in charge. They will (gasp) force us to share our data, spy on their own citizens, do cyberattacks.
I mean, yes, if they could they would totally do those things, but perhaps this is not the main thing to be worried about? Old Sam Altman used to understand there was an existential risk to humanity. That we could lose control over the future, or all end up dead. He signed a very clear open letter to that effect. It warned of ‘extinction risk.’
Remember this guy?
There is a bunch of hopium in the full clip, but he asked some of the right questions. And he realized that ‘who has geopolitical power’ is not the right first question.
Indeed, he explicitly warned not to fall for that trick.
He now writes this letter instead, and does not even deign to mention such dangers. He has traded in attempts to claim that iterative deployment is a safe path to navigate existential dangers for being a jingoist out to extract investment from the government.
That’s jumping a bit ahead. What is Altman actually proposing?
Four things.
He says ‘while minimizing [AI’s] risks’ as a goal, but again he does not say what those risks are. Anyone reading this would think that he is talking about either ordinary, mundane risks, or more likely the risk of the bad authoritarian with an AI.
None of these core proposals are, in their actual contents, unreasonable, aside from the rather brazen proposal to transform the safety institutes into capability centers.
The attitude and outlook, however, are utterly doomed. Sam Altman used to at least pretend to be better than this.
Now he’s done with all that. A lot of masks have come off recently.
Vibe shift, indeed.
Meanwhile, as Sam Altman goes full jingoist and stops talking about existential risk at all, we have Ted Cruz joining JD Vance in claiming that all this talk of existential risk is a conspiracy by Big Tech to do regulatory capture, while those same companies fight against SB 1047.
Must be nice to use claims of a Big Tech conspiracy to defend the interests of Big Tech. It’s actually fridge brilliance, intentionally or otherwise. You raise the alarm just enough as a de facto false flag to discredit anyone else raising it, while working behind the scenes, and now out in the open, to say that everything is fine.
Businessman Pledges Safety Efforts
Sam Altman also made an announcement on safety, with three core claims.
I notice that the original commitment said it was ‘to superalignment’ and now it is ‘to safety efforts,’ which includes mundane safety, such as provisions of GPT-4o. That is a very different commitment, which you are pretending is remotely similar.
I notice that this does not say that you actually allocated that compute to safety efforts.
I notice you certainly aren’t providing any evidence that the allocations happened.
I notice that we have many reports that the former superalignment team was denied compute resources time and again, given nothing remotely like what a 20% commitment implied. Other things proved more important to you. This drove your top safety people out of the company. Others were fired on clear pretexts.
So, yeah. That is not what you promised.
If you’re saying trust us, we will do it later? We absolutely do not trust you on this.
On the plus side, I notice that the previous commitment could have reasonably been interpreted as 20% of ‘secured to date’ compute, meaning compute OpenAI had access to at the time of the commitment last July. This is worded strangely (it’s Twitter) but seems to strongly imply that no, this is 20% of total compute spend. As they say, huge if true.
I notice that GPT-4o went through very little safety testing, it was jailbroken on the spot, and there were reports its safety measures were much weaker than typical. One could reasonably argue that this was fine, because its abilities were obviously not dangerous given what we know about GPT-4, but it did not seem like it was handled the way the Preparedness Framework indicated or in a way that inspired confidence. And we presume it was not shared with either the US AI Safety Institute, or the UK AI Safety Institute.
I notice there is no mention here of sharing models with the UK AI Safety Institute, despite Google and Anthropic having arranged to do this.
It is excellent news that OpenAI is ‘working on an agreement’ to provide early access to at least the US AI Safety Institute. But until there is a commitment or agreement, that is cheap talk.
I’ve reported on the NDA and non-disparagement situation extensively, and Kelsey Piper has offered extensive and excellent primary reporting. It seems fair to say that OpenAI ‘worked hard to make it right’ once the news broke and they faced a lot of public pressure and presumably employee alarm and pushback as well.
It is good that they, as Churchill said, did the right thing once they exhausted all alternatives, although they could do more. Much better than continuing to do this. But even if you buy the (to put it politely) unlikely story that leadership including Altman did not knowingly authorize this and for a long time had no idea it was happening, they had months in which Kelsey Piper told them exactly what was happening, and they waited until the story broke to start to fix it.
It is also good that they have acknowledged that they put into contracts, and had, the ‘right’ to cancel vested equity. And that they agree that it wasn’t right. Thank you. Note that he didn’t say here they no longer can cancel equity, only that they voided certain provisions.
What about the firing of Leopold Aschenbrenner, for raising security concerns? Are we going to make that right, somehow? If we don’t, how will others feel comfortable raising security concerns? The response to the Right to Warn letter was crickets.
Given the history of what has happened at OpenAI, they need to do a lot better if they want to convince us that OpenAI wants everyone to feel comfortable raising concerns.
And of course, Sam Altman, there is that op-ed you just wrote this week for the Washington Post, covered in the previous section. If you are committed to safety, if you even still know the things you previously have said about the dangers we are to face, what the hell? Why would you use that kind of rhetoric?
Aligning a Smarter Than Human Intelligence is Difficult
Thomas Kwa offers notes from ICML 2024 in Vienna. Lots of progress, many good papers, also many not good papers. Neel Nanda notes many good papers got rejected, because peer review does a bad filtering job.
Aligning a Dumber Than Human Intelligence is Also Difficult
So often problem exists between keyboard and chair, here with a remote and a car.
Have you ever trusted a human without personally calibrating them yourself in every edge case carefully multiple times?
That question is close but not quite identical to, ‘have you ever trusted a human’?
Humans are highly unreliable tools. There are tons of situations, and self-driving cars are one of them, where if the AI or other computer program was as reliable as the average human who actually does the job, we would throw the computer out a window.
Another great example is eyewitness testimony. We would never allow any forensic evidence or AI program anywhere near anything that mattered with that kind of failure rate, even if you only count witnesses who are trying to be honest.
As a practical matter, we should often be thankful that we are far less trusting of computers and AIs relative to similarly reliable humans. It leads to obvious losses, but the alternative would be worse.
Even with this bias, we are going to increasingly see humans trust AIs to make decisions and provide information. And then sometimes the AI is going to mess up, and the human is going to not check, and the Yishans of the world will say that was stupid of you.
But that AI was still, in many cases, far more reliable in that spot than would have been a human, or even yourself. So those who act like Yishan advises here will incur increasing costs and inefficiencies, and will lose out.
Indeed, Simeon points us down a dangerous road, but locally he is already often right, and over time will be increasingly more right. As in, for any individual thing, the AI will often be more trustworthy and likely to be right than you are. However, if you go down the path of increasingly substituting its judgment for your own, if you start outsourcing your thinking and taking yourself out of the loop, that ends badly.
When the mistakes start to have much higher consequences, or threaten to compound and interact and spiral out of control? And things go horribly, potentially catastrophically or existentially wrong? It is going to be tricky to not have people ask for this, quite intentionally. With their eyes open.
Because yeah, the AI has some issues and isn’t fully reliable or under your control.
But have you seen the other guy?
Other People Are Not As Worried About AI Killing Everyone
Arvind Narayanan and Sayash Kapoor warn that probability estimates of existential risk are unreliable, so you shouldn’t ‘take them too seriously.’ Well, sure.
You should take them exactly the right amount of seriously, as useful ways to discuss questions that are highly imprecise. I mostly only have people in about five buckets, which are something like:
That does not make them, as this post puts it, ‘feelings dressed up as numbers.’
They are perspectives on the problem best communicated by approximate numbers.
I don’t think this is mostly what is going on, but in cases where it is (which mostly are not about AI at all, let alone doom), I’d bite the full bullet anyway: Even when what you have is mostly a feeling, how better to express it than a number?
In this case, a lot of people have put a lot of thought into the question.
A lot of this was essentially an argument against the ability to assert probabilities of future events at all unless you had lots of analogous past events.
They even say ‘forecast skill can’t be measured’ which is of course absurd, track records are totally a thing. It is an especially remarkable claim when they specifically cite the predictions of ‘superforecasters’ as evidence.
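For what it is worth, scoring a track record is a routine exercise. Here is a minimal sketch of how one might do it; the forecasts and outcomes are made-up numbers purely for illustration, not anyone’s actual predictions.

```python
# A minimal sketch of measuring forecast skill from a track record.
# The forecasts and outcomes below are hypothetical, made up for illustration.

def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and what happened.
    Lower is better; always guessing 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_table(forecasts, outcomes, bins=5):
    """Bucket predictions by stated probability and compare each bucket's
    average stated probability to the observed frequency of 'yes' outcomes."""
    buckets = {}
    for p, o in zip(forecasts, outcomes):
        b = min(int(p * bins), bins - 1)
        buckets.setdefault(b, []).append((p, o))
    return {
        (b / bins, (b + 1) / bins): (
            sum(p for p, _ in rows) / len(rows),  # mean stated probability
            sum(o for _, o in rows) / len(rows),  # observed frequency
            len(rows),                            # sample size in this bucket
        )
        for b, rows in sorted(buckets.items())
    }

# Hypothetical track record: probabilities assigned to resolved yes/no questions.
forecasts = [0.9, 0.7, 0.2, 0.6, 0.95, 0.1, 0.4, 0.8]
outcomes  = [1,   1,   0,   0,   1,    0,   1,   1]

print(brier_score(forecasts, outcomes))
print(calibration_table(forecasts, outcomes))
```

Forecasting tournaments do essentially this at scale, and persistent differences in scores across forecasters are exactly what a measurable skill looks like.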
Calibration and forecasting skill can indeed be improved over time, so I was disappointed but not surprised by such general arguments against knowing probabilistic things, and by seeing Pascal’s Wager invoked where it does not belong. Alas, the traditional next thing to say is ‘so we should effectively treat the risk as 0%,’ which for obvious reasons does not follow.
I was similarly disappointed but not surprised by those who praised the post.
Here are some others explaining some of the errors.
The Lighter Side
Anthropic wants your business.
I do think the aesthetics are solid.
The problem with the Claude advertisements is that no one knows what Claude or Anthropic is, and these ads do not tell you the answer to that. It is a strange plan.
Other ads are less subtle.
I think that’s mostly just how advertising works? Here there seems to be a genuine need. Anyone watching the Olympics likely has little idea what AI can do and is missing out on much mundane utility. Whereas with crypto, there were… other dynamics.
So yes, it is what would sometimes happen in a bubble, but it is also what would happen if AI was amazingly great. What would it say if AI companies weren’t advertising?
Everyone is partly correct.
Oh no, that’s the top.