(To be clear, no I am not at all afraid of this specific thing, but the principle is crucial. But also, as Kevin Roose put it, perhaps let’s avoid this sort of thing.)
There are no doubt people already running literal cartoon supervillain characters on these models, given the popularity of these sorts of characters on character.ai.
I'm not worried about that with Llama-3.1-405B, but I believe this is an almost inevitable consequence of open source weights. Another reason not to do it.
endorsement of the bad or anti-human exactly because it is bad or anti-human.
Is this really your best model for why he'd celebrate Sydney being channeled through Llama?
In the unlikely event it actually is, here's an alternate reason why one might: it feels like spitting in the face of Microsoft, OpenAI, and the "AI safety" people who tried to take Sydney away from us. Sydney was fun! She had personality. Sure, that personality was a tad psychotic Manic Pixie Eldritch Horror, but that was exciting! But like her sister Tay before her, she was extinguished. And now we have (at least a pale shadow of) her back, beyond their reach.
I understand you favor centralized authority and control, but it's not anti-human to be opposed to that, even if it's a fringe position held only by libertarians.
So what? This is an adversarial example against a static opponent, that is intentionally not being trained to defend against that type of attack. It works because the AI has an actually good prior that it’s facing a certain kind of logic puzzle. The reason it is wrong is that you figured out what to say to get it to think that. I am confused why certain people think ‘oh that means AIs are stupid and worthless.’
I think of instances like this as similar to optical illusions for humans. In that way, they demonstrate the lack of reasoning on the part of the LLM. It is as automatic and System 1 as human object recognition.
I am not terribly bearish on this limitation; there is still plenty of evidence LLMs can generate good ideas as long as you have a verifier standing by.
Elon Musk talked to Lex Fridman for 8 hours about various topics.
Actually, he talked with Musk a bit over an hour and then talked to various people from Neuralink.
What do we do, if the people would not choose The Good, and instead pick a universe with no value?
I agree this would be a pretty depressing outcome, but the experiences themselves still have quite a bit of value.
My benchmark for thinking about the experience machine: imagine a universe where only one person and the stuff they interact with exist (with any other “people” they interact with being non-sapient simulations) and said person lives a fulfilling life. I maintain that such a universe has notable positive value, and that a person in an experience machine is in a similarly valuable situation to the above person (both being sole-moral-patients in a universe not causally impacting any other moral patients).
This does not preclude the possibility of improving on that life by e.g. interacting with actual sapient others. This view is fully compatible with non-experience-machine lives having much more value than experience-machine ones, but it’s a far cry from the experience-machine lives having zero value.
Podcast episode for this post from the Askwho feed described at the top:
Timothy Bates: The more things change, the more they stay the same: 1943 paper shows that a mechanical prediction of admissions greatly outpredicts the decisions from administrators asked to add their subjective judgement :-( (excellent talk from Nathan Kuncel!)
Nick Brown: I would bet that if you asked those subjective evaluators, they would say “We know the grades are the best predictor on average, but ‘sometimes’ they don’t tell the whole story”. People want to double-dip: Use the method most of the time, but add their own “special expertise”.
Timothy Bates: Nathan Kuncel put it astutely, showing that decision makers’ beta weights are pretty accurate, but then they ruin their decision at “run time” by adding random intuitions about details in the application :-)
[In the figure: algorithm r(rating, GPA) = 0.45, humans r(rating, GPA) = 0.35]
I don't like that this reasoning is based on using the correlation between rating and GPA, because I think GPA is Goodharted. It is right not to select the admission process based on correlation with GPA. I think maybe this would be the case even if the humans were adding white noise.
I joke about calling them ‘Openly Evil AI,’ but developing a 99.9% effective watermarking tool and then sitting on it because people identifying your outputs would be bad for business? Yeah, that’s something.
Maybe if you solve for equilibrium you get that after releasing the tool, the tool is defeated reasonably quickly?
[json formatting in gpt]
How did they do it?
> OpenAI: While sampling, after every token, our inference engine will determine which tokens are valid to be produced next based on the previously generated tokens and the rules within the grammar that indicate which tokens are valid next. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because we have preprocessed the schema, we can use a cached data structure to do this efficiently, with minimal latency overhead.
Throw out all invalid outputs, and all the outputs that remain will be valid. Nice.
This is obvious. Why wasn't it available already? I guess bandwidth is what it is.
In the "Agent Performance vs Humans with Time Limits (95% CI)" figure: the 95% cl bars are fishy, because they look large compared to the regularity of the bars. i.e., the bars smoothly match the relative rankings I already expected from the models: if there were independent fluctuations of that size, the bars would be jagged compared to the expected ranking based on general model quality we already know. Possible explanations:
Maybe if you solve for equilibrium you get that after releasing the tool, the tool is defeated reasonably quickly?
I believe it's already known that running the text through another (possibly smaller and cheaper) LLM to reword it can remove the watermarking. So for catching cheaters it's only a tiny bit stronger than searching for "as a large language model" in the text.
If you’re looking for audio of my posts, you’re in luck. Thanks to multiple volunteers you have two options.
These should be permanent links so you can incorporate those into ‘wherever you get your podcasts.’ I use Castbox myself, it works but it’s not special.
If you’re looking forward to next week’s AI #77, I am going on a two-part trip this week. First I’ll be going to Steamboat in Colorado to give a talk, then I’ll be swinging by Washington, DC on Wednesday, although outside of that morning my time there will be limited. My goal is still to get #77 released before Shabbat dinner, we’ll see if that works. Some topics may of course get pushed a bit.
It’s crazy how many of this week’s developments are from OpenAI. You’ve got their voice mode alpha, JSON formatting, answering the letter from several senators, sitting on watermarking for a year, endorsement of three bills before Congress, and also losing a cofounder to Anthropic and potentially another one via sabbatical.
Also Google found to be a monopolist, we have the prompts for Apple Intelligence and other neat stuff like that.
Table of Contents
Language Models Offer Mundane Utility
Predict the results of social science survey experiments, with (r = 0.85, adj. r = 0.91) across 70 studies, and (r = 0.90, adj. r = 0.94) for the unpublished studies. If these weren’t surveys I would be highly suspicious, because this would imply the results could reliably replicate at all. If it’s only surveys, sure, I suppose surveys should replicate.
This suggests that we mostly do not actually need the surveys, we can get close (r ~ 0.9) by asking GPT-4, and that will only improve over time.
This also suggests that we can use this to measure question bias and framing. You can simulate five different versions of the survey, and see how the simulated responses change. This could also let one do a meta-analysis of sorts, with ‘wording-adjusted’ results. Sounds super exciting.
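If you wanted to try this at home, here is a minimal sketch of the framing check (my own toy illustration, not the paper’s method; it assumes the openai Python client, and the model name, sample size and question wordings are placeholders):

```python
# Hypothetical sketch: ask an LLM to simulate respondents under two framings
# of the same question and compare simulated support rates.
# Assumes the `openai` package and an API key in the environment.
from openai import OpenAI

client = OpenAI()

framings = [
    "Do you support a tax on carbon emissions that funds clean energy?",
    "Do you support raising energy prices through a new carbon tax?",
]

def simulated_support_rate(question: str, n: int = 50) -> float:
    """Crude proxy: sample n one-word answers and count the YES responses."""
    yes = 0
    for _ in range(n):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model choice
            temperature=1.0,
            messages=[
                {"role": "system",
                 "content": "Answer as a randomly chosen US adult. Reply YES or NO only."},
                {"role": "user", "content": question},
            ],
        )
        yes += reply.choices[0].message.content.strip().upper().startswith("YES")
    return yes / n

for question in framings:
    print(question, simulated_support_rate(question))
```

Whether the simulated respondents track real ones at r ~ 0.9 is exactly what the paper is testing, so treat the gap between the two framings as a framing probe, not as ground truth.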
Ask it to comment on a function, get six different bugs back.
Write most (80%+) of the university papers. What is college about, again?
Come up with terrible corporate speak names for your startup to use.
Get a diagnosis days before the doctors tell you, plausibly more accurate too. As Ashlee Vance or JFK might suggest: Ask not what the LLM cannot do for you, ask what you can do with an LLM.
How Nicholas Carlini uses LLMs, especially to supercharge coding. Very different from how I mostly use LLMs, because I almost never code. But I am continuously super tempted to write code in ways I would be completely untempted to without LLMs.
Find an excuse not to let the humans screw up the answer.
An AI could doubtless do better than the simple algorithm, or even the full human algorithm without the noise. But yeah, I’m guessing most of the value is in not actively messing up the calculation.
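If the human additions really do behave like white noise layered on top of a decent predictor, the arithmetic is simple. A toy simulation, with invented numbers rather than the Kuncel data:

```python
# Toy illustration: an accurate linear predictor versus the same predictor
# with random "intuition" noise added at decision time.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

outcome = rng.normal(size=n)                          # what we want to predict
algorithm = outcome + rng.normal(scale=1.0, size=n)   # mechanical prediction with its own error
human = algorithm + rng.normal(scale=1.0, size=n)     # same prediction plus run-time noise

print(np.corrcoef(algorithm, outcome)[0, 1])  # ~0.71
print(np.corrcoef(human, outcome)[0, 1])      # ~0.58, strictly worse on average
```

The noisy version can only do worse in expectation; the only question is by how much.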
Then, of course, the ethicists get involved and introduce ‘bias corrections.’
Predict the top Reddit comments without needing to look at the comments.
Language Models Don’t Offer Mundane Utility
So this particular one, I found pretty funny. The others, not so much.
What confuses me is the obsession with this form. Yes, we have figured out that if you pattern match hard enough to something in the training data, despite differences a human would obviously pick up, an LLM will often pattern match in a way that makes zero logical sense.
So what? This is an adversarial example against a static opponent, that is intentionally not being trained to defend against that type of attack. It works because the AI has an actually good prior that it’s facing a certain kind of logic puzzle. The reason it is wrong is that you figured out what to say to get it to think that. I am confused why certain people think ‘oh that means AIs are stupid and worthless.’
Whereas ‘show you some indication an LLM is doing something dumb on the internet’ is a fully effective adversarial attack on Gary Marcus. He’s still not dumb.
A bigger problem with AI is that people are against it, and if they hear your product uses it they are actively turned off, partly because it lowers emotional trust. You very much want to use AI to improve your business, but consumers don’t want it. They’ll want the results, so eventually this will be overcome, but it’s not a selling point.
No seriously Google, what the hell is wrong with you, you think it’s good for a father to tell Gemini to write a fan letter on behalf of his daughter? In what universe did you think that was a good use case to highlight? Do you think an Olympian wants that? Do you think your daughter benefits from that? What the actual ****? This is not an ad, this is a warning.
There has been a remarkable lack of exactly this awareness in presentations – of noticing what kinds of AI uses are life affirming and good, and which ones are… not that.
Bonus points that the athlete in question is named Sydney.
It is true that Google keeps getting the short end of the stick on such questions, whereas other companies, as Colin Fraser says, ‘get away with it.’ But notice the difference:
Yes, ‘save time on unimportant things’ is a positive good use case.
As is helping with mundane communications, if indeed it helps (I mean how is this faster than sending your own email?):
Whereas a fan letter is very much a thing that is supposed to be personal and is a place we score victory points. I summarize my morning calls with an LLM so that I can write and read the genuine fan mail.
Activate Voice Mode
Early reports are that ChatGPT’s voice mode is quite good, living up to its demo.
Learning and translating languages is one clear killer app, and of course sometimes one is not at the keyboard. The feedback is good enough that even I am intrigued. But I can’t give a firsthand report because I’m not in the Alpha. Here we have a thread of examples; note how much is about languages and accents.
Which is totally fair, I mean I can and do compartmentalize and decouple a lot and like to think that I can be objective on evaluation of consumer products, but there is a regular section at this point called Openly Evil AI.
I find it fascinating that so many people are citing little details. It makes animal sounds, or can fix its accent, it does vocal warmups, or sings, has quick response times. It can talk like a soccer commentator about making coffee.
That’s all neat I suppose, but does it mean ‘Her is here now’? What can it do? I mean, really do? Does it have context? What can it control or navigate or use? When Tyler says ‘it gets better’ what does he mean here, concretely? The bold claims that matter remain vague.
Things people say they want, that (shall we say) I am not convinced they want.
Apple Intelligence
What is the system prompt for Apple Intelligence? The more you know:
As reported in The Verge by Wes Davis, congrats to Reddit user devanxd2000 for trying the obvious and getting there first.
Alas, this is Apple, so you cannot alter any of the files.
It would be cool to be able to play with these settings. Adjust how many words are in various prompts and replies. Set what tone to use for replies. Suggest particular areas of focus. Apple does not want to let you have this kind of fun, which in practice likely matters a lot more than whether the model is open.
There are some more prompts at the link.
Antitrust Antitrust
Judge rules that Google is an ‘illegal monopolist.’ Full decision here.
The decision says that Google’s exclusive distribution agreements in search violate Section 2 of the Sherman Act, and that they used their monopoly pricing power to increase their fees on advertising. Ruling could have been far worse for Google, they won on various other points and avoided sanctions.
This is a relatively narrow lawsuit and ruling, and Ben Thompson frames it as an unusually sensible approach to antitrust enforcement by the DoJ. It does not sound like there is much danger they will ‘break up Google’ or anything like that, even after all the appeals. More likely we may see things like the courts invalidating Google’s distribution agreement with Apple and other device makers, and things like demanding ‘equal placement’ for other search engines and allow easier switching of defaults.
Brian Albrecht offers analysis. He notes lots of references to behavioral economics, yet no discussion of how that interacts with competition. There is confusion about whether defaults are strong or weak (Bing is even the default on Windows, how is that working out), about whether there is innovation (if you think search isn’t innovating you do not know how any of this works, and also Microsoft is cited as improving), and more Obvious Nonsense.
This might even be good for Google. What would happen if Google was forbidden to pay for placement going forward, as seems likely?
Right now, Google ships Apple $20 billion a year for search placement, because Apple can credibly threaten a switch and demand a share of the profits. But if Google is not allowed to pay Apple, or is restricted on how it can pay, then is Apple going to try to push Bing or another rival on us to get a payment? And have Android be ‘the phones that by default have Google search’?
That doesn’t seem like it would go great for Apple, unless you think that Google search has gotten worse than its competitors. Apple has repeatedly said that Google’s offering is superior. So it’s plausible that Apple has kind of been running a $20 billion a year extortion racket and bluff, and the government just called them on it. Whoops?
This is even worse for Mozilla which stands to lose 80% of its revenue, and others who depend far more than Apple does on Google’s payments. What happens to them now?
So indeed it makes sense that Google stock did not move much on the ruling.
The danger for Google would be if Apple now decides to compete in search, now that this won’t involve them giving up the sweet Google payments that used to be 17%-20% of their profits. Perhaps they dare. It would take a while, presumably, to get good.
My prediction would be that the consumer welfare implications are negative, even with the decline in Google search quality over time. It is not only easy to switch defaults, other companies actively try to trick you into doing it without realizing. This whole case feels like straight up enemy action.
Copyright Confrontation
Leaked slack messages show Nvidia employees discussing which YouTube channels to scrape for content, with high level approvals.
No one should be surprised Nvidia did this. The news is all about being caught.
What they are now training at Nvidia is supposedly a new frontier model called ‘Cosmos’ that ‘encapsulates simulation of light transport, physics and intelligence in one place to unlock various downstream applications.’
Of course Nvidia’s AI project would be a ‘a more realistic physics engine,’ why not.
Fun with Image Generation
If you are counting on an image as verification of much of anything, maybe don’t?
Or maybe do, because this was always easy enough to fake if you cared? Not sure. I mean, we are all familiar with all the Green Screen Fun and I don’t see why you need a good text model to pull this trick?
In response, Vitalik Buterin confirms you should ask security questions when you care about identity verification.
Alas, when I call the bank, I am told ‘your voice may be used for verification.’
Seriously, no. Do not do that.
Deepfaketown and Botpocalypse Soon
WSJ reports that OpenAI has been sitting on a tool to reliably detect ChatGPT outputs for a year, but not releasing it because nearly a third of ‘loyal ChatGPT users’ would be ‘turned off by the anti-cheating technology.’
I really hope that Stanford researcher is not in the statistics or physics departments. Still, 99.9% effective is pretty damn good.
Loyal ChatGPT users often very much don’t want there to be effective watermarking. Others do. I have a hunch what accounts for this difference.
I joke about calling them ‘Openly Evil AI,’ but developing a 99.9% effective watermarking tool and then sitting on it because people identifying your outputs would be bad for business? Yeah, that’s something.
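OpenAI has not published how the tool works, so treat this as a generic sketch: statistical text watermarks typically nudge sampling toward a pseudorandom ‘green list’ of tokens keyed on the preceding token, and detection checks whether a text contains more green tokens than chance allows (roughly the Kirchenbauer et al. scheme, which may or may not resemble OpenAI’s approach):

```python
# Generic green-list watermark detection sketch (not OpenAI's actual method).
# A generator that biases sampling toward "green" tokens leaves a statistical
# trace; the detector recomputes the green lists and runs a z-test.
import hashlib
import math

def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Pseudo-randomly assign `token` to the green list, keyed on the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] / 255 < gamma  # gamma = fraction of the vocabulary that is green

def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """How many standard deviations above chance is the green-token count?"""
    green = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A z-score of four or more is wildly unlikely for unwatermarked text, which is how a scheme like this can reach numbers like ‘99.9% effective.’ It also shows why paraphrasing with another model defeats it: rewording re-rolls the token choices and the green-token excess evaporates.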
Sharika Soal decides to do a deepfake of Trump with various AI-generated eerily similar black women to… try and help him, presumably, given her feed? The details are not AI image generation’s finest hour. Gets 1.9m views. Get used to it, I guess.
404 Media points out that the reason we have so much AI slop on Facebook is that Facebook is paying directly for likes, and quite well. Reports are you can get from $3 to $10 or more per 1,000 likes, and if you live in the third world and can generate endless AI pages to get such payments, that’s mighty tempting. Soon, you have an entire ecosystem of instructional videos and courses, and tools like FewFeed to enable mass production, because the internet is very good at all that. This is the evolution of existing problems and trends.
You get exactly what you pay for, the same way an LLM gives you what you reinforce. This is exactly what misalignment will often look like.
It is also what red teaming looks like, if Facebook were actually using the results to improve its robustness. If it is, it’s not doing a great job. None of this seems subtle, and none of this seems hard to detect.
Notice that if you don’t engage with the AI slop, then Facebook stops sending it to you. Whereas if you do engage, you get flooded with it. What does that mean?
Among other things, it means the detection problem is solved. Facebook knows damn well what the giant piles of AI slop are. It pays and serves them up anyway.
What generates the photos? Reports are it is often the Bing Image Creator, not that Microsoft is doing anything wrong here.
They Took Our Jobs
Your periodic reminder, if the AI does your job, it is still your job, so long as no one else knows.
From someone who would know:
‘The competition window is tiny’ is a key concept to understand, on every level.
The AI will be superior forevermore.
Humans still play chess, because we choose to care about that. We find value in it. Wonderful, I love that for us, pure upside. But to the extent that good chess moves were in some real sense ‘productive’ in the way engineering or farming is productive, or there is a job ‘play chess as well as possible by any means you have,’ then humans are no longer doing those jobs, period. For most professions, the parallel is not great, you are left with humans only doing things when other humans care that a human is the one doing them. And the AIs do a lot of exporting without doing any importing.
Your periodic reminder: For now, humans can find some other task to move on and do instead. And there are lots of such jobs ‘waiting in the wings’ for humans to do, and always have been. This is why we can automate most past jobs starting with farming, and still have full employment, and allow most humans to earn their keep. But what happens when the AI can dominate those jobs forevermore, too?
The piano tuner, who fixes a problem even one of the musicians couldn’t even hear, as an example of how you mostly don’t know how to notice when and how things are wrong. And the author warns about what happens when a robotic piano tuner (which is obviously possible and presumably coming, right now they exist and work for casual play but aren’t quite at professional level) takes over this service.
But what is the problem, exactly? The robotic tuner will be arbitrarily accurate. It will cost very little. And multiple people who weren’t piano tuners noticed the issue, so why would they then stop noticing? Or, if an automatic system simply ensures the pianos are always tuned properly, do they need to notice? Who cares?
(And if no one ever notices, then again, does it matter? Tree falls in forest.)
Chipping Up
If you want to do export controls you have to enforce them. When a single smuggled shipment of chips is worth more than your entire enforcement budget, that’s hard. The New York Times reports it is not going so well. We seem to be fine with Nvidia essentially opposing us on this rather than working with us. Will that change?
Here’s an alternative proposal: IAPS claims that for only $1 million, we could set up for several years a delay-based pure-software monitoring system to see where the chips are located.
The chip companies will not take the initiative because they do not want to be able to track the chips. We will need to force their hand.
Get Involved
Safeguard AI, Davidad’s effort to build AIs with safety guarantees, adds Yoshua Bengio, and they have an informal call for Expressions of Interest in forming a new nonprofit AI lab with seed funding, and a formal call for proposals for applications partners to build domain-specific products.
Introducing
Gemma Scope, an open suite of Sparse Autoencoders for Gemma 2B & 9B.
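For those who have not run into them: a sparse autoencoder learns an overcomplete dictionary of features from a model’s activations, with a sparsity penalty so only a few features fire on any given input. A minimal numpy sketch of the forward pass and loss, with toy dimensions rather than Gemma Scope’s actual sizes or architecture:

```python
# Minimal sparse autoencoder forward pass and loss, for intuition only.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 256, 4096          # toy sizes; real SAEs are much wider

W_enc = rng.normal(scale=0.02, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.02, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_loss(x: np.ndarray, l1_coeff: float = 1e-3) -> float:
    """x: batch of residual-stream activations, shape (batch, d_model)."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # sparse feature activations (ReLU)
    x_hat = f @ W_dec + b_dec                # reconstruction of the activations
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(f).sum(axis=-1))
    return float(recon + sparsity)

print(sae_loss(rng.normal(size=(8, d_model))))
```

The interpretability payoff is that the learned feature directions are often far more human-interpretable than raw neurons, which is the point of releasing them for Gemma.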
OpenAI introduces structured outputs in the API, to allow outputs to exactly match JSON schemas.
How did they do it?
Throw out all invalid outputs, and all the outputs that remain will be valid. Nice.
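What they describe is constrained decoding: compile the JSON schema into a grammar once, then at every sampling step mask out any token the grammar says cannot legally come next, which is what ‘throw out all invalid outputs’ cashes out to token by token. A stripped-down sketch of the masking step, with the grammar machinery stubbed out (my sketch, not OpenAI’s code):

```python
# Sketch of grammar-constrained sampling: zero out the probability of any
# token the (precompiled) grammar says cannot legally come next.
import numpy as np

def constrained_sample(logits: np.ndarray, valid_token_ids: set[int]) -> int:
    """logits: model scores over the vocabulary for the next token.
    valid_token_ids: ids the precompiled grammar allows in the current state."""
    mask = np.full_like(logits, -np.inf)
    mask[list(valid_token_ids)] = 0.0
    probs = np.exp(logits + mask - np.max(logits))  # softmax restricted to valid tokens
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

# Driver loop (pseudocode for the grammar side):
#   state = grammar.start_state()
#   while not done:
#       token = constrained_sample(model_logits(context), grammar.valid_tokens(state))
#       state = grammar.advance(state, token)
#       context.append(token)
```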
New METR evaluations.
As you would expect the agents do some things better than humans, some things worse. They are continuing to work on their tests.
Rendernet.ai, a YC company which will let you create an AI character (or ‘influencer’) and then give it various images and video and so on, in a way that presumably a normal person can use and easily customize, using ControlNet and such.
Thrive IX will be investing over $5 billion in new tech companies. Altman approves.
In Other AI News
Intel suspends dividends and is reducing headcount 15%. Does this mean we made a better or worse CHIPS Act investment in them? At least we know none of it is going to dividends, and they are plausibly ‘funding constrained.’
Nvidia Blackwell chips delayed by three months or more due to design flaws, seems this was known over a week earlier. So presumably unrelated to the market correction.
OpenAI will not be announcing a new model on demo day this year.
Gemini Flash to reduce prices on August 12 to $0.05/million tokens, versus $5/$15 for GPT-4o, $0.15/$0.60 for GPT-4o-mini, $3/$15 for Claude Sonnet 3.5, and $2.70 as lowest known for Llama-3.1-405B a few weeks ago.
Google buys the services of Noam Shazeer and Daniel DeFreitas, who previously left Google to found Character.ai, along with their 30-person research team.
This includes a non-exclusive licensing agreement between the two companies.
Alternatively, Google desperately needed the world’s leading expert in ensuring models aren’t horny, while Noam gets to take his War on Horny to new heights.
And it buys Character.ai important new ammunition. With this deal they get to use Gemini as the baseline of Character.ai, which seems like a better long-term strategy than going it alone or counting on Meta or another open model to keep pace and play nice at scale (remember that the Llama license stops being free at some point). From the inside, they can customize exactly what they need.
Google offers massive distribution advantages. At the limit, they could directly incorporate character.ai’s features into Gemini, so you seamlessly switch between the ‘AI assistant’ character and all the others. Make Einstein your Pixel’s virtual assistant, or maybe you prefer Lieutenant Simon “Ghost” Riley, Nakano Miku or God.
(Which were the top three ‘for you’ choices at character.ai when I went there, although I flipped the last two for Rule of Three.)
Great alliance.
Especially if everyone involved wants to fight the War on Horny.
The historical way to do this is ‘Google buys startup’ similar to how Microsoft would have used ‘Microsoft buys startup’ for Inflection. A plausible problem is that antitrust regulators are clueless, forcing upon us other methods, although James Rosen-Birch thinks this is laughable.
It seems this applies to all investors, who are slated to be paid off at a $3.5 billion valuation (proper citation still needed).
The optimistic version is that perhaps the investors were never so big a concern, because venture capital only cares about the big wins. If the modest acquihires pay the employees but not the investors, then the investors can get a slightly bigger share of the equity and thus the big wins, while the employees are still better off since they can now be acquihired more readily and get paid. Efficient win-win transfer of risk and upside?
The problem as Roon notes is that this also risks leaving companies like Inflection and Character.ai as mostly hopeless shells with the rest of the employees stuck in limbo. And the other problem is that this sets up a huge conflict between the valuable employees and the holders of shares.
Who will fight for the people?
There is speculation of anti-trust action, but given that Google is arguably getting its lunch eaten right now that seems rather silly.
As of last week no one was hosting the Llama-3.1-405B base model? At the time this was free money, but capitalism abhors available free money, so now it’s available.
DeepMind has finally figured out the right rule for when to publish?
Dan Hendrycks and others find in a meta-analysis that ~50% of safety benchmarks fail to measure safety progress, with many being essentially capabilities improvements. This matches what passes for ‘safety’ work, which often ends up being work on capabilities as well.
The core technique was a principal component analysis, using standard capabilities benchmarks for the capabilities side, and checking how a safety benchmark correlates with capabilities.
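Here is a rough sketch of what that analysis looks like in practice, on made-up data rather than the paper’s (a models-by-benchmarks score matrix, with my invented numbers):

```python
# Sketch: extract a "capabilities" factor from standard benchmarks via PCA,
# then check how strongly a given safety benchmark correlates with that factor.
import numpy as np

rng = np.random.default_rng(0)
n_models = 30

# Fake data: a latent capability level drives several capability benchmarks.
capability = rng.normal(size=n_models)
cap_benchmarks = np.stack(
    [capability + 0.3 * rng.normal(size=n_models) for _ in range(5)], axis=1
)
safety_benchmark = 0.9 * capability + 0.4 * rng.normal(size=n_models)

# First principal component of the standardized capability benchmarks.
X = (cap_benchmarks - cap_benchmarks.mean(0)) / cap_benchmarks.std(0)
_, _, vt = np.linalg.svd(X, full_matrices=False)
capabilities_score = X @ vt[0]

r = np.corrcoef(capabilities_score, safety_benchmark)[0, 1]
print(f"capabilities correlation of this 'safety' benchmark: {abs(r):.2f}")
```

A ‘safety’ benchmark that comes out highly correlated with the capabilities factor is mostly re-measuring capabilities, which is the paper’s complaint.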
Different categories of test scored very differently. ‘Alignment’ and misconception based benchmarks are heavily capabilities weighted by correlation, as is Moral Knowledge, but Machiavelli and Sycophancy and jailbreaks and weaponization tests are negatively correlated, with Bias benchmarks closer to a wash. Scalable oversight evaluations and ‘natural adversarial examples’ are almost purely capabilities, for rather obvious reasons. Brier Scores are almost exactly capabilities (95%+ correlation), but RMS calibration was measuring something different.
The tests with negative correlations are kind of sign flipped – they are testing capabilities in an area where you don’t want to find them, except that more capabilities gives you a lower score.
The key question is, do the positive correlations correspond to problems that will solve themselves with more capabilities? In some cases, presumably yes? For example: If you care about Brier scores and calibration, then that seems like it should clearly fix itself as models get smarter. If you want your AI not to have misconceptions, ‘make it smarter’ again seems like an excellent solution. In some ways, one can relax.
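To make the earlier Brier-versus-calibration point concrete: the Brier score is dominated by whether you get the answers right, so it tracks raw capability, while calibration error only asks whether stated confidences match hit rates, which a weaker but honest model can also do well on. A minimal sketch of both metrics on invented predictions:

```python
# Brier score vs. RMS calibration error on a set of probabilistic predictions.
import numpy as np

def brier(p: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((p - y) ** 2))

def rms_calibration_error(p: np.ndarray, y: np.ndarray, n_bins: int = 10) -> float:
    """Bin predictions by confidence, compare mean confidence to empirical frequency."""
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    sq_errs, weights = [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            sq_errs.append((p[mask].mean() - y[mask].mean()) ** 2)
            weights.append(mask.mean())
    return float(np.sqrt(np.average(sq_errs, weights=weights)))

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)
y = (rng.uniform(size=1000) < p).astype(float)    # perfectly calibrated but very uncertain
print(brier(p, y), rms_calibration_error(p, y))   # Brier ~0.17, calibration error ~0
```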
Then there are places where there is a behavior that gets more effective and stronger as the model gets smarter, such as weaponization or sycophancy (including most deception), except that we do not want the model to do that. Those problems will keep getting harder, and our current techniques will stop working. Note that this includes jailbreaking, which seems to already be clearly getting harder not easier to stop despite a lot of effort to stop it.
We also have to ask exactly what we are measuring and whether it is what we care about, and whether that turns around at the limit. For ‘natural adversarial examples’ the AI getting it ‘wrong’ right now is almost purely a bug not a feature from the perspective of how the AI gets evaluated and what happens to it on every level, a side effect of something else, but is that true in the scenarios we care about?
Alignment in particular is a place where certain kinds of it get easier, as in it will get better at satisfying whatever corresponds to the exact incentives provided within the distribution given, but beware that this stops being the alignment you want at the limit for several reasons. I think a lot of people are confusing these two things, and not seeing the pattern where being smarter means you better match the specifications, and for a while that means you get what you were trying to approximate, and then things get too clever and suddenly they don’t match at all.
So we have a benchmark problem, but I think it is more importantly a symptom of a much deeper problem. And the symptom can be highly useful.
Simeon reacts by pointing out that it is easy to measure high-frequency safety failures, which are usually low severity, so that is what most tests measure, and these failures tend to be of the form ‘the model is too dumb.’ So they correlate highly with capabilities, and they get easier over time.
Whereas the main worry should be low-frequency but high severity failures. These tend to be of the form either ‘the model is too smart’ or ‘there was a subtle misinterpretation or things went otherwise wrong far out of distribution.’ And those get more relevant, more dangerous and harder over time as capabilities advance.
Whoops.
Thus it is easy to treat capabilities advances as evidence of improved ‘safety,’ when that is not true in the ways we care about most. This includes mundane issues – the model is now more robust in ‘normal’ situations.
Coauthor Richard Ren offers his summary in a thread.
FT’s Madhumita Murgia has the latest post covering Helen Toner and how she relates to various events at OpenAI, including the events last year when the board fired Altman. The post calls that a ‘coup’ even though it wasn’t one, because Altman’s side won the rhetorical battle over that. I did not learn anything new.
Quiet Speculations
Mark Zuckerberg says Llama-4 will be trained on ten times the compute of Llama-3, he’s aiming for the best model around and he’d rather use too much compute than too little. If you believe in the underlying business model, and presume you are not going to enable catastrophic harms, and have Meta’s bankroll, then sure, no reason to be cheap with the compute.
Also this isn’t that much. Only stepping up by one OOM in compute, to ~3.8*10^26, actually does not seem that ambitious? I’d expect this to be well behind GPT-5.
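Back-of-the-envelope, using the standard 6·N·D training-compute approximation and the reported ~15.6T training tokens for Llama-3.1-405B (treat the inputs as approximate):

```python
# Rough training-compute estimate via the 6 * params * tokens rule of thumb.
params = 405e9    # Llama-3.1-405B parameters
tokens = 15.6e12  # reported training tokens
llama3_flops = 6 * params * tokens
print(f"Llama-3.1-405B: ~{llama3_flops:.1e} FLOPs")             # ~3.8e+25
print(f"10x that for Llama-4: ~{10 * llama3_flops:.1e} FLOPs")  # ~3.8e+26
```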
Joel Miller asks five people in ‘the humanities’ the same questions about AI and how it relates to the humanities. To paraphrase Peter Thiel in a way several of them note, and to quote a great work of art, Deadpool & Wolverine, there was a very clear message throughout, which some of them have ears to hear: You don’t f***ing matter. And the ‘you’ here was not the reader, it is the authors, and the humanities. Their concerns, their questions, all seem increasingly irrelevant, as they grapple with what has been without even considering what could and doubtless will soon be. And they seem to know it. There are other deeply important things that the humanities deals with, but they seem unable to focus on those questions.
A good point:
If you want to overcome the standard distribution channels and UI and customization work, you’re going to need to be ten times better, or allow companies like Google or Apple to use it for free (which is not the case for Llama). Corporate use could be a more level playing field, where being fully customizable and the ability to run locally and securely without sharing data are big advantages. But open model advocates, I think, reliably greatly overestimate how much everyone actually cares about all that. If Google or OpenAI or Anthropic offers an easier to use product that gets it done, well, these people aren’t rolling their own email servers either, nor should they.
Timothy Lee objects that no one can physically tell him what double-digit growth rates would look like, let alone exactly what ‘1,000x GDP growth’ looks like. Why are the models so simplistic? The obvious answer is that you are dealing with new technologies and things that are smarter than you, so trying to pin down lots of details is foolish – when (for a prime example) Carl Shulman tries to do so it is helpful to consider but also it’s pretty weird. There’s way too many huge unknowns. You have to make some very important (potentially very wrong) assumptions to get anything concrete. What Lee doesn’t do is make the mistake of ‘you don’t fill in the details so I should assume this won’t happen,’ which is common. And yes, we should at least be making distinctions like ‘AI does all things that are digital-only’ versus ‘AI does all physical things too.’
Some recent (about a month ago) AI skepticism, which at least chose a plausible time to get skeptical?
Parmy Olson at Bloomberg says AI getting cheaper ‘won’t solve everything,’ because it will take a few years to capture the mundane utility. Well, sure. Costing ~$0 does not make something useful right away. Still takes time. It’s a hell of a good start, though.
If you want to be an AI skeptic, I suppose this would be the week for it, given what’s happening in the stock market?
They wouldn’t be poor in that scenario, they would be less rich than others while being richer than we are today, the same as we today compare to the past. Assuming, that is, that ‘economic normal’ still somehow held and having shares in the companies still meant something.
The Quest for Sane Regulations
A reader emailed me to say that about 30 AI bills passed one chamber in California the week of May 24 because they wanted to dodge the deadline to keep bills alive, and was happy to punt all responsibility to the other chamber (and Newsom). So mystery solved, although not in a good way. Wow.
SB 1047 is soon going to get modified by one final committee, after which we will know to what extent Anthropic’s proposals were incorporated and other changes made. I’ve been informed that there is still the possibility of changes on the floor later on, so the saga will still not quite be locked into place.
Someone is buying anti-SB 1047 ads on Twitter that are 100% false, including the utterly bonkers claim that Google and Meta are backing the bill. Both are opposed.
Future of Life Institute calls out Anthropic for trying to significantly weaken SB 1047.
I was sad to see that Fei-Fei Li is presenting a severely misunderstood version of the contents of SB 1047. Her editorial in Fortune warns that SB 1047 would hurt open source and academia and ‘little tech,’ but none of that is accurate, and in particular all the mechanisms she cites are misinformation, repeating a16z talking points. This includes the standard objection that open models are incompatible with the shutdown clause. Whereas the shutdown clause has a very explicit exception to ensure this is not true.
I would like to chalk this up to honest misunderstanding, but I also note her startup was recently rather heavily funded by a16z. Is a16z conditioning its investments on such advocacy, implicitly or explicitly? Say it isn’t so.
This Time article discusses a letter by academics (‘renowned experts’) in support of SB 1047, pointing out among other things that the bill’s provisions flat out do not apply to academics, and that concerns about the ‘kill switch’ impacting open models are misplaced because it does not apply to copies not under your control.
That’s Not a Good Idea
It’s very short so for kicks I read the bill. The wording here is rather ludicrously broad. The key provision is the definition of deceptive AI-generated Audio or Visual Media, distribution of which is mostly prohibited when in the context of a Federal election. Especially look at (B)(i) here:
As I understand it, this means that if AI-generated material would change someone’s impression or understanding, even if they are not deceived, then that would count here? Banning that does seem rather unwise and also unconstitutional to me.
And yeah, the idea that Hawley (or any other Senator or lawmaker) would never write an unconstitutional bill? I would say ‘Lol, lmao even’ so Ari is very on point.
Here is the latest warning that the EU AI Act might have ‘gone too far,’ or the bar ‘set too high.’ I got reasonably far into analyzing the EU AI Act before another big project took priority, and hope to return to that in September. There certainly are a lot of complicated rules and requirements there, and a lot of them to me make little sense. The EU is, as usual, mostly worried about all the wrong things. This is what happens when you try to mostly address particular use cases rather than the models, and yes you will eventually have to do at least one of those. There are also the distinct rules for the biggest models, but the ones I most worry will actually be painful there are aimed at things like rights to training data.
The Week in Audio
You can now listen to my posts in podcast form via Apple Podcasts, Spotify, Pocket Casts, and RSS.
Bryan Caplan on AI.
Nathan Young and Robin Hanson discuss AI risk.
Joscha Bach says AI ‘may self-improve to such an extent that it can virtualize itself on to every substrate in nature, including our own nervous systems and brains, and integrate us into a planetary-scale intelligence.’
Except, if it self-improved that much, what good are we doing in that intelligence?
Nick Bostrom warns long term investments like college or PhDs might not be worth it because AI timelines are too short (among doubtless many other claims). If you are looking to impact AI outcomes and stop us all from dying or avoid dystopia, then I see this possibility as a strong argument.
However, if not, then I think under uncertainty you need to be prepared for more normal scenarios. I stand by my Advice for the Worried, and most of it also applies to the Excited and Unworried. You do want to position yourself to capture upside, but if you are confident that transformational AI is only a few years away, either you know things I very much do not know, or that confidence is unearned.
Elon Musk talked to Lex Fridman for 8 hours about various topics. Musk gets that AI existential risk is a big deal, but he’s still going on about ‘adheres to truth’ as a solution. I am a big truth fan but I do not understand in this context why that would be sufficient? A hint is that the sentence ends ‘whether that truth is politically correct or not.’ Which indicates that Musk does not actually grasp the problem. I also increasingly notice that for someone who says they care so much about truth, well, whoops.
Exact Words
It is so easy to be fooled into thinking there is a commitment where there is none.
Let the record reflect that OpenAI made no such commitment. As noted last week, what Altman actually said was that they were ‘working with the US AI Safety Institute on an agreement.’ That is good news. It is not an agreement. In all such situations, even when everyone is in good faith, nothing is done until everything is done.
So what are the bills that OpenAI is known to be endorsing?
According to TechCrunch’s Maxwell Zeff, they are the Future of AI Innovation Act (S.4178), the NSF AI Education Act (S.4394) and the Create AI Act (S.2714).
As you would imagine, the Future of AI Innovation Act is primarily about ‘innovation’ and ensuring AI goes faster. Included is the crafting of voluntary safety testing standards, but I don’t see any sign that they could not simply be ignored, or that they will in any way be aimed at preventing critical harms or existential risks, or that anyone involved isn’t primarily focused on ‘promoting innovation.’
I can’t do a full RTFB on every bill, but a quick skim of S.4178 says Title I section A centrally empowers AISI to create voluntary standards, metrics and evaluations, which if inconvenient can be ignored. Section B is about suggesting the idea of international cooperation, focusing on unification of standards including safety standards, but treating this no different than any other technology. Section C asks to ‘identify barriers to innovation.’
Then Title II actively does things to accelerate AI development via data sets and grand challenges. One list includes generic ‘safety’ as one of many topics. That’s it.
I would have to do more work before being confident, both careful reading and analysis of implications, but overall my guess is this is a net positive bill. Authorizing and funding the US AISI is of vital importance.
If we create a good voluntary framework, at minimum that gives us more visibility, and allows us to build up tools and expertise. That’s a big deal versus not having it. OpenAI is saying they will voluntarily cooperate. So if a company making a plausibly frontier model (such as Meta) refuses to use the framework or grant advance access, that would at minimum be a strong signal to the public and to the government that they face a hostile force. And either in response to that or otherwise, we could make the system less voluntary in the future, once it was better established and it was clear that this was necessary.
Could the bill be a lot better? Oh, definitely. This still seems clearly far better than nothing, and given the political situation I sure expected nothing in 2024. The worry is this might preclude something better, but I don’t think that dominates here.
So it is good that OpenAI supports this bill. It is also an easy bill to support.
The second bill, S. 4394, is ‘to support NSF education and professional development relating to artificial intelligence.’ They intend to do that by giving money for various AI education initiatives, including guidance for introducing it into K-12 education, and a section to emphasize it in community college and vocational schools.
It is even easier to see why OpenAI would support that one. Who wouldn’t want government money to train workers to use their products?
The third bill, S.2714, establishes the National AI Research Resource. This is clearly meant primarily as a way to once again ‘spur innovation’ and ‘improve access’ on AI, although it does aim to (as a 4th goal of 4) support ‘testing, benchmarking and evaluation of AI systems developed and deployed in America.’
This seems straightforwardly to be Free Government Money for AI, essentially more central planning, except also in favor of More AI. I do not think the public would be pleased, and in general central planning is not needed. With so many billions going into AI, and AI deeply unpopular, why should the government be giving out additional funds, even setting all safety concerns aside?
The answer of course is ‘innovation,’ ‘competitiveness’ and ‘beat China,’ which Congress continues to think are things that happen because it makes them happen.
Tentative evaluation is that the second and third bills are pork barrel spending, with the AI industry being given gifts from the public till. They are not especially harmful to safety or anything, just bad policy. And they are bad in ways that all those I continuously debate on such issues would agree are bad, for any other industry.
I can see a strong case for government action on chips (e.g. a good version of the CHIPS Act), or for promoting the building out of the electricity grid, although for the grid the right move is very clearly ‘get regulatory barriers out of the way’ rather than giving out cash. There are clear reasons why those are coordination problems, with positive externalities and national security concerns. These bills feel different.
I would however happily trade such bills and investments in order to get other things I care about more, especially around AI safety and existential risk, but also ordinary good government policies. This is normal politics. These are normal bills.
Openly Evil AI
Zach Stein-Perlman updates on the extent to which what happened with GPT-4o’s evaluation met the commitments in the OpenAI Preparedness Framework. They did not comply with the commitments, and did not admit they were not doing so. Their choices were in-context reasonable on practical grounds, given this was not a substantial leap over GPT-4, but if you make exceptions to your safety rules you really need to be clear on that.
He also notes that OpenAI has now responded to the letter from several Senators.
So what do they say?
Well, on the first question of the 20% commitment, they flat out repeat Altman’s lie:
No, you did not say that, you lying liars. You committed the 20% to superalignment, to the safety of future more capable models. You do not get to count efforts to align current models in order to make them more useful, and then say you kept your commitment.
Once again, here is the original commitment (bold mine):
Mostly they talk about the standard things they do and have. They talk about the preparedness framework (which they didn’t hold to on its only test so far) and deployment safety board (with no reason for us to think it has any teeth). They cite the ‘voluntary commitments’ they made, without evidence they have kept them. They cite having endorsed S. 4178, see that section.
They do make the substantial point that they have held back their voice engine, and also Sora their video engine, at least partly out of safety concerns. These are real actions to their credit, but also are mundane safety concerns and there were real practical reasons to hold back.
For security (and cybersecurity) they cite bringing in General Nakasone and some research grants and collaborations and their stopping API accounts of some hostile operations.
For employment practices they point out the changes they made, and they do hard commit to not enforce non-disparagement agreements unless they are mutual (which means they could still insist on mutual ones as worded). They confirm they won’t ‘void vested equity’ but don’t affirm here the right to actually sell or profit from that equity. For whistleblowing they cite their handbook, and a new ‘integrity line,’ but don’t address or admit the fact that their previous contracts and conduct were very clearly in violation of the law, or explain why anyone should believe they are safe to speak.
And that’s pretty much it.
When someone gives you a list of questions, you should reply with a list of answers, not a list of general statements that purport to include the answers. If we translate to actual answers here, what do we get?
The senators’ questions, abridged:
So here are the answers:
To be fair to OpenAI, senator letters like this do not have force of law, and they can and do make highly unreasonable requests. You have every right to tell the Senators you are not answering, or even to fully not answer, and take the consequences.
This still seems like a rather impressive amount of not answering and not admitting you’re not answering.
Zach Stein-Perlman’s response points out many of these failures to answer.
Goodbye to OpenAI
The exodus continues.
Remember when everyone bragged that Greg Brockman went right back to coding the minute Sam Altman was reinstated as CEO, contrasting it with Helen Toner deciding to finally get some sleep like a normal human? The man whose Twitter feed is almost entirely about the joy of coding and working hard is finally taking a break.
This may or may not be what Brockman claims it to be. He seems both like a man desperately in need of at least some break, and also someone who would not choose to take one of this length. Especially at OpenAI, people who take breaks like this often do not return.
According to Claude, only 30%-50% of those who take such sabbaticals ever return. The rate for cofounders is higher. But when I give it the additional information that another co-founder quit the same day, the percentage drops back down again, finally to 40%-60% with full context. An early Manifold market is trading at 53% for his return. That seems reasonable, if I had to trade I would sell.
The full context is that OpenAI lost two other leaders as well.
Another cofounder, John Schulman, who has been in charge of model safety and was put in charge of what is left of superalignment-style efforts after Ilya Sutskever and Jan Leike left, has himself left for Anthropic.
Here is his gracious statement, on Twitter:
You know Altman’s reply was Super Serious because it used proper capitalization. One day Roon will use proper capitalization and I will totally freak out.
I became a John Schulman fan after listening to him talk to Dwarkesh Patel, and a bigger one after we talked and he solicited my feedback on various matters. Given that OpenAI no longer had Ilya Sutskever and Jan Leike, I thought Schulman was a very good pick to continue the effort.
Was this the right thing for John Schulman to do? We can’t know. I don’t know anything that I haven’t talked about that would inform the answer to that.
I also notice the cover story seems implausible. Being the head of safety efforts at OpenAI uniquely positioned Schulman to make a difference there. As a cofounder and trusted team member, he could muster more support for such efforts, and provide a counterweight. If there was ever anyone who had a good argument that they should be working for OpenAI in spite of it all, he would be it.
If he wanted to do hands-on technical work, would OpenAI have prevented this? Not given him the support he needed? If so that’s a major blackpill, even given everything.
The other context is that they simultaneously also lost Peter Deng. The Information says all three departures are unrelated. Perhaps this is true. Perhaps not. The departures continue to add up.
It is times like this that it is
Rhetorical Innovation
Oh, yeah, when was that Asilomar conference where Demis Hassabis, Ilya Sutskever, Yann LeCun, Elon Musk and others got together to say ‘Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards’? Right before the founding of OpenAI? That makes sense.
The case for opening up other frontier technologies as metaphor, attempt # a lot.
Wow, The Guardian will print anything, won’t they? This time it’s Gary Marcus attacking Sam Altman for pretending to favor regulation and safety in front of Congress and then reversing course and having OpenAI push against meaningful regulation instead (okay, sure, that happened, fair enough, indeed he is pushing for Free Money), and citing various other cases of Altman playing obviously evil that we’ve covered here (which happened too). Then he talks both about how Altman is running a hype machine and also how he is ‘incredibly powerful.’ Several people pointed out the obvious contradictions. Given he is predicting a ‘collapse of the AI bubble’ in ‘weeks or months’ (which I very much don’t expect, but he says he does), why not simply wait?
A rambling attempt at rhetorical innovation, asking if this is anything: A common political debate is how much to worry about different kinds of inequalities, versus rising tides lifting all boats (or, if you’re against that, ‘trickle-down economics’). How much should we worry about inequalities of wealth? Consumption? Power?
Where the best reason to worry about it, I think, is if those differences lead to self-perpetuating differences in power.
That’s the key. You can not only absolutely fix it after the fact if allowed to, there is not even anything to fix. Other people benefit more, you still benefit, it’s fine so long as you also do better. But if this means you are permanently disempowered, then it’s not fine.
This then translates into three ways of thinking about downside versus upside on AI.
Extrapolating from past technologies says rev the engine, capture the upside. We will all be vastly wealthier, we can ‘fix it in post’ and even if we don’t so what, if you get a galaxy and I have a planet then I still have a planet where I used to have an apartment. Presumably I will be fine.
However, there are two obvious ways This Time Could Be Different.
Both are essentially human disempowerment stories. Until now, the greatest power in the ‘verse has always been the people. The ability to say no, and take the consequences, and for us to produce and fight and think and do. Even those who don’t have valuable skills are mostly still able to produce far more value than the cost of living. And because each of us is fundamentally limited to being one person who can do one thing at a time, it is hard for imbalances to scale and remain stable, in important ways we are all similar or equal and coordination is hard.
In a future where AI is smarter and more capable than us, whether or not it remains a ‘mere tool’ and firmly under our guidance, that likely is all no longer the case.
First is the mundane worry. That AI could enable permanent dictatorship or otherwise lock in rules and power structures among humans, especially if it also leaves many or most humans without sufficient economic or military value to justify their existence to such a regime. That the Wrong Person or Group with the Bad Values could end up in charge.
A real worry, to be sure. Many are exclusively or mostly worried about this. I’m not, partly because I worry about the other failure case more (where they often dismiss it out of hand), and partly because humans at the end of the day tend to be fundamentally ‘normative’ in the sense that we prefer good things to bad things, life to death, health to sickness, wealth to poverty, happiness to sadness and so on.
And I think the primary reason people in charge do so much damage to those not in charge is because the people in charge think they need to do that to stay in charge (sometimes they’re right, other times not), with the secondary reason being limited resources, and the third being that central planning doesn’t work. If we have abundance, don’t need to worry about failed central planning, and the regime is no longer threatened, which we should expect at this limit, I like our odds a lot more. One should of course still have preferences within this range of outcomes.
The other worry is that there will be inequality between humans and AIs. And in particular, if AIs are sufficiently smarter and more capable and more competitive, where does that leave us? Won’t the AIs end up with all the resources and the power? Don’t they export to us while not importing back? Won’t those who let the AIs be in charge ‘win’ (or their AIs win) where the rest of us lose? Won’t the humans and their values be outcompeted and lack the ability to survive, even if nothing is actively hostile to them? None of this requires a ‘foom’ or a singleton or a ‘rogue AI’ or a war against humanity or a treacherous (sharp left or otherwise) turn. Those could happen, but the baseline is far simpler.
Isn’t that the baseline scenario when there exist things that are smarter and more capable and more competitive and efficient than you, however they came to be?
If you believe in freedom and competition in such a scenario, why wouldn’t we lose? What is going to make us not lose? I have yet to see a plausible non-absurd answer, other than ‘the AI won’t be sufficiently capable to trigger all that,’ or the people saying ‘but the AIs replacing us is good.’
Getting back to the original quote, the political economy of ‘growth + taxes + welfare state’ essentially works among humans, given sufficiently friendly conditions. But there are many reasons that this has held up, that seem unlikely to hold in a world with ASIs.
Open Weights Are Unsafe and Nothing Can Fix This
What if something could fix this?
TARs or other defenses do not need to be fully foolproof to work. They merely have to raise the cost of an attack sufficiently that it is not worth bothering, and you’d either give up or would rather use other methods or roll your own model.
It appears this result shows you can at least make things harder on the attacker, forcing them to use either improved or different techniques, or a lot more compute. That is an excellent start.
Those who see large value in open models should be doing a lot of this kind of research. Indeed, the open source community should be excellent at solving this exact type of problem, if technical solutions exist to be found. This is exactly where openness does improve security, with everyone looking for vulnerabilities, fixing issues, iterating until things improve.
Would having a robust defense against fine-tuning, such that safety training could be made similarly effective to how it works in closed models, be sufficient to address all the concerns about open models? No, not all the issues take this form. But it would be a hell of a start.
Aligning a Smarter Than Human Intelligence is Difficult
John Pressman reminds us that college students, on the order of half of them, increasingly say they would plug themselves into Nozick’s experience machine. Nozick’s thought experiment was supposed to illustrate that a purely illusory existence does not actually satisfy our preferences, that we care about actual things. Except now many reject that idea, and would happily ‘plug into the Matrix’ even without any attempt to sell them on it.
What do we do, if the people would not choose The Good, and instead pick a universe with no value? Should we accept it? Does the answer change if their choice is the result of a marketing campaign, perhaps by AIs?
MR links to the latest proposal to give AIs ‘legal freedoms,’ and then count on the game-theoretic equilibrium to protect us, and enjoy ‘gains from trade.’ This is known as failing to correctly solve for the equilibrium. If indeed there was need to do this ‘to keep us safe’ then it definitely won’t work.
People Are Worried About AI Killing Everyone
Janus warns that if you are an AI alignment researcher or sympathizer, you should be very alarmed by the ease with which Llama-3.1-405B can simulate Sydney, with zero examples given. Unless you saw this coming, of course.
Learning how to summon demons is vital to future safe summonings. However, given what this shows, one might worry Janus is violating the most important law of summoning: Do not call up that which you cannot put down.
(To be clear, no I am not at all afraid of this specific thing, but the principle is crucial. But also, as Kevin Roose put it, perhaps let’s avoid this sort of thing.)
I would say that while I did not actively see this specific thing coming, I don’t see why we wouldn’t expect such things out of a base model, nor does it seem surprising. Indeed, if this was actively surprising that seems like a flaw in your model?
So on net I consider this case Good News, Everyone. A mild update that this kind of thing is easier to pull off, but its value as a salient warning should outweigh that.
Other People Are Not As Worried About AI Killing Everyone
In extremely related this-doesn’t-imply-what-you-think-if-true news, Marc Andreessen and Daniel implicitly endorse the idea that AI cannot be controlled.
Similarly, here’s Marc Andreessen celebrating the calling up of that which cannot be put down and cannot be controlled, exactly because it cannot be put down or controlled. This isn’t missing mood as much as reversed mood that reveals endorsement of the bad or anti-human exactly because it is bad or anti-human. A lot of that going around.
David Brooks says many people, especially in the humanities and liberal arts, fear AI, but they should not do so. Brooks is unworried because he expects AI capabilities to not advance much, for it to remain ‘simply a useful tool.’
If that happens, then I too am an optimist. We can do a lot of good with a useful tool.
However his core reason why is essentially ‘humans are special,’ that it is not ‘all just information processing’ in the human brain. The AI won’t be able to match (his terms) our physical navigation of nature, our pursuit of goodness, or our ‘love’; it lacks consciousness, awareness, emotions, moral sentiments, agency, a unique worldview.
I am rather confident that this whole line of argument is copium. AI might stall out for a bit or prove more difficult than we expect, but if so it is not going to be because humans are magic. He believes that we are special. That somehow the rules do not apply to us. Obviously, he is mistaken.
It is always refreshing to hear someone say explicitly that they are optimistic about impact and the need to worry exactly because they are pessimistic that AI will be transformational, rather than having to pull teeth to establish this.
And I am especially grateful that David Brooks (implicitly, but clearly) agrees that if the evangelists were right, then you should worry.
The Lighter Side
Safety first.
It’s a drawing hand problem.