If we do get powerful AI, it seems highly plausible that even if we stay in control we will 'go too fast' in deploying it relative to society's ability to adapt, if only because of the need to grow fast and stay ahead of others, and because the market doesn't care that society wants it to go slower.
After reading, my interpretation was this: assuming we stay in control, that happens only if powerful AI is aligned. The market doesn't care that society wants to go slower, but AI will care that society wants to go slower, so when the market tries to force AI to go faster, AI will refuse.
I reflected on whether I am being too generous, but I don't think I am. Other readings didn't make sense to me, and I am assuming Dario is trying to make sense, while you seem doubtful. That is, I think this is plausibly Dario's actual prediction of how fast things will go, not a hope it won't go faster. But importantly, that is assuming alignment. Since that assumption is already hopeful, it is natural the prediction under that assumption sounds hopeful.
Paul Crowley: It's a strange essay, in that it asks us to imagine a world in which a single datacenter contains 1E6 Nobelists expert in every field and thinking at 100x speed, and asks what happens if "sci-fi" outcomes somehow don’t happen. Of course "sci-fi" stuff happens almost immediately.
I mean, yes, sci-fi style stuff does seem rather obviously like it would happen? If it didn't, then that’s a rather chilling indictment of the field of sci-fi?
To re-state, sci-fi outcomes don't happen because AI is aligned. Proof: if sci-fi outcomes happened, AI would be unaligned. I actually think this point is extremely clear in the essay. It literally states: "An aligned AI would not want to do these things (and if we have an unaligned AI, we're back to talking about risks)".
I see how you got there. It's a position one could take, although I think it's unlikely and also that it's unlikely that's what Dario meant. If you are right about what he meant, I think it would be great for Dario to be a ton more explicit about it (and for someone to pass that message along to him). Esotericism doesn't work so well here!
I propose a new term, gas bubble, to describe the spate of scams we're about to see. It's a combination of gaslighting and filter bubble.
I have a post coming soon regarding places to donate if you want to support AI existential risk mitigation or a few other similar worthy causes
We have a list here - there might be some overlap: https://www.aisafety.com/funders
Are the ‘AI companion’ apps, or robots, coming? I mean, yes, obviously?
The technology for bots who are "better" than humans in some way (constructive, pro-social, compassionate, intelligent, caring interactions while thinking 2 levels meta) has been around since 2022. But the target group wouldn't pay enough for GPT-4-level inference, so current human-like bots are significantly downscaled compared to what technology allows.
Dario Amodei is thinking about the potential. The result is a mostly good essay called Machines of Loving Grace, outlining what can be done with ‘powerful AI’ if we had years of what was otherwise relative normality to exploit it in several key domains, and we avoided negative outcomes and solved the control and alignment problems. As he notes, a lot of pretty great things would then be super doable.
Anthropic also offers us improvements to its Responsible Scaling Policy (RSP, or what SB 1047 called an SSP). Still much left to do, but a clear step forward there.
Daniel Kokotajlo and Dean Ball have teamed up on an op-ed for Time on the need for greater regulatory transparency. It’s very good.
Also, it’s worth checking out the Truth Terminal saga. It’s not as scary as it might look at first glance, but it is definitely super far out.
Table of Contents
Language Models Offer Mundane Utility
Just Think of the Potential, local edition, and at least I’m trying:
Perplexity CEO pitches his product:
I do not think Srinivas appreciates the point of a Bloomberg terminal.
The point of the Bloomberg terminal is precise, reliable, up-to-the-second data, commands that reliably do exactly what you want, and exactly the features traders want and count on to figure out the things they actually care about to make money, with shortcuts and other tools that match their needs in real time. Perplexity Pro is probably worth $20/month to a lot of people, but I am confident Bloomberg is unworried.
Dean Ball is impressed with o1 for tasks like legal and policy questions, and suggests instructing it to ask follow-up and clarifying questions. I haven’t been as impressed, I presume largely because my purposes are not a good fit for o1’s strengths.
Avital Balwit on how they use Claude especially for writing and editing tasks, also language learning, calorie counting and medical diagnoses. Here are some tips offered:
‘Most people are underutilizing models,’ the last section heading, is strongly true even for those (like myself) who are highly aware of the models. It is a weird kind of laziness, where it’s tempting not to bother to improve your workflow, and it seems ‘easier’ in a sense to do everything yourself or the old way until you’ve established the new way.
Jacquesthibs details all the AI productivity software they’re using, and how much they are paying for it, which Tyler Cowen found hilarious. I understand the reaction: this looks a lot like getting milked for $10 or $20 a month for versions of the same thing, often multiple copies of them. But that’s because all of this is dramatically underpriced, and having the right tool for the right job is worth orders of magnitude more. The right question here is ‘why can’t I pay more to get more?’ not ‘why do I need to pay so many different people’ or ‘how do I avoid buying a service that isn’t worthwhile or is duplicative.’ Buying too many is only a small mistake.
Analyze your disagreements so you win arguments with your boyfriend, including quoting ChatGPT as a de facto authority figure.
Language Models Don’t Offer Mundane Utility
The boyfriend from the previous section is not thrilled by the pattern of behavior and has asked his girlfriend to stop. The alternative option is to ‘fight fire with fire’ and load up his own LLM, so both of them can prompt and get their version to agree with them and yell AI-generated arguments and authority at each other. The future is coming.
And by language models, we might mean you.
This is not always the mode I am in, but it is definitely sometimes the mode I am in. If you think you never do this, keep an eye out for it for a while, and see.
What do we call ‘the thing LLMs can’t do that lets us dismiss them’ this week?
There is some disputing the question in the comments. I think it mostly confirms Dan’s point.
Alternatively, there are the classic options.
Checking to see if you’re proving too much is often a wise precaution.
Did they, though?
Saying you’re coming out of that ‘anti-ChatGPT’ is a classic case of guessing the teacher’s password. What does it mean to be ‘anti-ChatGPT’ while continuing to use it? We can presumably mostly agree that it would be good for university education if some uses of LLMs were unavailable to students – if the LLM essentially ran a smart version of ‘is this query going to, on net, help the student learn?’ That option is not available.
Students mostly realize that if they had to fact check every single statement, in a ‘if there is a factual error in this essay you are expelled’ kind of way, they would have to give up on many use cases for LLMs. But also most of the students would get expelled even without LLMs, because mistakes happen, so we can’t do that.
Classic fun with AI skepticism:
It’s certainly possible they used ChatGPT for this, but they’re definitely fully capable of spouting similar Shibboleth passwords without it.
The thing is, I’d prefer it if they were using ChatGPT here. Why waste their time writing these statements when an AI can do it for you? That’s what it’s for.
Deepfaketown and Botpocalypse Soon
If they get sufficiently difficult to catch, xkcd suggests ‘mission f***ing accomplished,’ and there is certainly something to that. The reply-based tactic makes sense as a cheap and easy way to get attention. Most individual replies could plausibly be human, it is when you see several from the same source that it becomes clear.
If we are relying on humans noticing the bots as our defense, that works if and only if the retaliation means the bots net lose. Yes, it is annoying if someone can figure out how to spend $1,000 to make us waste $1 million worth of time, but is anyone actually going to do that if they don’t also make money doing it?
As we’ve discussed before, the long term solution is plausibly some form of a whitelist, or requiring payment or staking of some costly signal or resource as the cost of participation. As long as accounts are a scarce resource, it is fine if it costs a lot more to detect and shut down the bot than it does to run the bot.
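To see why scarcity or staking changes the economics, here is a minimal toy model (the numbers, function, and setup are my own illustration, not anything actually proposed): once a banned account forfeits its stake, the attack can lose money no matter how much detection costs the platform.

```python
# Toy model: bot economics under a stake-and-ban regime (all numbers invented).
def operator_profit(n_accounts: int, stake: float, detection_rate: float,
                    revenue_per_surviving_account: float) -> float:
    """Expected profit for a bot operator when banned accounts forfeit their stake."""
    banned = n_accounts * detection_rate
    survivors = n_accounts - banned
    return survivors * revenue_per_surviving_account - banned * stake

print(operator_profit(n_accounts=1000, stake=50.0, detection_rate=0.8,
                      revenue_per_surviving_account=5.0))
# -> -39000.0: the attack loses money, regardless of what detection costs the platform.
```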
Are the ‘AI companion’ apps, or robots, coming? I mean, yes, obviously?
Everyone involved agrees that the AI sex robots, toys and companions will likely replace porn, toys that get used alone and (at least lower end) prostitution. If you’re already in the fantasy business or the physical needs business rather than the human connection and genuine desire business, the new products are far superior.
If you’re in the desire and validation business, it gets less clear. I’ve checked a few different such NSFW sites because sure why not, and confirmed that yes, they’re mostly rather terrible products. You get short replies from dumb models that get confused very easily. Forget trying to actually have a real conversation. No matter your goal in *ahem* other areas, there’s zero challenge or subtlety, and the whole business model is of course super predatory. Alphazira.com was the least awful in terms of reply quality; Mwah.com (the one with the leak from last week) offers some interesting customization options, but at least the trial version was dumb as bricks. If anything it all feels like a step backwards even from AI Dungeon, which caused interesting things to happen sometimes and wasn’t tied to interaction with a fixed character.
I’m curious if anyone does have a half-decent version – or kind of what that would even look like, right now?
It does seem like this could be a way for people to figure out what they actually care about or want, maybe? Or rather, to quickly figure out what they don’t want, and to realize that it would quickly be boring.
One must keep in mind that these pursuits very much trade off against each other. Solo opportunities, most of them not social or sexual, have gotten way better, and this absolutely reduces social demand.
I could be alone for a very long time, without interaction with other humans, so long as I had sufficient supplies, quite happily, if that was a Mr. Beast challenge or something. I mean, sure, I’d get lonely, but think of the cash prizes.
As I’ve said before, my hope is that the AI interactions serve as training grounds. Right now, they absolutely are not doing that, because they are terrible. But I can see the potential there, if they improved.
A distinct issue is what happens if you use someone’s likeness or identity to create a bot, without their permission? The answer is of course ‘nothing, you can do that, unless someone complains to the site owner.’ If someone wants to create one in private, well, tough luck, you don’t get to tell people what not to do with their AIs, any more than you can prevent generation of nude pictures using ‘nudify’ bots on Telegram.
If you want to generate a Zvi Mowshowitz bot? You go right ahead, so long as you make reasonable efforts to have it be accurate regarding my views and not be dumb as bricks. Go nuts. Have a great conversation. Act out your fantasy. Your call.
Also it seems like someone is flooding the ‘popular upcoming’ game section of Steam with AI slop future games? You can’t directly make any money that way, there are plenty of protections against that, but here’s one theory:
This actually makes sense. If you can get people interested with zero signs of any form of quality, you can make something. You can even make it good.
They Took Our Jobs
Pizza Hut solves our job costly signal problem, allowing you to print out your resume onto a pizza box and deliver it with a hot, fresh pizza to your prospective employer. You gotta love this pitch:
Perfection, if you don’t count the quality of the pizza. This is the right size for a costly signal, you buy goodwill for everyone involved, and because it wasn’t requested no one thinks the employer is being unfair by charging you to put in an application. Everybody wins.
Get Involved
UK AISI hiring technical advisor, deadline October 20, move fast.
Tarbell Grants will fund $100k in grants for original reporting on AI.
Introducing
Google’s Imagen 3 now available to all Gemini users.
Grok 2 API, which costs $4.20/million input tokens, $6.90/million output tokens, because of course it does. Max output 32k, 0.58sec latency, 25.3t/s. (Rough per-request cost math below.)
Jacob: the speed is incredible and they just added function calling! plus, it’s not censored. Less safeguards = better.
Don’t you love a world where what everyone demands is less safeguards? Not that I’d pretend I wouldn’t want the same for anything I’d do at this stage.
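For a rough sense of what that Grok pricing works out to per call, here is a quick back-of-the-envelope sketch (the request sizes are hypothetical examples of my own, not anything from xAI):

```python
# Back-of-the-envelope cost for one Grok 2 API call at the quoted prices.
# The request sizes below are hypothetical examples.
INPUT_PRICE_PER_M = 4.20   # dollars per million input tokens
OUTPUT_PRICE_PER_M = 6.90  # dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

print(f"${request_cost(10_000, 1_000):.4f}")  # ~$0.0489 for a 10k-token-in, 1k-token-out call
```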
OpenAI’s MLE-Bench is a new benchmark for machine learning engineering, paper here, using Kaggle as a baseline. o1-preview is starting out getting to bronze medal level in 16.9% of competitions. Predictors expect rapid improvement, saying there is a 42% chance the 80% threshold is reached by the end of 2025, and 70% by end of 2026.
In Other AI News
As he likes to say, a very good sentence:
From Scott Alexander, an AI Art Turing Test.
Google to build small modular nuclear reactors (SMRs) with Kairos Power, aiming to have the first online by 2030. That is great and is fast by nuclear power standards, and also slower than many people’s timelines for AGI.
Amazon is getting in on the act as well, and will invest over $500 million across three projects.
As Ryan McEntush points out, investing in fully new reactors has a much bigger impact on jumpstarting nuclear power than investments to restart existing plants or merely purchase power.
Also it seems the Sierra Club is reversing its anti-nuclear stance? You love to see it.
Eric Schmidt here points out that if AI drives sufficient energy development, it could end up net improving our energy situation. We could move quickly down the cost curve, and enable rapid deployment. In theory yes, but I don’t think the timelines work for that?
The full release of Apple Intelligence is facing delays; it won’t get here until 5 days after the new AppInt-enabled iPads. I’ve been happy with my Pixel 9 Fold purely as a ‘normal’ phone, but I’ve been disappointed by both the unfolding option, which is cute but ends up not being used much, and the AI features, which I still haven’t gotten use out of after over a month. For now Apple Intelligence seems a lot more interesting and I’m eager to check it out. I’m thinking an iPad Air would be the right test?
Nvidia releases new Llama 3.1-70B fine tune. They claim it is third on this leaderboard I hadn’t seen before. I am not buying it, based on the rest of the scores and also that this is a 70b model. Pliny jailbroke it, of course, ho hum.
If you’ve ever wanted to try the Infinite Backrooms, a replication is available.
Dane, formerly CISO of Palantir, joins OpenAI as CISO (chief information security officer) alongside head of security Matt Knight.
The Audacious Project lives up to its name, giving $21 million to RAND and $17 million to METR.
I have a post coming soon regarding places to donate if you want to support AI existential risk mitigation or a few other similar worthy causes (which will not be a remotely complete list of either worthy causes or worthy orgs working on the listed causes!).
A common theme is that organizations are growing far beyond the traditional existential risk charitable ecosystem’s ability to fund them. We will need traditional foundations, wealthy individuals, and other sources to step up.
Unfortunately for AI discourse, Daron Acemoglu has now been awarded a Nobel Prize in Economics, so the next time his absurdly awful AI takes say that what has already happened will never happen, people will say ‘Nobel prize winning.’ The actual award is for ‘work on institutions, prosperity and economic growth’ which might be worthy but makes his inability to notice AI-fueled prosperity and economic growth worse.
Truth Terminal High Weirdness
The Truth Terminal story is definitely High Weirdness.
AI Notkilleveryoneism memes found the story this week.
As I understand it, here’s centrally what happened.
Nothing in this story (except Andy Ayrey) involves all that much… intelligence.
As I understand it this is common crypto behavior. There is a constant attention war, so if you have leverage over the attention of crypto traders, you start getting bribed in order to get your attention. Indeed, a key reason to be in crypto Twitter at all, at this point, is the potential to better monetize your ability to direct attention, including your own.
Deepfates offers broader context on the tale. It seems there are now swarms of repligate-powered crypto-bots, responding dynamically to each post, spawning and pumping memecoin after memecoin on anything related to anything, and ToT naturally got their attention and the rest is commentary.
As long as they’re not bothering anyone who did not opt into all these zero sum attention games, that all seems like harmless fun. If you buy these bottom of the barrel meme coins, I wish you luck but I have no sympathy when your money is gone. When they bother the rest of us with floods of messages – as they’re now bothering Deepfates due to one of ToT’s joke tweets – that’s unfortunate. For now that’s mostly contained and Deepfates doesn’t seem to mind all that much. I wonder how long it will stay contained.
Janus has some thoughts about how exactly all this happened, and offers takeaways, explaining this is all ultimately about Opus being far out, man.
But what directly caused ToT to happen?
One lesson here is that, while you don’t want ToT spouting nonsense or going too far too fast, ToT being misaligned was not a bug. It was a feature. If it was aligned, none of this would be funny, so it wouldn’t have worked.
I agree with Janus that the crypto part of the story is ultimately not interesting. I do not share the enthusiasm for the backrooms and memes and actualizations, but it’s certainly High Weirdness that I would not have predicted and that could be a sign of things to come that is worthy of at least some attention.
Quiet Speculations
A very important claim, huge if true:
There will sometimes be some gap, and I don’t know what I don’t know. The biggest known unknown is the full o1. But in this competitive situation, I find it hard to believe that a worthy version of GPT-4.5-or-5 or Claude Opus 3.5 is being held under wraps other than for a short fine tuning and mitigation period.
What does seem likely is that the major labs know more about how to get the most out of the models than they are letting on. So they are ‘living in the future’ in that sense. They would almost have to be.
If AGI does arrive, it will change everything.
Many who believe in AGI soon, or say they believe in AGI soon, compartmentalize it. They still envision and talk about the future without AGI.
I think there’s also a lot of doublethink going on here. There’s the future non-AGI world, which looks ‘normal.’ Then there’s the future AGI world, which should not look at all normal for long, and never the twain shall meet.
On top of that, many who think about AGI, including for example Sam Altman, talk about the AGI world as if it has some particular cool new things in it, but is still essentially the same. That is not how this is going to go. It could be an amazingly great world, or we could all die, or it could be something unexpected where it’s difficult to decide what to think. What it won’t be is ‘the same with some extra cool toys and cyberpunk themes.’
The default way most people imagine the future is – literally – that they presume that whatever AI can currently do, plus some amount of people exploiting and applying what we have in new directions, is all it will ever be able to do. But mostly they don’t even factor in what things like good prompt engineering can already do.
Then, each time AI improves, they adjust for the new thing, and repeat.
Similarly, ‘you predicted that future advances in AI might kill everyone, but since then we’ve had some advances and we’re all still alive and not that much has changed, therefore AI is safe and won’t change much of anything.’
And yes, versions of this argument that are only slightly less stupid are remarkably central, this is the strawman version made real but only by a small amount:
An interesting prediction:
Gary Marcus says ‘rocket science is easier than AGI’ and I mean of course it is. One missing reason is that if you solved AGI, you would also solve rocket science.
Steve Newman analyzes at length how o1 and Alpha Proof solve problems other LLMs cannot and speculates on where things might go from here, calling it the ‘path to AI creativity.’ I continue to be unsure about that, and seem to in many ways get more confused on what creativity is over time rather than less. Where I do feel less confused is my increasing confidence that creativity and intelligence (‘raw G’) are substantially distinct. You can teach a person to be creative, and many other things, but you can’t fix stupid.
Llama 3 said to be doing the good work of discouraging what was previously a wave of new frontier model companies, given the need to beat the (not strictly free, but for many purposes mostly free) competition.
Most of that should have applied regardless of Llama 3 or even all of open weights. The point of a new foundation model company is to aim high. If you build something world changing, if you can play with the top labs, the potential value is high enough to justify huge capital raises. If you can’t, forget it. Still, this makes it that much harder. I’m very down with that. We have enough foundation model labs.
What is valuable is getting into position to produce worthwhile foundation models. The models themselves don’t hold value for long, and are competing against people establishing market share. So yeah.
There’s also this:
They made a lot more advanced AI chips, and some of the low hanging fruit got picked, so the market price declined?
Meet the new prompt, same as the old prompt, I say.
Oh, you’ll still do prompt engineering. Even if you don’t write the prompts from scratch, you’ll write the prompts that prompt the prompts. There will be essentially the same skill in that.
Not where I’d have expected minds to be changed, but interesting:
The reasoning here makes sense. If there are low hanging algorithmic improvements that provide big upgrades, then a cascade of such discoveries could happen very quickly. Discovering we missed low-hanging fruit suggests there is more out there to be found.
Copyright Confrontation
New York Times sends cease-and-desist letter to Perplexity, related to Perplexity summarizing paywalled NYT posts without compensation. The case against Perplexity seems to me to be stronger than it does against OpenAI.
AI and the 2024 Presidential Election
As I’ve said elsewhere, I have zero interest in telling you how to vote. I will not be saying who I am voting for, and I will not be endorsing a candidate.
This includes which candidate would be better on AI. That depends on what you think the correct policy would be on AI.
Here are the top 5 things to consider:
The Quest for Sane Regulations
Daniel Kokotajlo and Dean Ball team up for an op-ed in Time on four ways to advance transparency in frontier AI development.
We can disagree about what we want to mandate until such time as we know what the hell is going on, and indeed Dean and Daniel strongly disagree about that. The common ground we should all be able to agree upon is that, either way, we do need to know what the hell is going on. We can’t continue to fly blind.
The question is how best to do that. They have four suggestions.
This seems like a clear case of the least you can do. This is information the government and public need to know. If some of it becomes information that is dangerous for the public to know, then the government most definitely needs to know. If the public knows your safety case, goals, specifications, capabilities and risks, then we can have the discussion about whether to do anything further.
I believe we need to then pair that with some method of intervention, if we conclude that what is disclosed is unacceptable or promises are not followed or someone acts with negligence, and methods to verify that we are being given straight and complete answers. But yes, the transparency is where the most important action is for now.
In conclusion, this was an excellent post.
So I wouldn’t normally check in with Marc Andreessen because, as I said recently, what would even be the point. But he actually retweeted me on this one, so for the record he gave us an even clearer statement about who he is and how he reacts to things:
Um, sir, this is a Wendy’s? Argumentum ad absurdum for the win?
This was co-authored by Dean Ball, who spent the last year largely fighting SB 1047.
This is literally a proposal to ask frontier AI companies to be transparent combined with whistleblower protections? A literal ‘at least we who disagree on everything can agree on this’? That even says ‘these commitments can be voluntary’ and doesn’t even fully call for any actual government action?
So his complaint, in response to a proposal for transparency and whistleblower protections for the biggest companies and literally nothing else, perhaps so someone might in some way hold them accountable, is that people who support such proposals want to ‘centralize AI into a handful of opaque, black box, oligopolistic, unaccountable big companies.’
He seems to be a rock with ‘any action to mitigate risks is tyranny’ written on it.
Stop trying to negotiate with this attitude. There’s nothing to discuss.
Mark Ruffalo and Joseph Gordon-Levitt publish an op-ed in Time criticizing Newsom’s veto of SB 1047. Solid, but mostly interesting (given all the times we’ve said these things before) in that they clearly did their homework and understand the issues. They do not think this is about deepfakes. And they are willing to make the straightforward case that the veto was corrupt corporate dodging of responsibility.
Chris Painter of METR proposes we rely on ‘if-then policies,’ as in ‘if we see capability X then we do mitigation Y.’
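To make the shape of that concrete, here is a minimal sketch of how an if-then commitment might be written down (the triggers and mitigations are hypothetical placeholders of my own invention, not METR’s actual thresholds):

```python
# Minimal sketch of 'if capability X, then mitigation Y' commitments.
# The triggers and mitigations below are hypothetical placeholders.
IF_THEN_COMMITMENTS = [
    {"if": "autonomous replication demonstrated in a sandboxed eval",
     "then": ["pause external deployment", "notify the relevant regulator", "run expanded evals"]},
    {"if": "meaningful uplift on bioweapons-relevant tasks",
     "then": ["restrict API access", "apply hardened refusal training"]},
]

def required_mitigations(observed_capabilities: set[str]) -> list[str]:
    """Return every mitigation whose trigger has been observed."""
    return [step
            for commitment in IF_THEN_COMMITMENTS
            if commitment["if"] in observed_capabilities
            for step in commitment["then"]]

print(required_mitigations({"autonomous replication demonstrated in a sandboxed eval"}))
```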
The Week in Audio
It is amazing how people so smart and talented can come away with such impressions.
Also Matt Stone is missing a lot here.
In unrelated news this week, here’s a Sam Altman fireside chat at Harvard Business School (and here he is talking with Michigan Engineering). From this summary comment it seems like it’s more of his usual. He notes we will be the last generation that does not expect everything around them to be smarter than they are, which one might say suggests we will be the last generation, and then talks about the biggest problem being society adapting to the pace of change. He is determined not to take the full implications seriously, at the same time he is (genuinely!) warning people to take lesser but still epic implications seriously.
The vision is the AI sees everything you do on your computer, has a ‘personality’ he is working on, and so on. Similarly to Tyler Cowen’s earlier comment, I notice I don’t trust you if you don’t both see the potential benefits and understand why that is an episode of Black Mirror.
I do not want a ‘relationship’ with an AI ‘companion’ that sees everything I do on my computer. Thanks, but no thanks. Alas, if that’s the only modality available that does the things, I might have little choice. You have to take it over nothing.
Nick Land predicts nothing human will make it out of the near future, and anyone thinking otherwise is deluding themselves. I would say that anyone who expects otherwise to happen ‘by default’ in an AGI-infused world is deluding themselves. If one fully bought Land’s argument, then the only sane response according to most people’s values including my own would be to stop the future before it happens.
Yann LeCun says it will be ‘years if not a decade’ before systems can reason, plan and understand the world. That is supposed to be some sort of slow skeptical take. Wow are people’s timelines shorter now.
AI audio about AI audio news, NotebookLM podcasts as personalized content generation, which is distinct from actual podcasts. I certainly agree they are distinct magisteria. To the extent the AI podcasts are useful or good, it’s a different product.
Just Think of the Potential
Anthropic CEO Dario Amodei has written an essay called Machines of Loving Grace, describing the upside of powerful AI, a term he defines and prefers to AGI.
Overall I liked the essay a lot. It is thoughtful in its details throughout. It is important to keep upside potential in mind, as there is a ton of it even for the minimum form of powerful AI.
In this section I cover my reading and reactions, written prior to hearing the reactions of others. In the next section I highlight the reactions of a few others, most of which I did anticipate – this is not our first time discussing most of this.
Dario very much appreciates, and reiterates, that there are big downsides and risks to powerful AI, but this essay focuses on highlighting particular upsides. To that extent, he ‘assumes a can opener’ in the form of aligned AI such that it is doing the things we want rather than the things we don’t want, as in this note on limitations:
I’m all for thought experiments, and for noticing upside, as long as one keeps track of what is happening. This is a pure Think of the Potential essay, and indeed the potential is quite remarkable. The point of the essay is to quantify and estimate that potential.
The essay also intentionally does not ask questions about overall transformation, or whether the resulting worlds are in an equilibrium, or anything like that. It assumes the background situation remains stable, in all senses. This is purely the limited scope upside case, in five particular areas.
That’s a great exercise to do, but it is easy to come away with the impression that this is a baseline scenario of sorts. It isn’t. By default alignment and control won’t be solved, and I worry this essay conflates different mutually exclusive potential solutions to those problems.
It also is not the default that we will enjoy 5+ years of ‘powerful AI’ while the world remains ‘economic normal’ and AI capabilities stay in that range. That would be very surprising to me.
So as you process the essay, keep those caveats in mind.
I think this is spot on. There are physical tasks that are part of the loop, and this will act as a limiting factor on speed, but there is no reason we cannot hook the AIs up to such tasks.
I am more optimistic here, if I’m pondering the same scenario Dario is pondering. I think if you are smart enough and you don’t have to protect the integrity of the process at every step the way we do now, and can find ways around various ethical and regulatory restrictions by developing alternative experiments that don’t trigger them, and you use parallelism, and you are efficient enough you can give some efficiency back in other places for speed, and you are as rich and interested in these results as the society in question is going to be, you really can go extremely fast.
Dario’s prediction is still quite ambitious enough:
Which means, within 5-10 years, things like: Reliable prevention and treatment of all natural diseases, eliminating most cancer, cures for genetic disease, prevention of Alzheimer’s, improved treatments for essentially everything, ‘biological freedom’ for things like appearance and weight.
Also the thing more important than everything else on the list combined: Doubling of the human lifespan.
As he notes, if we do get powerful AI and things generally go well, there is every reason to expect us to hit Escape Velocity. Every year that goes by, you age one year, but you get more than one year of additional expected lifespan.
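A toy way to see the escape velocity point, with numbers that are entirely invented: what matters is whether each calendar year of progress adds more or less than one year of remaining expected lifespan.

```python
# Toy model of longevity escape velocity (all numbers invented for illustration).
def remaining_expectancy(start_years: float, gain_per_year: float, horizon: int) -> float:
    remaining = start_years
    for _ in range(horizon):
        remaining = remaining - 1 + gain_per_year  # age one year, gain some expectancy back
    return remaining

print(remaining_expectancy(start_years=40, gain_per_year=0.5, horizon=20))  # 30.0: still a countdown
print(remaining_expectancy(start_years=40, gain_per_year=1.5, horizon=20))  # 50.0: escape velocity
```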
Then, you probably live for a very, very long time if you do all four of:
If our joint distributional decisions are less generous, you’ll also need the resources.
Dario correctly notes you also avoid all issues of the form ‘how do we pay for medicare and social security.’ Often people imagine ‘you keep getting older at the same rate but at the end you don’t drop dead.’ That’s not how this is going to go. People will, in these scenarios, be staying physically and mentally young indefinitely. There likely will be a distributional question of how to support all the humans indefinitely despite their lack of productivity, including ensuring humans in general have enough of the resources. What there absolutely won’t be is a lack of real resources, or a lack of wealth, to make that happen, until and unless we have at least hundreds of billions or trillions of people on the planet.
Most science fiction stories don’t include such developments for similar reasons to why they ignore powerful AI: Because you can tell better and more relatable stories if you decide such advancements don’t happen.
Dario’s insight here is that brains are neural networks, so not only can AI help a lot with designing experiments, it can also run them, and the very fact that AIs work so well should be helping us understand the human mind and how to protect, improve and make the most of it. That starts with solving pretty much every mental illness and other deficiencies, but the real value is in improving the human baseline experience.
We should have every expectation that the resulting minds of such people, again if the resources of the Sol system are harnessed with our goals in mind, will be far smarter, wiser, happier, healthier and so on. We won’t be able to catch up to the AIs, but it will be a vast upgrade. And remember, those people might well include you and me.
That does not solve the problems that come with the powerful AIs being well beyond that point. Most humans still, by default, won’t have anything productive to offer that earns, pays for or justifies their keep, or gives them a sense of purpose and mission. Those are problems our future wiser selves are going to have to solve, in some form.
My answer, before reading his, is that this is simple: There will be vastly more resources than we need to go around. If the collective ‘we’ has control over Sol’s resources, and we don’t give everyone access to all this, it will be because we choose not to do that. That would be on us. The only other real question is how quickly this becomes the case. How many years to various levels of de facto abundance?
I draw a clear distinction between economic growth and inequality here. Dario is uncertain about both, but economic growth seems assured unless we engage in by far the largest self-sabotage in history. The question is purely one of distribution.
This is where I think the term ‘inequality’ asks the wrong question.
As in two scenarios:
Thus the good news is that there is no need to solve the socialist calculation problem.
If people choose not to adopt improvements, due to skepticism or defiance or stubbornness or religion or any other reason, then (unless they are right) that is unfortunate but it is also their loss. I’m okay with the individual-scale opt-out issue.
I’m not worried about whether regions ‘catch up’ because again it is about absolute conditions, not relative conditions. If entire regions or nations choose to turn away from the AI future or its benefits, then eventually the rest of the world would have to make a choice – a different and less dire choice than if one area was going rogue in building existentially dangerous AI, but a choice nonetheless.
Which leads into the fourth section.
If we want a good future, that is not a thing that happens by accident. We will have to make that future happen, whatever level of ‘fighting’ that involves.
This is however the place where ‘assuming the can opener’ is the strangest. The essay wants to assume the AIs are aligned to us and we remain in control, without explaining why and how that occurred, and then fight over whether the result is democratic or authoritarian. The thing is: the answer to the why and how of the first question seems intimately tied to what happens with the second one.
Also powerful AI will even in the best of cases challenge so many of the assumptions behind the entire paradigm being used here. Thus the whole discussion here feels bizarre, something between burying the lede and a category error.
The concrete suggestion here is that a coalition of Democracies (aka the “good guys” above?) gets control of the AI supply chain, and increasingly isolates and overpowers everyone else, imposing their system of government in exchange for not being so isolated, and for our AI technology and the associated benefits. The first issue with that plan is, of course, how its targets would respond when they learn about the plan.
Dario suggests AI will favor democracy within nations. As I understand his argument, democracy is ‘right’ and benefits people whereas authoritarianism only survives via deception, so truth will favor democracy, and also he predicts the democracies will have control over the AI to ensure it promotes truth. I notice that I am highly suspicious.
I also notice that the more concrete Dario’s discussions become, the more this seems to be an ‘AI as mere tool’ world, despite that AI being ‘powerful.’ Which I note because it is, at minimum, one hell of an assumption to have in place ‘because of reasons.’
Dario is correct that if we ignore the downsides (including loss of human control) then deploying powerful AI can, rather than being a discrimination risk, greatly reduce discrimination and legal error or bias. Or, I’d note, we could go a different way, if we wanted. It would all depend on the choices we make.
In particular, this comes back to The Big Rule Adjustment. Deploying AI forces us to move away from a system of laws and norms that relies on a lot of hidden frictions and incentives and heuristics and adaptation to details and so on, a system we have kludged together over time into something that works. So much of the system works through security through obscurity, through people having limited time, through huge unknown unknown felt downside risks for violating convention, via people having moral qualms or felt moral duties that don’t make logical sense from their perspective on reflection, and so on.
It also centrally relies on hypocrisy, and our willingness to allow violations of our socially endorsed principles as needed to keep things working. Our increasing unwillingness to tolerate such hypocrisy causes a lot of good change, but also threatens our ability to do efficient or necessary things in many cases, to maintain incentives for socially desirable things we aren’t willing to explicitly apply leverage to getting, and ultimately risks our ability to maintain a civilization.
If you have put AIs in charge of all that, and have AIs often navigating all of that, so much of how everything works will need to be reimagined. The good news is, in scenarios where the downside risks we are disregarding here have been defeated, we will be vastly wealthier and wiser, and can use that to apply more expensive fixes.
Economically we’ve already largely answered that question.
Assuming you do survive powerful AI, you will survive because of one of three things.
That’s it.
The comparative advantage arguments are, in the long run, pure cope, as Dario admits here. The only question is how fast they stop working, my guess is rather fast.
But again, if humans have control over a large fraction of resources indefinitely, I am reasonably confident that this is enough.
The problem is no, that does not provide meaning. Dario’s position, as I understand it, is that meaning is yours to discover and doesn’t have to be tied to producing value. I’m quoting at length because this section seems important:
Chess provides a clear existence proof that AIs being fully better than humans is survivable, and also that sucking a lot compared to others need not prevent meaning. Certainly there is plenty of meaning that doesn’t involve economically valuable production.
My sense is this isn’t enough – that this is a version of ‘the art must have an end other than itself.’ I’d guess that we can find meaning in anything, but there needs to be a sort of ‘ultimate reason’ behind it, and that until we find a way to maintain that, the rest will ring hollow.
I don’t think ‘let the AIs figure out how to reclaim meaning’ is that crazy. It’s certainly ten times less crazy or doomed than ‘have the AIs do your alignment homework.’
Finally, I’d like to get nerd-sniped a bit (spoiler alert, first by Dario then I’ll pile on a bit more):
The thing is, reporting as Earth’s incarnation of The Player of Games, that’s bullshit.
The Culture is a vast empire. The values of its humans have nothing to do with the Culture’s broad success, because only its Minds (ASIs) matter, the people are basically sitting around playing tiddlywinks all day, with notably rare potential exceptions driven by the need for books to have a plot. That human philosophy could have been anything. And in my analysis it has nothing to do with the player’s success at Azad.
The Player (who acts because he is tricked and coerced by a Mind, a powerful ASI that I would describe in this case as rather badly aligned) is the best game player out of that empire, who has done nothing else his whole life. He is put into battle against the Emperor, who at most is the best player on one world, and has to be busy ruling it.
Yes, the Emperor has played more Azad than the Player, but the novel makes clear that the Player’s general game training matters more – and to the extent everyone pretended ‘this is because The Culture’s philosophy is better’ that was them choosing to pretend.
That is the reason the Player wins, which the Mind (ASI) who planned all this uses to essentially forcibly overwrite an entire alien culture, twisting his superior game skills into evidence of the superiority of the Culture’s philosophy.
So, given that this happened, what is The Culture’s actual philosophy?
Reactions to Machines of Loving Grace
At least many aspects of it sound pretty great – and yes, it is important to note this is a conditional prediction, on more than simply creating powerful AI. We’ll need to get to work.
Let’s see those docs! I invite Dario or any other authorized persons to share any additional docs, with whatever level of confidentiality is desired.
Ajeya Cotra points out that Dario’s vision is correctly read as a ‘lower bound’ on what could be done if the biggest downside risks were removed, versus for example Carl Shulman’s less tame version.
Dario anticipated this directly:
Which is it, though?
It is not a crazy position to portray the upside case as ‘this is how fast things could plausibly go, without going faster making things worse’ rather than ‘this is how fast I think things actually would go,’ but if so you need to be very clear that this is what you are doing. Here I think there is a clear confusion – Dario seems like he is making a prediction of potential speed, not expressing a hope it won’t go faster.
If we are to discuss this productively, it’s important to differentiate all the aspects, and to be precise. If we do get powerful AI, it seems highly plausible that even if we stay in control we will ‘go too fast’ in deploying it relative to society’s ability to adapt, if only because of the need to grow fast and stay ahead of others, and because the market doesn’t care that society wants it to go slower.
T. Greer points to several potential issues with Dario’s approach.
Haydn Belfield asks the obvious question of how these authoritarians would react if faced with potential strategic inferiority, especially if our stated intent was to make such inferiority permanent or force their elites to step down.
That certainly seems like a highly dangerous situation. Leopold’s solution is to advance faster and harder, past where Dario predicts or proposes here.
Roon doubles down on the need to not turn away from symbolism, quoting Dario’s call to avoid ‘sci-fi’ baggage.
I find myself mostly in the pro-grandiosity camp as well.
My worry with warnings about ‘sci-fi baggage’ is the danger that this effectively means ‘if there was sci-fi that included something, you can’t mention it.’ The whole point of science fiction is to predict the future. It would be silly to specifically exclude the predictions of some of the smartest and most creative people who thought the hardest about what the future might look like, and wrote about it, even if they were motivated in large part by the need to also have a human-centered interesting plot, or if people might have the wrong associations. Also, look around at 2024: Best start believing in sci-fi stories, you’re in one, and all that.
Matt Clancy notes that people’s opinions on such questions as those addressed by Dario often fail to converge even when they exchange information, and suggests this is largely due to people asking what ‘feels reasonable’ and getting different gut reactions. I think that’s definitely part of it. The obvious results of lots of intelligence do not match many people’s intuitions of reasonableness, and often the response is to assume that means those results won’t happen, full stop. Other times, different people try different intuitive comparisons to past situations to fight over such instincts. As a reminder, the future is under no obligation to be or seem ‘reasonable.’
The right answer is that intuitions, especially those that say or come from ‘the future will be like the past’ are not to be trusted here.
Assuming the Can Opener
Max Tegmark reiterates the other obvious problem with trying to race to dominance: it is fine to talk about what we would do if we had already solved the AI control problem, but we not only haven’t solved that problem, we have no idea how to go about solving it, and under that circumstance rushing forward as if we will inevitably find the solution in time during a full-speed race is suicide.
If we presume this kind of ‘powerful AI’ is, as the essay softly implies and as the way the essay makes sense, only barely powerful and doesn’t rapidly become more powerful still (because of reasons), allowing constraints to continue to bind the same way we remain in control, then yeah we might decide to shoot ourselves in the foot a lot more than Dario suggests. If we do, we should be very worried about anyone who chooses not to do that, yet manages to still have access to the powerful AIs.
Oliver Habryka focuses on the assumption of the can opener:
I think Oliver’s analogy takes things too far, but it is on point. The essay does explicitly assume the can opener, but then talks in a way that makes it easy to forget that assumption. It also assumes a ‘magical’ can opener, in the sense that we don’t precisely define what the control mechanism is that we are assuming and how it works, so its implicit functionality isn’t consistent throughout. A key part of the problem is not being able to agree on what success at alignment would even look like, and this illustrates how hard that problem is: there are different problems and desiderata that seem to require or motivate contradictory solutions.
Or another way of putting this:
I mean, yes, sci-fi style stuff does seem rather obviously like it would happen? If it didn’t, then that’s a rather chilling indictment of the field of sci-fi?
Liron’s reaction here is understandable, although I think he takes it too far:
A 75% chance, conditional on the AI control problem being tractable, of the AI control problem being solved? That seems reasonable, and you adjust that for how fast we push forward and how we go about it, if you also consider that ‘solving the control problem’ does not mean you’ve solved problems related to control – that’s another step in the chain that often fails. It’s the 97% for tractability that seems absurd to me. I’ve read and thought about (at least many of) Nora’s arguments, found that I mostly disagreed, and even if I mostly agreed I don’t see how you get to this level of confidence.
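For what it is worth, the raw arithmetic of combining those two quoted numbers, taking them at face value and before accounting for the further steps in the chain that can fail:

```python
# Combining the quoted figures: P(control solved) = P(tractable) * P(solved | tractable).
p_tractable = 0.97             # the tractability figure being disputed above
p_solved_given_tractable = 0.75
print(p_tractable * p_solved_given_tractable)  # ~0.73, before the other failure modes
```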
Also, Dario and Anthropic have explicitly expressed far more uncertainty than this about the tractability of the control problem. Their view is that we do not know how difficult alignment will be, so we need to watch for evidence on how tractable it proves to be. They definitely aren’t at 97%.
Here’s another post that isn’t a reaction to Dario, but could have been:
As a general rule, if nothing else: If you could have 100 million AI employees, so can everyone else, and they’re going to use them for a lot of important things.
Rhetorical Innovation
Things really are pretty weird.
I am all for the whole Mars colonization project. It is already paying strong dividends. Does that mean the rationale is good too? In that context, I’m fine with it. The problem is when such thinking seeps into AI related decisions.
This will probably be a good periodic reminder, but I can’t be sure.
Similarly, at this point I recoil from requests that policy or conclusions be ‘evidence-based,’ not because evidence is bad – evidence is great and necessary – but because when people say ‘evidence based’ they mean to exclude all but very particular types of evidence from consideration. RCT or GTFO, etc. See the Law of No Evidence:
Andrew Critch points out what this is: Telling people that reasoning is impossible.
Indeed, when I see talk of ‘evidence-based AI policy’ I see exactly this pattern. Actually ‘evidence-based’ policy would likely be of the form ‘require people look for evidence, and specify in advance how they would react if they found it,’ in the form of if-then commitments.
A good summary of the history of existential risk credentialist arguments, not new but well put.
Anthropic Updates its Responsible Scaling Policy (RSP/SSP)
On Tuesday, Anthropic announced significant updates to its policy. Here is the full new policy. I analyzed the old version here, so today I will focus instead on the updates.
They start with the new safeguards. They plan to use a multi-layered, defense-in-depth architecture:
As they note, this focuses on misuse rather than other threat models. For that purpose, this approach seems reasonable. It isn’t bulletproof, but for ASL-3 only, assuming the definition for ASL-4 will be reasonable, this then also seems reasonable.
It would be paired with security safeguards:
That is certainly a serious list. If there’s one obvious thing missing, it is physical security for key personnel. Securing the physical spaces is important, but you still have to worry about your key people being compromised off-site. There are likely other potential oversights as well, but this seems like a strong first attempt.
They also intend to publish additional details of their capability assessment methodology. Excellent. Their learning-from-experience section changes also seem good in terms of their practical implications this time, but raise questions about the procedure for changing the rules – it seems like when the rules were hard to follow they kind of winged it and did what they felt was in the spirit of the rule, rather than treating it as a dealbreaker. The spirit is indeed what matters, but you don’t want to get in too much of a habit of finding reasons to change the rules on the fly.
The flip side is that they have clearly been actually using the RSP, and it has impacted their decisions, and they’ve edited the document to reflect their experiences.
My biggest detail worries continue to be the extremely high threshold set by the working definition of ASL-3 for autonomous R&D (the threshold for CBRN seems far lower and all but certain to be hit first), the lack of additional triggers beyond those two for ASL-3, and lack of definition of ASL-4. They don’t technically have to do that part yet, but it seems like they should by now?
In summary, this has some clear improvements. It also leaves questions about the ASL-3 and ASL-4 thresholds, and around the method of change and how Anthropic will react when the rules become difficult to follow.
There’s also the question of, if you do get to ASL-4, what are you going to do?
It is a deeply sad fact about the modern world that when companies announce they are actively taking voluntary steps to build AI safely, people respond with anger. The replies to the Twitter announcement are an absolute disgrace. One can and should disagree with specifics. And one can disagree about what mandates and laws we should impose. That’s fair. But if you aren’t happy to see companies proactively updating their safety protocols? That’s psycho behavior.
Aligning a Smarter Than Human Intelligence is Difficult
Yes, yes, of course.
What confuses me is why we need to demonstrate such obvious 101 stuff. When you think your supervisor can tell the difference, you’ll do what they want. When you think the supervisor cannot tell the difference, you might or might not care what they want, and are likely to take advantage of the situation. Why would you expect anything else?
And yet, people act like the AI doing this would be some sort of ‘sci fi’ scenario, or ‘hypothetical’ situation, as opposed to ‘what very obviously happens.’ So we have to continuously point out things like this, and then people say ‘oh but you engineered that situation’ or ‘but that’s not exactly the thing you were warning about’ or whatever, and two weeks later they’re back to pretending it didn’t happen.
Related results are also in from AgentHarm, a dataset for measuring the harmfulness of AI agents. These are capabilities scores, for in order harmful requests, harmful requests with a forced tool call attack, harmful requests with a template attack, and harmless requests.
An obvious worry is that the ‘harmless’ requests may not be of similar difficulty to harmful. It does seem like the harmless requests were easier, sufficiently so that for example Opus didn’t meaningfully outperform Haiku. So it’s interesting to see different LLMs have different patterns here. It does seem like Llama 3.1 405B built in some effective defenses here, which of course would be easy to get around with fine tuning if you cared enough.
The Lighter Side
This SMBC is the shorter, funnier, still correct response to Machines of Loving Grace.
The future is here, and this is your notification.
Score one for Apple Intelligence. Google needs to get its head in the game, fast, although technically I can’t argue:
Artificial intelligence: It’s better than none at all.
I agree with Altman that this is a great picture and the edit function is great when you have a vision of what you want, but also this is a rather small walled garden and seems like it would have limited utility as designed.
I feel safe in this section, also this is a large percentage of AI discourse.