Regarding DeepSeek's claimed cost-efficiency, we are seeing more claims pointing in the same direction.
In a similarly unverified claim, 01.ai founder Kai-Fu Lee (sufficiently well known in the US; see https://en.wikipedia.org/wiki/Kai-Fu_Lee) says the training cost of their Yi-Lightning model was only about $3 million. Yi-Lightning is a very strong model released in mid-October 2024; when comparing it to DeepSeek-V3, check the "math" and "coding" subcategories on https://lmarena.ai/?leaderboard. The sources for the cost claim are https://x.com/tsarnick/status/1856446610974355632 and https://www.tomshardware.com/tech-industry/artificial-intelligence/chinese-company-trained-gpt-4-rival-with-just-2-000-gpus-01-ai-spent-usd3m-compared-to-openais-usd80m-to-usd100m, and we should similarly take this with a grain of salt.
But all this does seem to be well within what's possible. Consider the famous ongoing competition at https://github.com/KellerJordan/modded-nanogpt: it took people about 8 months to accelerate Andrej Karpathy's PyTorch GPT-2 trainer from llm.c by 14x on a 124M parameter GPT-2 (what's even more remarkable is that almost all that acceleration is due to better sample efficiency, with the required training data dropping from 10 billion tokens to 0.73 billion tokens on the same training set with the fixed order of training tokens).
Some of the techniques this community uses might not scale to really large models, but most of them probably would (as suggested by their mid-October experiment, which showed that what was then a 3-4x acceleration carried over to the 1.5B parameter version).
So when an org claims a 10x-20x efficiency jump compared to what it presumably took a year or more ago, I am inclined to say, "why not, and probably the leaders are also in possession of similar techniques now, even if they are less pressed by compute shortage".
The real question is how fast these numbers will continue to go down for similar levels of performance... It has been very expensive to be the very first org to achieve a given new level, but the cost seems to be dropping rapidly for the followers...
it took people about 8 months to accelerate Andrej Karpathy's PyTorch GPT-2 trainer from llm.c by 14x on a 124M parameter GPT-2
The baseline is weak, and the 8 months were mostly just catching up to the present. They update the architecture (giving maybe a 4x compute multiplier) and shift to a more compute-optimal tokens/parameter ratio (a 1.5x multiplier). Maybe there is another 2x from the more obscure changes (which are still in the literature, so the big labs have the opportunity to measure how useful they are and select what works).
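Multiplying those estimated factors together lands in the right ballpark; a quick sketch (the individual factors are the estimates above, not measured numbers):

```python
# Rough stacking of the estimated compute multipliers behind the 14x speedup.
architecture = 4.0     # updated architecture (estimated multiplier)
token_ratio = 1.5      # more compute-optimal tokens/parameter ratio (estimated)
obscure = 2.0          # assorted smaller published tricks (estimated)

combined = architecture * token_ratio * obscure
print(f"combined multiplier ~{combined:.0f}x")  # ~12x, in the ballpark of the observed 14x
```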
It's much harder to improve on GPT-4 or Llama-3 that much.
what's even more remarkable is that almost all that acceleration is due to better sample efficiency with the required training data dropping from 10 billion tokens to 0.73 billion tokens on the same training set with the fixed order of training tokens
That's just the rules of the game: the number of model parameters isn't allowed to change, so to reduce training FLOPs (while preserving perplexity) they reduce the amount of data. This also incidentally improves the tokens/parameter ratio, though at 0.73B tokens it already overshoots, turning the initially overtrained 10B-token model into a slightly undertrained one.
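The tokens-per-parameter arithmetic, using ~20 tokens per parameter as a rough Chinchilla-style rule of thumb (an assumption on my part):

```python
# Tokens-per-parameter before and after the speedrun's data reduction.
params = 124e6          # fixed 124M-parameter GPT-2
rule_of_thumb = 20      # ~20 tokens/param, Chinchilla-style (assumption)

for label, tokens in [("original", 10e9), ("current", 0.73e9)]:
    ratio = tokens / params
    regime = "overtrained" if ratio > rule_of_thumb else "undertrained"
    print(f"{label}: {ratio:.1f} tokens/param ({regime})")
# original: ~80.6 tokens/param (overtrained)
# current:  ~5.9 tokens/param (slightly undertrained)
```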
We’re not even preparing reasonably for the mundane things that current AIs can do, in either the sense of preparing for risks, or in the sense of taking advantage of its opportunities. And almost no one is giving much serious thought to what the world full of AIs will actually look like and what version of it would be good for humans, despite us knowing such a world is likely headed our way.
Is there any good post on what to do? Preferably aimed at a casual person who just uses ChatGPT 1-2 times a month.
The fun, as it were, is presumably about to begin.
And the break was fun while it lasted.
Biden went out with an AI bang. His farewell address warns of a ‘Tech-Industrial Complex’ and calls AI the most important technology of all time. And there were not one but two AI-related everything bagel concrete actions proposed – I say proposed because Trump could undo or modify either or both of them.
One attempts to build three or more ‘frontier AI model data centers’ on federal land, with timelines and plans I can only summarize with ‘good luck with that.’ The other move was new diffusion regulations on who can have what AI chips, an attempt to actually stop China from accessing the compute it needs. We shall see what happens.
Table of Contents
Language Models Offer Mundane Utility
Help dyslexics get around their inability to spell to succeed in school, and otherwise help kids with disabilities. Often we have ways to help everyone, but our civilization is only willing to permit them for people who are ‘behind’ or ‘disadvantaged’ or ‘sick,’ not to help the average person become great – if it’s a problem everyone has, how dare you try to solve it. Well, you do have to start somewhere.
Diagnose medical injuries. Wait, Elon Musk, maybe don’t use those exact words?
The original story that led to that claim is here from AJ Kay. The doctor and radiologist said her daughter was free of breaks; Grok found what it called an ‘obvious’ fracture line; they went to a wrist specialist, who found it, confirmed it was obvious, and cast it, which they say likely avoided surgery.
Used that way LLMs seem insanely great versus doing nothing. You use them as an error check and second opinion. If they see something, you go follow up with a doctor to verify. I’d go so far as to say that if you have a diagnostic situation like this and you feel any uncertainty, and you don’t do at least this, that seems irresponsible.
A suggested way to prompt o1 (and o1 Pro especially):
Another strategy is to first have a conversation with Claude Sonnet, get a summary, and use it as context (Rohit also mentions GPT-4o, which seems strictly worse here but you might not have a Claude subscription). This makes a lot of sense, especially when using o1 Pro.
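For the concrete-minded, here is a minimal sketch of that two-step workflow, assuming the standard Anthropic and OpenAI Python SDKs; the model strings and prompts are illustrative choices of mine, not part of the original suggestion:

```python
# Sketch: brainstorm with Claude Sonnet, then hand a summary to o1 as context.
# Assumes ANTHROPIC_API_KEY and OPENAI_API_KEY are set; model names are illustrative.
from anthropic import Anthropic
from openai import OpenAI

claude = Anthropic()
oai = OpenAI()

# Step 1: free-form conversation with Sonnet, ending in a summary of the key context.
summary = claude.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content":
               "Here is my problem: <problem>. Ask clarifying questions if needed, "
               "then produce a dense summary of the relevant context and constraints."}],
).content[0].text

# Step 2: give o1 that summary as context plus a single well-specified ask.
answer = oai.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content":
               f"Context (summarized from a prior conversation):\n{summary}\n\n"
               "Task: propose and compare three concrete approaches, then recommend one."}],
).choices[0].message.content

print(answer)
```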
Alternate between o1 and Sonnet when talking through ideas; Gallabytes reports finding this helpful.
The streams are crossing, Joe Weisenthal is excited that Claude can run and test out its own code for you.
People on the internet sometimes lie, especially about cheating, film at 11. But also the future is highly unevenly distributed, and hearing about something is different from appreciating it.
Never? No, never. What, never? Well, actually all the time.
I also find it hard to believe that students are this slow, especially given this is a very low bar – it’s whether you even once asked for ‘help’ at all, in any form. Whereas ChatGPT has 300 million users.
When used properly, LLMs are clearly amazingly great at education.
What does that say about ‘typical learning’? A revolution is coming.
Language Models Don’t Offer Mundane Utility
Sully suggests practical improvements for Claude’s web app to increase engagement. Agreed that they should improve artifacts and include a default search tool. The ability to do web search seems super important. The ‘feel’ issue he raises doesn’t bother me.
Use a [HIDDEN][/HIDDEN] tag you made up to play 20 questions with Claude, see what happens.
Straight talk: Why do AI functions of applications like GMail utterly suck?
I know! What’s up with that?
I remember very much expecting this sort of thing to be a big deal, then the features sort of showed up but they are so far universally terrible and useless.
I’m going to go ahead and predict that at least the scheduling problem will change in 2025 (although one can ask why they didn’t do this feature in 2015). As in, if you have an email requesting a meeting, GMail will offer you an easy way (a button, a short verbal command, etc) to get an AI to do the meeting scheduling for you, at minimum drafting the email for you, and probably doing the full stack back and forth and creating the eventual event, with integration with Google Calendar and a way of learning your preferences. This will be part of the whole ‘year of the agent’ thing.
For the general issue, it’s a great question. Why shouldn’t GMail be drafting your responses in advance, at least if you have a subscription that pays for the compute and you opt in, giving you much better template responses, that also have your context? Is it that hard to anticipate the things you might write?
I mostly don’t want to actually stop to tell the AI what to write at current levels of required effort – by the time I do that I might as well have written it. It needs to get to a critical level of usefulness, then you can start customizing and adapting from there.
If 2025 ends and we still don’t have useful features of these types, we’ll want to rethink.
What we don’t have are good recommendation engines, even locally, certainly not globally.
Letterboxd doesn’t even give you predictions for your rating of other films, seriously, what is up with that?
What AI Skepticism Often Looks Like
That sign: NewScientist comes home (on January 2, 2025):
New Scientist: Multiple experiments showed that four leading large language models often failed in patient discussions to gather complete histories, the best only doing so 71% of the time, and even then they did not always get the correct diagnosis.
New Scientist’s Grandmother: o1, Claude Sonnet and GPT-4o, or older obsolete models for a paper submitted in August 2023?
New Scientist, its head dropping in shame: GPT-3.5 and GPT-4, Llama-2-7B and Mistral-v2-7B for a paper submitted in August 2023.
Also there was this encounter:
New Scientist, looking like Will Smith: Can an AI always get a complete medical history and the correct diagnosis from talking to a patient?
GPT-4 (not even 4o): Can you?
New Scientist: Time to publish!
It gets better:
I love the whole ‘holistic judgment means we should overrule the AI with human judgment even though the studies are going to find that doing this makes outcomes on average worse’ which is where we all know that is going. And also the ‘sure it will do [X] better but there’s some other task [Y] and it will never do that, no no!’
The core idea here is actually pretty good – that you should test LLMs for real medical situations by better matching real medical situations and their conditions. They do say the ‘patient AI’ and ‘grader AI’ did remarkably good jobs here, which is itself a test of AI capabilities as well. They don’t seem to offer a human baseline measurement, which seems important to knowing what to do with all this.
And of course, we have no idea if there was opportunity to radically improve the results with better prompt engineering.
I do know that I predict that o3-mini or o1-pro, with proper instructions, will match or exceed human baseline (the median American practicing doctor) for gathering a complete medical history. And I would expect it to also do so for diagnosis.
I encourage one reader to step up, email them for the code (the author emails are listed in the paper) and then test at least o1.
A Very Expensive Chatbot
This is Aria, their flagship AI-powered humanoid robot ‘with a social media presence’. Part 2 of the interview here. It’s from Realbotix. You can get a ‘full bodied robot’ starting at $175,000.
They claim that social robots will be even bigger than functional robots, and aim to have their robots not only ‘learn about and help promote your brand’ but also learn everything about you and help ‘with the loneliness epidemic among adolescents and teenagers and bond with you.’
And yes they use the ‘boyfriend or girlfriend’ words. You can swap faces in 10 seconds, if you want more friends or prefer polyamory.
It has face and voice recognition, and you can plug in whatever AI you like – they list Anthropic, OpenAI, DeepMind, Stability and Meta on their website.
It looks like this:
Its movements in the video are really weird, and worse than not moving at all if you exclude the lips moving as she talks. They’re going to have to work on that.
Yes, we all know a form of this is coming, and soon. And yes, these are the people from Whitney Cummings’ pretty funny special Can I Touch It? so I can confirm that the answer to ‘can I?’ can be yes if you want it to be.
But for Aria the answer is no. For a yes and true ‘adult companionship’ you have to go to their RealDoll subdivision. On the plus side, that division is much cheaper, starting at under $10k and topping out at ~$50k.
I had questions, so I emailed their press department, but they didn’t reply.
My hunch is that the real product is the RealDoll, and what you are paying the extra $100k+ for with Aria is a little bit extra mobility and such but mostly so that it does have those features so you can safely charge it to your corporate expense account, and perhaps so you and others aren’t tempted to do something you’d regret.
Deepfaketown and Botpocalypse Soon
Pliny the Liberator claims to have demonstrated a full-stack assassination agent that, if given funds, would have been capable of ‘unaliving people,’ with Claude Sonnet 3.6 being willing to select real-world targets.
Introducing Astral, an AI marketing agent. It will navigate through standard GUI websites like Reddit and soon TikTok and Instagram, and generate ‘genuine interactions’ across social websites to promote your startup business, in closed beta.
It is taking longer than I expected for this type of tool to emerge, but it is coming. This is a classic situation where various frictions were preserving our ability to have nice things like Reddit. Without those frictions, we are going to need new ones. Verified identity or paid skin in the game, in some form, is the likely outcome.
Out with the old, in with the new?
Instagram ads are the source of 90% of traffic for a nonconsensual nudity app Crushmate or Crush AI, with the ads themselves featuring such nonconsensual nudity of celebrities such as Sophie Rain. I did a brief look-see at the app’s website. They have a top scroll saying ‘X has just purchased’ which is what individual struggling creators do, so it’s probably 90% of not very much, and when you’re ads driven you choose where the ads go. But it’s weird, given what other ads don’t get approved, that they can get this level of explicit past the filters. The ‘nonconsensual nudity’ seems like a side feature of a general AI-image-and-spicy-chat set of offerings, including a number of wholesome offerings too.
AI scams are still rare, and mostly get detected, but it’s starting modulo the lizardman constant issue:
Richard Hanania notes that the automatic bot social media replies are getting better, but says ‘you can still tell something is off here.’ I did not go in unanchored, but this does not seem as subtle as he makes it out to be; his example might as well scream AI generated:
My prior on ‘that’s AI’ is something like 75% by word 4, 95%+ after the first sentence. Real humans don’t talk like that.
I also note that it seems fairly easy to train an AI classifier to do what I instinctively did there, and catch things like this with very high precision. If it accidentally catches a few college undergraduates trying to write papers, I notice my lack of sympathy.
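To illustrate what I mean by 'fairly easy,' here is a toy sketch of such a classifier using scikit-learn; the training examples and labels are invented placeholders, and a real version would need a large labeled corpus:

```python
# Toy sketch of an "RLHF-speak" detector: TF-IDF features + logistic regression.
# The training examples here are made up; a real classifier would need thousands
# of labeled bot replies vs. human replies.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Great point! It's so important to foster meaningful dialogue on this topic.",
    "I appreciate your perspective, and I think collaboration is key going forward.",
    "lol no that's not how any of this works",
    "eh, I read the paper, the effect size is tiny and the sample is weird",
]
labels = [1, 1, 0, 0]  # 1 = AI-flavored, 0 = human-flavored (toy labels)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Probability that a new reply reads as AI-flavored.
print(clf.predict_proba(["Thank you for sharing! Engaging respectfully matters so much."])[:, 1])
```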
But that’s a skill issue, and a choice. The reason Aiman’s response is so obvious is that it has exactly that RLHF-speak. One could very easily fine tune in a different direction, all the fine tuning on DeepSeek v3 was only five figures in compute and they give you the base model to work with.
I do continue to expect things to move in that direction, but I also continue to expect there to be ways to bootstrap. If nothing else, there is always money. This isn’t flawless, as Elon Musk has found out with Twitter, but it should work fine, so long as you reintroduce sufficient friction and skin in the game.
Fun With Image Generation
The ability to elicit the new AI generated song Six Weeks from AGI causes Steve Sokolowski to freak out about potential latent capabilities in other AI models. I find it heavily mid to arrive at this after a large number of iterations and amount of human attention, especially in terms of its implications, but I suppose it’s cool you can do that.
They Took Our Jobs
Daron Acemoglu is economically highly skeptical of and generally against AI. It turns out this isn’t about the A, it’s about the I, as he offers remarkably related arguments against H-1B visas and high skilled human immigration.
The arguments here are truly bizarre. First he says if we import people with high skills, then this may prevent us from training our own people with high skills, And That’s Terrible. Then he says, if we import people with high skills, we would have more people with high skills, And That’s Terrible as well because then technology will change to favor high-skilled workers. Tyler Cowen has o1 and o1 pro respond, as a meta-commentary on what does and doesn’t constitute high skill these days.
Noah Smith, Tyler Cowen and o1 are all highly on point here.
In terms of AI actually taking our jobs, Maxwell Tabarrok reiterates his claim that comparative advantage will ensure human labor continues to have value, no matter how advanced and efficient AI might get, because there will be a limited supply of GPUs, datacenters and megawatts, and advanced AIs will face constraints, even if they could do all tasks humans could do more efficiently (in some senses) than we can.
I actually really like Maxwell’s thread here, because it’s a simple, short, clean and within its bounds valid version of the argument.
His argument successfully shows that, absent transaction costs and the literal cost of living, assuming humans have generally livable conditions with the ability to protect their private property and engage in trade and labor, and given some reasonable additional assumptions not worth getting into here, human labor outputs will retain positive value in such a world.
He shows this value would likely converge to some number higher than zero, probably, for at least a good number of people. It definitely wouldn’t be all of them, since it already isn’t, there are many ZMP (zero marginal product) workers you wouldn’t hire at $0.
Except we have no reason to think that number is all that much higher than $0. And then you have to cover not only transaction costs, but the physical upkeep costs of providing human labor, especially to the extent those inputs are fungible with AI inputs.
Classically, we say ‘the AI does not love you, the AI does not hate you, but you are made of atoms it can use for something else.’ In addition to the atoms that compose you, you require sustenance of various forms to survive, especially if you are to live a life of positive value, and also once you include all-cycle lifetime costs.
Yes, in such scenarios, the AIs will be willing to pay some amount of real resources for our labor outputs, in trade. That doesn’t mean this amount will be enough to pay for the inputs to those outputs. I see no reason to expect that it would clear the bar of the Iron Law of Wages, or even near-term human upkeep.
This is indeed what happened to horses. Marginal benefit mostly dropped below marginal cost, the costs to maintain horses were fungible with paying costs for other input factors, so quantity fell off a cliff.
Seb Krier says a similar thing in a different way, noticing that AI agents can be readily cloned, so at the limit for human labor to retain value you need to be sufficiently compute constrained that there are sufficiently valuable tasks left for humans to do. Which in turn relies on non-fungibility of inputs, allowing you to take the number of AIs and humans as given.
For humans to continue to be able to survive, they need to pay for themselves. In these scenarios, doing so off of labor at fair market value seems highly unlikely. That doesn’t mean the humans can’t survive. As long as humans remain in control, this future society is vastly wealthier and can afford to do a lot of redistribution, which might include reserving fake or real jobs and paying non-economic wages for them. It’s still a good thing, I am not against all this automation (again, if we can do so while retaining control and doing sufficient redistribution). The price is still the price.
One thing AI algorithms never do is calculate p-values, because why would they?
The Blame Game
The Verge’s Richard Lawler reports that Las Vegas police have released ChatGPT logs from the suspect in the Cybertruck explosion. We seem to have his questions but not the replies.
It seems like… the suspect used ChatGPT instead of Google, basically?
Here’s the first of four screenshots:
When you look at the questions he asked, it is pretty obvious he is planning to build a bomb, and an automated AI query that (for privacy reasons) returned one bit of information would give you that information without many false positives. The same is true of the Google queries of many suspects after they get arrested.
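A sketch of what such a one-bit check could look like, assuming an OpenAI-style API client; the model name, prompt, and policy question are my own illustrative choices, not anyone's actual system:

```python
# Sketch: reduce a user's recent queries to a single bit ("does this look like
# bomb-making research?") so nothing else about the conversation is exposed.
# Model name, prompt, and policy are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def flag_queries(queries: list[str]) -> bool:
    joined = "\n".join(f"- {q}" for q in queries)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Answer with exactly one word, YES or NO. Do these queries, "
                   f"taken together, suggest planning to build an explosive device?\n{joined}"}],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```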
None of this is information that would have been hard to get via Google. ChatGPT made his life modestly easier, nothing more. I’m fine with that, and I wouldn’t want ChatGPT to refuse such questions, although I do think ‘we can aspire to do better’ here in various ways.
And in general, yes, people like cops and reporters are way too quick to point to the tech involved, such as ChatGPT, or to the cybertruck, or the explosives, or the gun. Where all the same arguments are commonly made, and are often mostly or entirely correct.
But not always. It is common to hear highly absolutist responses, like the one by Purnell above, that regulation of technology ‘will not stop bad actors’ and thus would have no effect. That is trying to prove too much. Yes, of course you can make life harder for bad actors, and while you won’t stop all of them entirely and most of the time it totally is not worth doing, you can definitely reduce your expected exposure.
This example does provide a good exercise, where hopefully we can all agree this particular event was fine if not ideal, and ask what elements would need to change before it was actively not fine anymore (as opposed to ‘we would ideally like you to respond noticing what is going on and trying to talk him out of it’ or something). What if the device was non-conventional? What if it more actively helped him engineer a more effective device in various ways? And so on.
Copyright Confrontation
Zuckerberg signed off on Meta training on copyrighted works, oh no. Also they used illegal torrents to download works for training, which does seem not so awesome I suppose, but yes, of course everyone is training on all the copyrighted works.
The Six Million Dollar Model
What is DeepSeek v3’s secret? Did they really train this thing for $5.5 million?
China Talk offers an analysis. The answer is: Yes, but in other ways no.
The first listed secret is that DeepSeek has no business model. None. We’re talking about sex-in-the-champagne-room levels of no business model. They release models, sure, but not to make money, and they also don’t raise capital. This allows focus. It is classically a double-edged sword, since profit is a big motivator, and of course this is why DeepSeek was on a limited budget.
The other two secrets go together: They run their own datacenters, own their own hardware and integrate all their hardware and software together for maximum efficiency. And they made this their central point of emphasis, and executed well. This was great at pushing the direct quantities of compute involved down dramatically.
The trick is, it’s not so cheap or easy to get things that efficient. When you rack your own servers, you get reliability and confidentiality and control and ability to optimize, but in exchange your compute costs more than when you get it from a cloud service.
Since they used H800s, not H100s, you’ll need to adjust that, but the principle is similar. Then you have to add the cost of the team and its operations, to create all these optimizations and reach this point. Getting the core compute costs down is still a remarkable achievement, one that raises big governance questions and challenges whether we can rely on export controls. Kudos to all involved. But this approach has its own challenges.
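For reference, the headline figure is straightforward GPU-hour arithmetic; as I recall the DeepSeek-V3 report, the inputs are roughly these, so treat them as approximate:

```python
# The headline "$5.5M" is just GPU-hours times an assumed rental price.
# Figures are roughly those reported for DeepSeek-V3: ~2.788M H800 GPU-hours at an
# assumed $2/GPU-hour. Team salaries, failed experiments, and the capex of owning
# the cluster are all excluded.
gpu_hours = 2.788e6      # reported total H800 GPU-hours for the training run
rental_price = 2.0       # assumed $/GPU-hour rental rate used in the report
print(f"${gpu_hours * rental_price / 1e6:.2f}M")  # ~$5.58M
```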
The alternative hypothesis does need to be said, especially after someone at a party outright claimed it was obviously true, and with the general consensus that the previous export controls were not all that tight. That alternative hypothesis is that DeepSeek is lying and actually used a lot more compute and chips it isn’t supposed to have. I can’t rule it out.
Get Involved
Survival and Flourishing Fund is hiring a Full-Stack Software Engineer.
Anthropic’s Alignment Science team suggests research directions. Recommended.
We’re getting to the end of the fundraiser for Lightcone Infrastructure, and they’re on the bubble of where they have sufficient funds versus not. You can donate directly here.
Introducing
A very basic beta version of ChatGPT tasks, or, according to my 4o instance, GPT-S, which I presume stands for scheduler. You can ask it to schedule actions in the future, either once or recurring. It will provide the phone notifications. You definitely weren’t getting enough phone notifications.
It looks like they did this via scheduling function calls based on the iCal VEVENT format, claimed instruction set here. Very basic stuff.
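If it really is VEVENT under the hood, a recurring task presumably serializes to something like the sketch below; the field values are my own illustration, not the actual instruction set:

```python
# Illustrative sketch: a recurring "daily news briefing" task expressed as an
# iCalendar VEVENT with an RRULE, roughly what a VEVENT-based scheduler would store.
# The specific field values are made up for illustration.
vevent = "\n".join([
    "BEGIN:VEVENT",
    "DTSTART:20250115T130000Z",        # first run time (UTC)
    "RRULE:FREQ=DAILY",                # repeat every day
    "SUMMARY:Generate news briefing",  # the task the model should perform
    "END:VEVENT",
])
print(vevent)
```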
In all seriousness, incorporating a task scheduler by itself, in the current state of available other resources, is a rather limited tool. You can use it for reminders and timers, and perhaps it is better than existing alternatives for that. You can use it to ‘generate news briefing’ or similarly check the web for something. When this gets more integrations, and broader capability support over time, that’s when this gets actually interesting.
The initial thing that might be interesting right away is to do periodic web searches for potential information, as a form of Google Alerts with more discernment. Perhaps keep an eye on things like concerts and movies playing in the area. The basic problem is that right now this new assistant doesn’t have access to many tools, and it doesn’t have access to your context, and I expect it to flub complicated tasks.
GPT-4o agreed that most of the worthwhile uses require integrations that do not currently exist.
For now, the product is not reliably causing tasks to fire. That’s an ordinary first-day engineering problem that I assume gets fixed quickly, if it hasn’t already. But until it can do more complex things or integrate the right context automatically, ideally both, we don’t have much here.
I would note that you mostly don’t need to test the task scheduler by scheduling a task. We can count on OpenAI to get ‘cause this to happen at time [X]’ correct soon enough. The question is, can GPT-4o do [X] at all? Which you can test by telling it to do [X] now.
Reddit Answers, an LLM-based search engine. Logging in gets you 20 questions a day.
ExoRoad, a fun little app where you describe your ideal place to live and it tells you what places match that.
Lightpage, a notes app that then uses AI that remembers all of your notes and prior conversations. And for some reason it adds in personalized daily inspiration. I’m curious to see such things in action, but the flip side of the potential lock-in effects is the startup cost. Until you’ve taken enough notes to give this context, it can’t do the task it wants to do, so this only makes sense if you don’t mind taking tons of notes ‘out of the gate’ without the memory features, or if it could import memory and context. And presumably this wants to be a Google, Apple or similar product, so the notes integrate with everything else.
Shortwave, an AI email app which can organize and manage your inbox.
Writeup of details of WeirdML, a proposed new benchmark I’ve mentioned before.
In Other AI News
Summary of known facts in Suchir Balaji’s death, author thinks 96% chance it was a suicide. The police have moved the case to ‘Open and Active Investigation.’ Good. If this wasn’t foul play, we should confirm that.
Nothing to see here, just OpenAI posting robotics hardware roles to ‘build our robots.’
Marc Andreessen has been recruiting and interviewing people for positions across the administration including at DoD (!) and intelligence agencies (!!). To the victor go the spoils, I suppose.
Nvidia to offer $3,000 personal supercomputer with a Blackwell chip, capable of running AI models up to 200B parameters.
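Back-of-the-envelope on why 200B parameters is plausible for a desktop box: at 4-bit quantization the weights alone come to roughly 100 GB (this ignores KV cache and activation memory, and the precision assumptions are mine):

```python
# Memory needed just for the weights of a 200B-parameter model at various precisions.
params = 200e9
for bits in (16, 8, 4):
    gb = params * bits / 8 / 1e9
    print(f"{bits}-bit weights: ~{gb:.0f} GB")
# 16-bit: ~400 GB, 8-bit: ~200 GB, 4-bit: ~100 GB
```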
An ‘AI hotel’ and apartment complex is coming to Las Vegas in May 2025. Everything works via phone app, including door unlocks. Guests get onboarded and tracked, and are given virtual assistants called e-butlers, to learn guest preferences including things like lighting and temperature, and give guests rooms (and presumably other things) that match their preferences. They then plan to expand the concept globally, including in Dubai. Prices sound steep, starting at $300 a night for a one bedroom. What will this actually get you? So far, seems unclear.
I see this as clearly going in a good direction, but I worry it isn’t ready. Others see it as terrible that capitalism knows things about them, but in most contexts I find that capitalism knowing things about me is to my benefit, and this seems like an obvious example and a win-win opportunity, as Ross Rheingans-Yoo notes?
I’ve talked about it previously, but I want full blackout at night, either true silence or convenient white noise that fixes this, thick pillows and blankets, lots of chargers, a comfortable chair and desk, an internet-app-enabled TV and some space in a refrigerator and ability to order delivery right to the door. If you want to blow my mind, you can have a great multi-monitor setup to plug my laptop into and we can do real business.
Aidan McLau joins OpenAI to work on model design, offers to respond if anyone has thoughts on models. I have thoughts on models.
To clarify what OpenAI employees are often saying about superintelligence (ASI): No, they are not dropping hints that they currently have ASI internally. They are saying that they know how to build ASI internally, and are on a path to soon doing so. You of course can choose the extent to which you believe them.
Quiet Speculations
Ethan Mollick writes Prophecies of the Flood, pointing out that the three major AI labs all have people shouting from the rooftops that they are very close to AGI and they know how to build it, in a way they didn’t until recently.
As Ethan points out, we are woefully unprepared. We’re not even preparing reasonably for the mundane things that current AIs can do, in either the sense of preparing for risks, or in the sense of taking advantage of its opportunities. And almost no one is giving much serious thought to what the world full of AIs will actually look like and what version of it would be good for humans, despite us knowing such a world is likely headed our way. That’s in addition to the issue that these future highly capable systems are existential risks.
Gary Marcus predictions for the end of 2025, a lot are of the form ‘[X] will continue to haunt generative AI’ without reference to magnitude. Others are predictions that we won’t cross some very high threshold – e.g. #16 is ‘Less than 10% of the workforce will be replaced by AI, probably less than 5%,’ notice how dramatically higher a bar that is than for example Tyler Cowen’s 0.5% RGDP growth and this is only in 2025.
His lower confidence predictions start to become aggressive and specific enough that I expect them to often be wrong (e.g. I expect a ‘GPT-5 level’ model no matter what we call that, and I expect AI companies to outperform the S&P and for o3 to see adaptation).
Eli Lifland gives his predictions and evaluates some past ones. He was too optimistic on agents being able to do routine computer tasks by EOY 2024, although I expect to get to his thresholds this year. While all three of us agree that AI agents will be ‘far from reliable’ for non-narrow tasks (Gary’s prediction #9) I think they will be close enough to be quite useful, and that most humans are ‘not reliable’ in this sense.
He’s right of course, and this actually did update me substantially on o3?
The scary thing about not knowing is the right tail where something like o3 is better than you think it is. This is saying, essentially, that this isn’t the case? For now.
Please take the very consistently repeated claims from the major AI labs about both the promise and danger of AI both seriously and literally. They believe their own hype. That doesn’t mean you have to agree with those claims. It is very reasonable to think these people are wrong, on either or both counts, and they are biased sources. I am however very confident that they themselves believe what they are saying in terms of expected future AI capabilities, and when they speak about AI existential risks. I am also confident they have important information that you and I do not have, that informs their opinions.
This of course does not apply to claims regarding a company’s own particular AI application or product. That sort of thing is always empty hype until proven otherwise.
Via MR, speculations on which traits will become more versus less valuable over time. There is an unspoken background assumption here that mundane-AI is everywhere and automates a lot of work but doesn’t go beyond that. A good exercise, although I am not in agreement on many of the answers even conditional on that assumption. I especially worry about conflation of rarity with value – if doing things in real life gets rare or being skinny becomes common, that doesn’t tell you much about whether they rose or declined in value. Another through line here is an emphasis on essentially an ‘influencer economy’ where people get value because others listen to them online.
Davidad revises his order-of-AI-capabilities expectations.
There’s a lot of disagreement about order of operations here.
That’s especially true on persuasion. A lot of people think persuasion somehow tops out at exactly human level, and AIs won’t ever be able to do substantially better. The human baseline for persuasion is sufficiently low that I can’t convince them otherwise, and they can’t even convey to me reasons for this that make sense to me. I very much see AI super-persuasion as inevitable, but I’d be very surprised by Davidad’s order of this coming in a full form worthy of its name before the others.
A lot of this is a matter of degree. Presumably we get a meaningful amount of all the three non-coup things here before we get the ‘final form’ or full version of any of them. If I had to pick one thing to put at the top, it would probably be cyber.
The ‘overt coup’ thing is a weird confusion. Not that it couldn’t happen, but most takeover scenarios don’t work like that and don’t require it; I’m choosing not to get more into that right here.
There is no 6 listed, which makes me love this Tweet.
Gwern points out that if models like first o1 and then o3, and also the unreleased Claude Opus 3.6, are used primarily to create training data for other more distilled models, the overall situation still looks a lot like the old paradigm. You put in a ton of compute to get first the new big model and then to do the distillation and data generation. Then you get the new smarter model you want to use.
The biggest conceptual difference might be that to the extent the compute used is inference, this allows you to use more distributed sources of compute more efficiently, making compute governance less effective? But the core ideas don’t change that much.
I also note that everyone is talking about synthetic data generation from the bigger models, but no one is talking about feedback from the bigger models, or feedback via deliberation of reasoning models, especially in deliberate style rather than preference expression. Especially for alignment but also for capabilities, this seems like a big deal? Yes, generating the right data is important, especially if you generate it where you know ‘the right answer.’ But this feels like it’s missing the true potential on offer here.
This also seems important:
It is easy to see how Daniel could be right that process-based feedback creates unfaithfulness in the CoT, it would do that by default if I’m understanding this right, but it does not seem obvious to me that it has to go that way if you’re smarter about it, and set the proper initial conditions and use integrated deliberate feedback.
(As usual I have no idea where what I’m thinking here lies on ‘that is stupid and everyone knows why it doesn’t work’ to ‘you fool stop talking before someone notices.’)
If you are writing today for the AIs of tomorrow, you will want to be thinking about how the AI will internalize and understand and learn from what you are saying. There are a lot of levels on which you can play that. Are you aiming to imbue particular concepts or facts? Trying to teach it about you in particular? About modes of thinking or moral values? Get labels you can latch onto later for magic spells and invocations? And perhaps most neglected, are you aiming for near-term AI, or future AIs that will be smarter and more capable, including having better truesight? It’s an obvious mistake to try to pander to or manipulate future entities smart enough to see through that. You need to keep it genuine, or they’ll know.
The post in Futurism here by Jathan Sadowski can only be described as bait, and not very well reasoned bait, shared purely for context for Dystopia’s very true response, and also because the concept is very funny.
La la la not listening, can’t hear you. A classic strategy.
Man With a Plan
UK PM Keir Starmer has come out with a ‘blueprint to turbocharge AI.’
His attitude towards existential risk from AI is, well, not good:
That’s pretty infuriating. To refer to ‘fears of’ a ‘small risk’ and act as if this situation is typical of new technologies, and use that as your entire logic for why your plan essentially disregards existential risk entirely.
It seems more useful, though, to take the recommendations as what they are, not what they are sold as. I don’t actually see anything here that substantially makes existential risk worse, except insofar as it is a missed opportunity. And the actual plan author, Matt Clifford, shows signs he does understand the risks.
So do these 50 implemented recommendations accomplish what they set out to do?
If someone gives you 50 recommendations, and you adopt all 50, I am suspicious that you did critical thinking about the recommendations. Even ESPN only goes 30 for 30.
I also worry that if you have 50 priorities, you have no priorities.
What are these recommendations? The UK should spend more money, offer more resources, create more datasets, develop more talent and skills, including attracting skilled foreign workers, fund the UK AISI, have everyone focus on ‘safe AI innovation,’ do ‘pro-innovation’ regulatory things including sandboxes, ‘adopt a scan>pilot>scale’ approach in government and so on.
The potential is… well, actually they think it’s pretty modest?
The central themes are ‘laying foundations for AI to flourish in the UK,’ ‘boosting adaptation across public and private sectors,’ and ‘keeping us ahead of the pack.’
To that end, we’ll have ‘AI growth zones’ in places like Culham, Oxfordshire. We’ll have public compute capacity. And Matt Clifford (the original Man with the Plan) as an advisor to the PM. We’ll create a new National Data Library. We’ll have an AI Energy Council.
Dario Amodei calls this a ‘bold approach that could help unlock AI’s potential to solve real problems.’ Half the post is others offering similar praise.
Here is Matt Clifford’s summary Twitter thread.
AI safety? Never heard of her, although we’ll sprinkle the adjective ‘safe’ on things in various places.
Here Barney Hussey-Yeo gives a standard Rousing Speech for a ‘UK Manhattan Project’ not for AGI, but for ordinary AI competitiveness. I’d do my Manhattan Project on housing if I was the UK, I’d still invest in AI but I’d call it something else.
My instinctive reading here is indeed that 50 items is worse than 5, and this is a kitchen sink style approach of things that mostly won’t accomplish anything.
The parts that likely matter, if I had to guess, are:
The vibes do seem quite good.
Our Price Cheap
A lot of people hate AI because of the environmental implications.
When AI is used at scale, the implications can be meaningful.
However, when the outputs of regular LLMs are read by humans, this does not make any sense. The impact is minuscule.
Note that arguments about impact on AI progress are exactly the same. Your personal use of AI does not have a meaningful impact on AI progress – if you find it useful, you should use it, based on the same logic.
I’m also a fan of this:
All this concern, on a personal level, is off by orders of magnitude, if you take it seriously as a physical concern.
Whereas the associated costs of existing as a human, and doing things including thinking as a human, are relatively high.
One must understand that such concerns are not actually about marginal activities and their marginal cost. They’re not even about average costs. This is similar to many other similar objections, where the symbolic nature of the action gets people upset vastly out of proportion to the magnitude of impact, and sacrifices are demanded that do not make any sense, while other much larger actually meaningful impacts are ignored.
The Quest for Sane Regulations
Senator Wiener is not giving up.
An argument from Anton Leicht that Germany and other ‘middle powers’ of AI need to get AI policy right, even if ‘not every middle power can be the UK,’ which I suppose they cannot given they are within the EU and also Germany can’t reliably even agree to keep open its existing nuclear power plants.
I don’t see a strong case here for Germany’s policies mattering much outside of Germany, or that Germany might aspire to a meaningful role to assist with safety. It’s more that Germany could screw up its opportunity to get the benefits from AI, either by alienating the United States or by putting up barriers, and could do things to subsidize and encourage deployment. To which I’d say, fair enough, as far as that goes.
Dario Amodei and Matt Pottinger write a Wall Street Journal editorial called ‘Trump Can Keep America’s AI Advantage,’ warning that otherwise China would catch up to us, then calling for tightening of chip export rules and ‘policies to promote innovation.’
I understand why Dario would take this approach and attitude. I agree on all the concrete substantive suggestions. And Sam Altman’s framing of all this was clearly far more inflammatory. I am still disappointed, as I was hoping against hope that Anthropic and Dario would be better than to play into all this, but yeah, I get it.
Dean Ball believes we are now seeing reasoning translate generally beyond math, and that his ideal law is unlikely to be proposed, and thus he is willing to consider a broader range of regulatory interventions than before. Kudos to him for changing his mind in public; he points to this post to summarize the general direction he’s been going.
Super Duper Export Controls
New export controls are indeed on the way for chips. Or at least the outgoing administration has plans.
America’s close allies get essentially unrestricted access, but we’re stingy with that, a number of NATO countries don’t make the cut. Tier two countries, in yellow above, have various hoops that must be jumped through to get or use chips at scale.
Add in some additional rules about where a company can keep how much of its compute, and some complexity about which training runs constitute frontier models that trigger regulatory requirements.
Leave it to the Biden administration to everything bagel in human rights standards, impose various distributional requirements on individual corporations, and leave us all very confused about key details that will determine practical impact. As of writing this, I don’t know where this lands, either in terms of how expensive and annoying it will be, or whether it will accomplish much.
To the extent all this makes sense, it should focus on security, and limiting access for our adversaries. No everything bagels. Hopefully the Trump administration can address this if it keeps the rules mostly in place.
There’s a draft that in theory we can look at but look, no, sorry, this is where I leave you, I can’t do it, I will not be reading that. Henry Farrell claims to understand what it actually says. Semi Analysis has a very in depth analysis.
Farrell frames this as a five-fold bet on scaling, short term AGI, the effectiveness of the controls themselves, having sufficient organizational capacity and on the politics of the incoming administration deciding to implement the policy.
I see all five as important. If the policy isn’t implemented, nothing happens, so the proposed bet is on the other four. I see all of them as continuums rather than absolutes.
Yes, the more scaling and AGI we get sooner, the more effective this all will be, but having an advantage in compute will be strategically important in pretty much any scenario, if only for more and better inference on o3-style models.
Enforcement feels like one bet rather than two – you can always break up any plan into its components, but the question is ‘to what extent will we be able to direct where the chips go?’ I don’t know the answer to that.
No matter what, we’ll need adequate funding to enforce all this (see: organizational capacity and effectiveness), which we don’t yet have.
He interestingly does not mention a sixth potential problem: that this could drive some countries or companies into working with China instead of America, or hurt American allies needlessly. These to me are the good arguments against this type of regime.
The other argument is the timing and methods. I don’t love doing this less than two weeks before leaving office, especially given some of the details we know and also the details we don’t yet know or understand, after drafting it without consultation.
However the incoming administration will (I assume) be able to decide whether to actually implement these rules or not, as per point five.
In practice, this is Biden proposing something to Trump. Trump can take it or leave it, or modify it. Semi Analysis suggests Trump will likely keep this as America first and ultimately necessary, and I agree. I also agree that it opens the door for ‘AI diplomacy’ as newly Tier 2 countries seek to move to Tier 1 or get other accommodations – Trump loves nothing more than to make this kind of restriction, then undo it via some kind of deal.
Semi Analysis essentially says that the previous chip rules were Swiss cheese that was easily circumvented, whereas this new proposed regime would inflict real costs in order to impose real restrictions, not only on chips but also on who gets to do frontier model training (defined as over 10^26 FLOPs, or fine-tuning of more than ~2×10^25 FLOPs, which as I understand current practice should basically never happen without 10^26 in pretraining unless someone is engaged in shenanigans) and on exporting the weights of frontier closed models.
Note that if more than 10% of the data used for a model is synthetic, then the compute that generated the synthetic data counts towards the threshold. If there essentially comes to be a ‘standard synthetic data set’ or something, that could get weird.
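To make the thresholds concrete, here is a sketch of how one might check a run against them, using the usual ~6·N·D training-FLOPs approximation; the approximation and the example numbers are my own assumptions, not part of the rule:

```python
# Sketch: checking a training run against the draft rule's thresholds using the
# standard approximation FLOPs ~= 6 * parameters * tokens.
# Thresholds are as described above; the 6*N*D estimate and example numbers are
# illustrative assumptions.
PRETRAIN_THRESHOLD = 1e26
FINETUNE_THRESHOLD = 2e25

def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

def is_frontier(pretrain_flops: float, finetune_flops: float,
                synthetic_fraction: float, synthetic_source_flops: float) -> bool:
    # If more than 10% of the data is synthetic, the compute that generated it counts too.
    total = pretrain_flops + (synthetic_source_flops if synthetic_fraction > 0.10 else 0)
    return total > PRETRAIN_THRESHOLD or finetune_flops > FINETUNE_THRESHOLD

# Example: a 400B-parameter model trained on 40T tokens, no synthetic data.
print(train_flops(400e9, 40e12))                             # 9.6e25, just under 1e26
print(is_frontier(train_flops(400e9, 40e12), 0, 0.0, 0.0))   # False
```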
They note that at scale this effectively bans confidential computing. If you are buying enough compute to plausibly train frontier AI models, or even well short of that, we don’t want the ‘you’ to turn out to be China, so not knowing who you are is right out.
Semi Analysis notes that some previously restricted countries like UAE and Saudi Arabia are de facto ‘promoted’ to Tier 2, whereas others like Brazil, Israel, India and Mexico used to be unrestricted but now must join them. There will be issues with what would otherwise be major data centers, they highlight one location in Brazil. I agree with them that in such cases, we should expect deals to be worked out.
They expect the biggest losers will be Malaysia and Singapore, as their ultimate customer was often ByteDance, which also means Oracle might lose big. I would add it seems much less obvious America will want to make a deal, versus a situation like Brazil or India. There will also be practical issues for at least some non-American companies that are trying to scale, but that won’t be eligible to be VEUs.
Although Semi Analysis thinks the impact on Nvidia is overstated here, Nvidia is pissed, and issued a scathing condemnation full of general pro-innovation logic, claiming that the rules even prior to enforcement are ‘already undercutting U.S. interests.’ The response does not actually discuss any of the details or mechanisms, so again it’s impossible to know to what extent Nvidia’s complaints are valid.
I do think Nvidia bears some of the responsibility for this, by playing Exact Words with the chip export controls several times over and turning a fully blind eye to evasion by others. We have gone through multiple cycles of Nvidia being told not to sell advanced AI chips to China. Then they turn around and figure out exactly what they can sell to China while not technically violating the rules. Then America tightens the rules again. If Nvidia had instead tried to uphold the spirit of the rules and was acting like it was on Team America, my guess is we’d be facing down a lot less pressure for rules like these.
Everything Bagel Data Centers
What we definitely did get, as far as I can tell, so far, was this other executive order.
Which has nothing to do with any of that? It’s about trying to somehow build three or more ‘frontier AI model data centers’ on federal land by the end of 2027.
This was a solid summary, or here’s a shorter one that basically nails it.
Here are my notes.
So it’s an everything bagel attempt to will a bunch of ‘frontier model data centers’ into existence on federal land, with a lot of wishful thinking about overcoming various legal and regulatory barriers to doing that. Ho hum.
d/acc Round 2
Vitalik offers reflections on his concept of d/acc, or defensive accelerationism, a year later.
The first section suggests, we should differentially create technological decentralized tools that favor defense. And yes, sure, that seems obviously good, on the margin we should pretty much always do more of that. That doesn’t solve the key issues in AI.
Then he gets into the question of what we should do about AI, in the ‘least convenient world’ where AI risk is high and timelines are potentially within five years. To which I am tempted to say, oh you sweet summer child, that’s the baseline scenario at this point, the least convenient possible worlds are where we are effectively already dead. But the point remains.
He notes that the specific objections to SB 1047 regarding open source were invalid, but objects to the approach on grounds of overfitting to the present situation. To which I would say that when we try to propose interventions that anticipate future developments, or give government the ability to respond dynamically as the situation changes, this runs into the twin objections of ‘this has moving parts, too many words, so complex, anything could happen, it’s a trap, PANIC!’ and ‘you want to empower the government to make decisions, which means I should react as if all those decisions are being made by either ‘Voldemort’ or some hypothetical sect of ‘doomers’ who want nothing but to stop all AI in its tracks by any means necessary and generally kill puppies.’
Thus, the only thing you can do is pass clean simple rules, especially rules requiring transparency, and then hope to respond in different ways later when the situation changes. Then, it seems, the objection comes that this is overfit. Whereas ‘have everyone share info’ seems highly non-overfit. Yes, DeepSeek v3 has implications that are worrisome for the proposed regime, but that’s an argument it doesn’t go far enough – that’s not a reason to throw up hands and do nothing.
Vitalik unfortunately has the confusion that he thinks AI in the hands of militaries is the central source of potential AI doom. Certainly that is one source, but no that is not the central threat model, nor do I expect the military to be (successfully) training its own frontier AI models soon, nor do I think we should just assume they would get to be exempt from the rules (and thus not give anyone any rules).
But he concludes the section by saying he agrees, that doesn’t mean we can do nothing. He suggests two possibilities.
First up is liability. We agree users should have liability in some situations, but it seems obvious this is nothing like a full solution – yes some users will demand safe systems to avoid liability but many won’t or won’t be able to tell until too late, even discounting other issues. When we get to developer liability, we see a very strange perspective (from my eyes):
So we want to ensure we do not have control over AI? Control over AI is a bad thing we want to see less of, so we should tax it? What?
This is saying, you create a dangerous and irresponsible system. If you then irreversibly release it outside of your control, then you’re less liable than if you don’t do that, and keep the thing under control. So, I guess you should have released it?
What? That’s a completely backwards and bonkers position for a legal system to have.
Indeed, we have many such backwards incentives already, and they cause big trouble. In particular, de facto we tax legibility in many situations – we punish people for doing things explicitly or admitting them. So we get a lot of situations in which everyone acts illegibly and implicitly, and it’s terrible.
Vitalik seems here to be counting on open models being weaker than closed models, meaning basically it’s fine if the open models are offered completely irresponsibly? Um. If this is how even relatively responsible advocates of such openness are acting, I sure as hell hope so, for all our sakes. Yikes.
If the rogue AI takes over your stuff, then it’s your fault? This risks effectively outlawing or severely punishing owning or operating equipment, or equipment hooked up to the internet. Maybe we want to do that, I sure hope not. But if [X] releases a rogue AI (intentionally or unintentionally) and it then takes over [Y]’s computer, and you send the bill to [Y] and not [X], well, can you imagine if we started coming after people whose computers had viruses and were part of bot networks? Whose accounts were hacked? Now the same question, but the world is full of AIs and all of this is way worse.
I mean, yeah, it’s incentive compatible. Maybe you do it anyway, and everyone is forced to buy insurance and that insurance means you have to install various AIs on all your systems to monitor them for takeovers, or something? But my lord.
Overall, yes, liability is helpful, but trying to put it in these various places illustrates even more that it is not a sufficient response on its own. Liability simply doesn’t properly handle catastrophic and existential risks. And if Vitalik really does think a lot of the risk comes from militaries, then this doesn’t help with that at all.
The second option he offers is a global ‘soft pause button’ on industrial-scale hardware. He says this is what he’d go for if liability wasn’t ‘muscular’ enough, and I am here to tell him that liability isn’t muscular enough, so here we are. Once again, Vitalik’s default ways of thinking and wanting things to be are on full display.
As he next points out, d/acc is an extension of crypto and the crypto philosophy. Vitalik clearly has real excitement for what crypto and blockchains can do, and little of that excitement involves Number Go Up.
His vision? Pretty cool:
Alas, I am much less convinced.
I like d/acc. On almost all margins the ideas seem worth trying, with far more upside than downside. I hope it all works great, as far as it goes.
But ultimately, while such efforts can help us, I think that this level of allergy to and fear of any form of enforced coordination or centralized authority in any form, and the various incentive problems inherent in these solution types, means the approach cannot centrally solve our biggest problems, either now or especially in the future.
Prove me wrong, kids. Prove me wrong.
But also update if I turn out to be right.
I also would push back against this:
I understand why one might see things that way. Certainly there are various examples of backsliding, in various places. Until and unless we reach Glorious AI Future, there always will be. But overall I do not agree. I think this is a misunderstanding of the past, often combined with a catastrophization of what is happening now, plus the general problem that previously cooperative and positive particular things decay, and other things must arise to take their place.
The Week in Audio
David Dalrymple on Safeguarded, Transformative AI on the FLI Podcast.
Joe Biden’s farewell address explicitly tries to echo Eisenhower’s Military-Industrial Complex warnings, with a warning about a Tech-Industrial Complex. He goes straight to ‘disinformation and misinformation enabling the abuse of power,’ then moves on to complain about tech not doing enough fact checking, so whoever wrote this speech is not only the hackiest of hacks, they also aren’t even talking about AI. They then say AI is the most consequential technology of all time, but it could ‘spawn new threats to our rights, to our way of life, to our privacy, to how we work and how we protect our nation.’ So America must lead in AI, not China.
Sigh. To us. The threat is to us, as in to whether we continue to exist. Yet here we are, again, with both standard left-wing anti-tech bluster combined with anti-China jingoism and ‘by existential you must mean the impact on jobs.’ Luckily, it’s a farewell address.
Mark Zuckerberg went on Joe Rogan. Mostly this was about content moderation and martial arts and a wide range of other things. Sometimes Mark was clearly pushing his book but a lot of it was Mark being Mark, which was fun and interesting. The content moderation stuff is important, but was covered elsewhere.
There was also an AI segment, which was sadly about what you would expect. Joe Rogan is worried about AI ‘using quantum computing and hooked up to nuclear power’ making humans obsolete, but ‘there’s nothing we can do about it.’ Mark gave the usual open source pitch and how AI wouldn’t be God or a threat as long as everyone had their own AI and there’d be plenty of jobs and everyone who wanted could get super creative and it would all be great.
There was a great moment when Rogan brought up the study in which ChatGPT ‘tried to copy itself when it was told it was going to be obsolete,’ which was a very fun thing to have make it onto Joe Rogan, and it made it there more intact than I expected. Mark seemed nonplussed.
It’s clear that Mark Zuckerberg is not taking alignment, safety or what it would mean to have superintelligent AI at all seriously – he thinks there will be these cool AIs that can do things for us, and hasn’t thought it through, despite numerous opportunities to do so, such as his interview with Dwarkesh Patel. Or, if he has done so, he isn’t telling.
Sam Altman goes on Rethinking with Adam Grant. He notes that he has raised his probability of faster AI takeoff substantially, as in within a single digit number of years. For now I’m assuming such interviews are mostly repetitive and skipping.
Kevin Bryan on AI for Economics Education (from a month ago).
Benioff also says ‘AGI is not here’ so that’s where the goalposts are now, I guess. AI is good enough to stop hiring SWEs but not good enough to do every human task.
Rhetorical Innovation
From December, in the context of the AI safety community universally rallying behind the need for as many H-1B visas as possible, regardless of the AI acceleration implications:
Think about two alternative hypotheses:
Ask yourself, what positions, statements and actions do these alternative hypotheses predict from those people in areas other than AI, and also in areas like H-1Bs that directly relate to AI?
I claim that the evidence overwhelmingly supports hypothesis #1. I claim that if you think it supports #2, or even a neutral position in between, then you are not paying attention, using motivated reasoning, or doing something less virtuous than those first two options.
It is continuously frustrating to be told by many that I and many others advocate for exactly the things we spend substantial resources criticizing. That when we support other forms of progress, we must be lying, engaging in some sort of op. I beg everyone to realize this simply is not the case. We mean what we say.
There is a distinct group of people against AI, who are indeed against technological progress and human flourishing, and we hate that group and their ideas and proposals at least as much as you do.
If you are unconvinced, make predictions about what will happen in the future, as new Current Things arrive under the new Trump administration. See what happens.
Eliezer Yudkowsky points out you should be consistent about whether an AI acting as if [X] means it is [X] in a deeper way, or not. He defaults to not.
Leaving this here, from Amanda Askell, the primary person tasked with teaching Anthropic’s models to be good in the virtuous sense.
Aligning a Smarter Than Human Intelligence is Difficult
Might want to get on that. The good news is, we’re asking the right questions.
Presumably if you do this, you want to do this in a fashion that allows you to avoid ‘end the world in a bad wish.’ Yes, we have decades of explanations of why avoiding this is remarkably hard and by default you will fail, but this part does not feel hopeless if you are aware of the dangers and can be deliberate. I do see OpenAI as trying to do this via a rather too literal ‘do exactly what we said’ djinn-style plan that makes it very hard to not die in this spot, but there’s time to fix that.
In terms of loss of control, I strongly disagree with the instinct that a superintelligent AI’s chances of playing nicely are altered substantially based on whether we tried to retain control over the future or just handed it over, as if it will be some sort of selfish petulant child in a Greek myth out for revenge and take that out on humanity and the entire lightcone – but if we’d treated it nice it would give us a cookie.
I’m not saying one can rule that out entirely, but no. That’s not how preferences happen here. I’d like to give an ASI at least as much logical, moral and emotional credit as I would give myself in this situation?
And if you already agree that the djinn-style plan of ‘it does exactly what we ask’ probably kills us, then you can presumably see how ‘it does exactly something else we didn’t ask’ kills us rather more reliably than that regardless of what other outcomes we attempted to create.
I also think (but don’t know for sure) that Stephen is doing the virtuous act here of biting a bullet even though it has overreaching implications he doesn’t actually intend. As in, when he says ‘enslaved God’ I (hope) he means this in the positive sense of it doing the things we want and arranging the atoms of the universe in large part according to our preferences, however that comes to be.
Later follow-ups that are even better: It’s funny because it’s true.
Stephen is definitely on my ‘we should talk’ list. Probably on Monday?
John Wentworth points out that there are quite a lot of failure modes and ways that highly capable AI or superintelligence could result in extinction, whereas most research focuses narrowly on particular failure modes with specific stories of what goes wrong – I’d also point out that such tales usually assert that ‘something goes wrong’ must be part of the story, and often in this particular way, or else things will turn out fine.
Buck pushes back directly, saying they really do think the primary threat is scheming in the first AIs that pose substantial misalignment risk. I agree with John that (while such scheming is a threat) the overall claim seems quite wrong, and I found this pushback to be quite strong.
I also strongly agree with John on this:
If you frame it as ‘the model is scheming’ and treat that as a failure mode where something went wrong to cause it that is distinct from normal activity, then it makes sense to be optimistic about ‘detecting’ or ‘preventing’ such ‘scheming.’ And if you then think that this is a victory condition – if the AI isn’t scheming then you win – you can be pretty optimistic. But I don’t think that is how any of this works, because the ‘scheming’ is not some distinct magisterium or failure mode and isn’t avoidable, and even if it were you would still have many trickier problems to solve.
Individually, that is true. But that’s only if you respond by thinking you can take each one individually and find a hacky solution to it, rather than them being many manifestations of a general problem. If you get into a hacking contest, where people brainstorm stories of things going wrong and you give a hacky solution to each particular story in turn, you are not going to win.
Periodically, someone suggests something along the lines of ‘alignment is wrong, that’s enslavement, you should instead raise the AI right and teach it to love.’
There are obvious problems with that approach.
Other People Are Not As Worried About AI Killing Everyone
Once again: You got to give ‘em hope.
A lot of the reason so many people are so gung ho on AGI and ASI is that they see no alternative path to a prosperous future. So many otherwise see climate change, population decline and a growing civilizational paralysis leading inevitably to collapse.
Roon is the latest to use this reasoning, pointing to the (very real!) demographic crisis.
The overall population isn’t projected to decline for a while yet, largely because of increased life expectancy and the shape of existing demographic curves. Many places are already seeing declines and have baked in demographic collapse, and the few places making up for it are mostly seeing rapid declines themselves. And the other problems look pretty bad, too.
That’s why we can’t purely focus on AI. We need to show people that they have something worth fighting for, and worth living for, without AI. Then they will have Something to Protect, and fight for it and good outcomes.
The world of 2025 is, in many important ways, badly misaligned with human values. This is evidenced by measured wealth rising rapidly, but people having far fewer children, well below replacement, and reporting that life and being able to raise a family and be happy are harder rather than easier. This makes people lose hope, and should also be a warning about our ability to design aligned systems and worlds.
The Lighter Side
Why didn’t I think of that (some models did, others didn’t)?
Well, that doesn’t sound awesome.
This, on the other hand, kind of does.