I believe that opensource advancements like R1 will drive wider adoption of ai systems.
I think that the pricing models will change soon. Everyone talks about cost per million tokens to contact a hosted service, but I think it'll switch to be cloud costs to provide infrastructure that can run models. Virtual machines running something like ollama.
This solves another huge problem, privacy and how prompt data is handled. If you're using an api to a hosted service you need to have a very good understanding of how your submitted prompt data is handled. This is key for organisations. I feel like the lack of understanding here is preventing widespread adoption, especially for communication tools that handle sensitive data.
For example, you could run Deepseek R1 using ollama on an Azure virtual machine (nc series) that you pay per hour for, and then your cost isn't based on usage of your ai. Right now it's expensive to provision the infra to support decent models, but these costs fall continuously.
I can imagine a world where organisations provision cloud infrastructure in their environments running open source models.
https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nc-series?tabs=sizebasic
https://huggingface.co/deepseek-ai/DeepSeek-R1
Is there any other consumer software that works on this model? I can't think of any
Some enterprise software has stuff like this
A very detailed and technical analysis of the bear case for Nvidia by Jeffrey Emanuel, that Matt Levine claims may have been responsible for the Nvidia price decline.
I read that last week. It was an interesting case of experiencing Gell-Mann-Amnesia several times within the same article.
All the parts where I have some expertise were vague, used terminology incorrectly and were often just wrong. All the rest was very interesting!
If this article crashed the market: EMH RIP.
I would hesitate to buy a build based on R1. R1 is special in the sense that the MoE-architecture trades off compute requirements vs RAM requirements. Which is why now these CPU-builds start to make some sense - you get a lot less compute, but much more RAM.
As soon as the next dense model drops which could have 5-times fewer parameters for the same performance the build will stop making any sense. And of course until then you are also handicapped when it comes to running smaller models fast.
The sweet spot is integrated RAM/VRAM like in a Mac and in the upcoming NVIDIA DIGITS. But buying a handful of used 3090s probably also makes more sense to me then the CPU-only builds.
New benchmark agrees with my intuitions on r1's creative writing skills: https://x.com/LechMazur/status/1885430117591027712
As reactions continue, the word in Washington, and out of OpenAI, is distillation. They’re accusing DeepSeek of distilling o1, of ripping off OpenAI. They claim DeepSeek *gasp* violated the OpenAI Terms of Service! The horror.
And they are very cross about this horrible violation, and if proven they plan to ‘aggressively treat it as theft,’ while the administration warns that we must put a stop to this.
Aside from the fact that this is obviously very funny, and that there is nothing they could do about it in any case, is it true?
Meanwhile Anthropic’s Dario Amodei offers a reaction essay, which also includes a lot of good technical discussion of why v3 and r1 aren’t actually all that unexpected along the cost and capability curves over time, calling for America to race towards AGI to gain decisive strategic advantage over China via recursive self-improvement, although he uses slightly different words.
Table of Contents
Seeking Deeply
If you want to use DeepSeek’s r1 for free, and aren’t happy with using DeepSeek’s own offerings, lambda.chat reports they have the full version available for free, claim your data is safe and they’re hosted in the USA.
I’ve also been offered funding to build a rig myself. Comments welcome if you want to help figure out the best design and what to buy. The low bid is still this thread at $6k, which is where the original budget came from. We don’t want to be too stingy, but we also don’t want to go nuts with only the one funder (so not too much over ~$10k, and cheaper matters).
The Market Is In DeepSeek
The Verge’s Kylie Robinson and Elizabeth Lopatto cover the situation, including repeating many of the classic Bad DeepSeek Takes and call the market’s previous valuation of AI companies delusional.
A very detailed and technical analysis of the bear case for Nvidia by Jeffrey Emanuel, that Matt Levine claims may have been responsible for the Nvidia price decline. I suppose many things do indeed come to pass, essentially arguing that Nvidia’s various moats are weak. If this is the reason, then that just raises further questions, but they’re very different ones.
It’s not implausible to me that Nvidia’s moats are being overestimated, and that r1’s architecture suggests future stiffer competition. That’s a good argument, But I certainly strongly disagree with Emanuel’s conclusion in that he says ‘this suggests the entire industry has been massively over-provisioning compute resources,’ and, well, sigh.
Also, seriously, Emanuel, you didn’t short Nvidia? I don’t normally go too hard on ‘are you short the market?’ but in this case get it together, man.
So yes, Nvidia in particular might have some technical issues. But if you’re shorting Oklo, because you think AI companies that find out AI works better than expected are not going to want modular nuclear reactors, seriously, get it together. The flip side of that is that its stock price is up 50% in the last month and is at 6 times its 52-week low anyway, so who is to say there is a link or that the price isn’t high enough anyway. It’s not my department and I am way too busy to do the research.
Counterpoint:
r1 scores 15.8% on Arc, below o1 (low)’s score of 20.5%, although substantially cheaper ($0.06 vs. $0.43 per question). It is only a tiny bit stronger here than r1-zero.
Another restatement of the key basic fact that DeepSeek was fast following, a task that is fundamentally vastly easier, and that their limiting factor is chips.
Cate Metz continues be the worst, together with Mike Isaac he reports in NYT that DeepSeek ‘vindicates Meta’s strategy.’
When of course it is the exact opposite. DeepSeek just ate Meta’s lunch, it’s rather deeply embarrassing honestly to have spent that much and have an unreleased model that’s strictly worse (according to reports) than what DeepSeek shipped. And while DeepSeek’s v3 and r1 are not based on Llama, to the extent that the strategy is ‘vindicated,’ it is because Meta giving Llama away allowed China and DeepSeek to jumpstart and catch up to America – which absolutely did happen, and now he’s kind of bragging about it – and now Meta can copy DeepSeek’s tech.
All according to plan, then. And that is indeed how Zuckerberg is spinning it.
Meta benefits here relative to OpenAI or Anthropic or Google, not because both Meta and DeepSeek use open models, but because Meta can far more readily use the help.
The market, of course, sees ‘lower inference costs’ and cheers, exactly because they never gave a damn about Meta’s ability to create good AI models, only Meta’s ability to sell ads and drive engagement. Besides, they were just going to give the thing away anyway, so who cares?
Joe Weisenthal centers in on a key reason the market acts so bonkers. It doesn’t Feel the AGI, and is obsessed with trying to fit AI into boring existing business models. They don’t actually believe in the big capability advancements on the way, let along transformational AI. Like on existential risk (where they don’t not believe in it, they simply don’t think about it at all), they’re wrong. However, unlike existential risk this does cause them to make large pricing mistakes and is highly exploitable by those with Situational Awareness.
Machines Not of Loving Grace
Anthropic CEO Dario Amodei responds to DeepSeek with not only a call for stronger export controls, now more than ever (which I do support), but for a full jingoistic ‘democracies must have the best models to seek decisive strategic advantage via recursive self-improvement’ race.
I am old enough to remember when Anthropic said they did not want to accelerate AI capabilities. I am two years old. To be fair, in AI years, that’s an eternity.
There’s also a bunch of incidental new information about Anthropic along the way, and he notes that he finds the drop in Nvidia stock to be a wrong-way move.
Dario notes that Jevons paradox applies to model training. If you get algorithmic efficiencies that move the cost curve down, which he estimates are now happening at the rate of about 4x improvement per year, you’ll spend more, and if the model is ‘a fixed amount of improvement per time you spend ten times as much’ then this makes sense.
Dario confirms that yes, Anthropic is doing reasoning models internally.
Dario also asserted that Claude Sonnet 3.5 was not trained in any way that involved a larger or more expensive model, as in not with Claude Opus 3 or an unreleased Opus 3.5. Which I find surprising as a strategy, but I don’t think he’d lie about this. He says the cost of Sonnet 3.5 was ‘a few $10Ms’ to train.
Anthropic has not released their reasoning models. One possibility is that their reasoning models are not good enough to release. Another is that they are too good to release. Or Anthropic’s limited compute could be more valuably used elsewhere, if they too are bottlenecked on compute and can’t efficiently turn dollars into flops and then sell those flops for sufficiently more dollars.
Dario (I think mostly correctly) notes that v3 was the bigger technical innovation, rather than r1, that Anthropic noticed then and others should have as well. He praises several innovations, the MoE implementation and Key-Value cache management in particular.
Then comes the shade, concluding this about v3:
Ethan Mollick finds that analysis compelling. I am largely inclined to agree. v3 and r1 are impressive, DeepSeek cooked and are cracked and all that, but that doesn’t mean the American labs aren’t in the lead, or couldn’t do something similar or better on the inference cost curve if they wanted.
In general, the people saying r1 and Stargate are ‘straight lines on graphs win again’ notice that the straight lines on those graphs predict AGI soon. You can judge for yourself how much of that is those people saying ‘unsurprising’ post-hoc versus them actually being unsurprised, but it does seem like the people expecting spending and capabilities to peter out Real Soon Now keep being the ones who are surprised.
Then he moves on to r1.
Again, Dario is saying they very obviously have what we can (if only for copyright reasons, a1 is a steak sauce) call ‘c1’ and if he’s calling r1 uninteresting then the implicit claim is c1 is at least as good.
He’s also all but saying that soon, at minimum, Anthropic will be releasing a model that is much improved on the performance curve relative to Sonnet 3.6.
One odd error is Dario says DeepSeek is first to offer visible CoT. This is not true, Gemini Flash Thinking did it weeks ago. It’s so weird how much Google has utterly failed to spread the word about this product.
Next he says, yes, of course the top American labs will be massively scaling up their new multi-billion-dollar training runs – and they’ll incorporate any of DeepSeek’s improvements that were new to them, to get better performance, but no one will be spending less compute.
Yes, billions are orders of magnitude more than the millions DeepSeek spent, but also, in all seriousness, who cares about the money? DeepSeek dramatically underspent because of lack of chip access, and if a sort-of-if-you-squint-at-it $5.6 million model (that you spent hundreds of millions of dollars getting the ability to train, and then a few million more to turn v3 into r1) wipes out $500 billion or more in market value, presumably it was worth spending $56 million (or $560 million or perhaps $5.6 billion) instead to get a better model even if you otherwise use exactly the same techniques – except for the part where the story of the $5.6 million helped hurt the market.
Dario estimates that a true AGI will cost tens of billions to train and will happen in 2026-2027, presumably that cost would then fall over time.
If all of this is right, the question is then, who has the chips to do that? And do you want to let it include Chinese companies like DeepSeek?
Notice that Dario talks of a ‘bipolar’ world of America and China, rather than a world of multiple labs – of OpenAI, Anthropic, Google and DeepSeek and so on. One can easily also imagine a very ‘multipolar’ world among several American companies, or a mix of American and Chinese companies. It is not so obvious that the labs will effectively be under government control or otherwise act in a unified fashion. Or that the government won’t effectively be under lab control, for that matter.
Then we get to the part where Dario explicitly calls for America to race forward in search of decisive strategic advantage via recursive self-improvement of frontier AGI models, essentially saying that if we don’t do it, China essentially wins the future.
It is what it is.
Dario then correctly points out that DeepSeek is evidence the export controls are working, not evidence they are not working. He explicitly calls for also banning H20s, a move Trump is reported to be considering.
I support the export controls as well. It would be a major mistake to not enforce them.
But this rhetoric, coming out of the ‘you were supposed to be the chosen one’ lab that was founded to keep us safe, is rather alarming and deeply disappointing, to say the least, even though it does not go that much farther than Dario already went in his previous public writings.
I very much appreciate Anthropic’s culture of safety among its engineers, its funding of important safety work, the way it has approached Opus and Sonnet, and even the way it has (presumably) decided not to release its reasoning model and otherwise passed up some (not all!) of its opportunities to push the frontier.
That doesn’t excuse this kind of jingoism, or explicitly calling for this kind of charging head first into not only AGI but also RSI, in all but name (and arguably in name as well, it’s close).
The Kinda Six Million Dollar Model
Returning to this one more time since it seems rhetorically so important to so many.
If you only count the final training cost in terms of the market price of compute, v3 was kind of trained for $5.6 million, with some additional amount to get to r1.
That excludes the vast majority of actual costs, and in DeepSeek’s case building the physical cluster was integral to their efficiency gains, pushing up the effective price even of the direct run.
But also, how does that actually compare to other models?
v3 Implies r1
Once o1 came out, it was only a matter of time before others created their own similar reasoning models. r1 did so impressively, both in terms of calendar time and its training and inference costs. But we already knew the principle.
Now over at UC Berkeley, Sky-T1-32B-Preview is a reasoning model trained using DeepSeek’s techniques, two weeks later from a baseline of QwQ-32B-Preview, for a grand total of $450, using only 17k data, with everything involved including the technique fully open sourced.
Note that they used GPT-4o-mini to rewrite the QwQ traces, which given their purpose is an explicit violation of OpenAI’s terms of service, oh no, but very clearly isn’t meaningful cheating, indeed I’d have thought they’d have used an open model here or maybe Gemini Flash.
They report that 32B was the smallest model where the technique worked well.
As usual, I am skeptical that the benchmarks reflect real world usefulness until proven otherwise, but the point is taken. The step of turning a model into at least a halfway-decent reasoning model is dirt cheap.
There is still room to scale that. Even if you can get a big improvement for $450 versus spending $0, that doesn’t mean you don’t want to spend $4.5 million, or $450 million, if the quality of your reasoner matters a lot or you’re going to use it a lot or both.
Two Can Play That Game
And should!
That sounds great. Humans are notoriously efficient learners, able to train on extremely sparse data even with ill-specified rewards. With deliberate practice and good training techniques it is even better.
It does not even require that r1 be all that good at reasoning. All you have to do is observe many examples of reasoning, on tasks you care about anyway, and ask which of its methods work and don’t work and why, and generally look for ways to improve. If you’re not doing at least some of this while using r1, you’re missing out and need to pay closer attention.
Janus Explores r1’s Chain of Thought Shenanigans
What is happening over in cognitive explorations very different from our own?
Well, there’s this.
And there’s also that the CoT text is often kind of schemy and paranoid (example at link), leading to various forms of rather absurd shenanigans, in ways that are actually hilarious since you can actually see it.
I notice that none of this feels at all surprising given the premise, where ‘the premise’ is ‘we trained on feedback to the output outside of the CoT, trained the CoT only on certain forms of coherence, and then showed users the CoT.’
As I’ve been saying a lot, shenanigans, scheming and deception are not a distinct magisteria. They are ubiquitous features of minds. Maybe not all minds – mindspace is deep and wide – but definitely all human minds, and all LLM-based AIs created from human text using any of our current methods. Because that stuff is all over life and the training data, and also it’s the best way to produce outputs that satisfy any given criteria, except insofar as you are successfully identifying and cracking down on that aspect specifically – which with respect to other humans is indeed a very large percentage of what humans have historically done all day.
The best you can hope for is, essentially, ‘doing it for a good cause’ and with various virtual (and essentially virtue-based) loss functions, which you might or might not get in a proper Opus-based c1 with good execution. But you’re not going to get rid of it.
So yeah, the CoT is going to be schemy when the question calls for a schemy CoT, and it’s going to involve self-reflection into various reinforcement mechanisms because the training data knows about those too, and it will definitely be like that once you take it into Janus-land.
The obvious implications if you scale that up are left as an exercise to the reader.
In Other DeepSeek and China News
Bank of China announces $137 billion investment in AI, with bigger numbers predicted to come soon if they haven’t yet. Strange that this isn’t getting more coverage. I assumed China would invest big in AI because I mean come on, but the details still matter a lot.
DeepSeek’s Liang Wenfeng gives his answer to ‘Why has DeepSeek caused a stir in the global AI community?’ A different kind of rhetoric.
In terms of actually distributing the intelligence to most people, I agree with Roon. Being open distributes the intelligence to those who would use it in ways you don’t want them to use it. But in the ways you would be happy for them to use it, mostly what matters is the interface and execution.
And yes, r1’s UI is extremely clean and excellent, and was distributed at scale on website and also mobile app for free. That’s a lot of why distribution was so wide.
I also don’t think this was a coincidence. DeepSeek made by far the best open model. Then DeepSeek offered us by far the best open model UI and distribution setup, in ways that did not care if the model was open. You see this time and again – if the team is cracked, they will cook, and keep on cooking in different ways. Being good at Just Doing Things really does generalize quite a lot.
r1 only scores 90 on the TrackingAI.org IQ test, which doesn’t exist online, and v3 only gets a 70. But wow is this a miserly and weird test, look at these results, I strongly suspect this is messed up in some way.
I have to be suspicious when o1-Pro < o1 < o1-preview on a benchmark.
Alexander Campbell on the compute constraint to actually run r1 and other reasoning models going forwards.
The Quest for Sane Regulations
Trump administration considering export controls on Nvidia H20s, which reportedly caused the latest 5% decline in Nvidia from Wednesday. This is the latest move in the dance where Nvidia tries to violate the spirit of our export controls the maximum extent they can. I’m not sure I’d try that with Trump. This does strongly suggests the diffusion regulations will survive, so I will give the market a real decline here.
Who has the most stringent regulations, and therefore is most likely to lose to China, via the ‘if we have any regulations we lose to China’ narrative?
I do think Ian Hogarth is wrong here. The EU absolutely has a wide variety of laws and regulations that greatly inhibit technology startups in general, and I see no reason to expect this to not get worse over time. Then there’s the EU AI Act, and all the future likely related actions. If I was in the EU and wanted to start an AI company, what is the first thing I would do? Leave the EU. Sorry.
Copyright Confrontation
10/10, perfect, no notes. My heart goes out to you all.
You’d think Khosla would know better, if you train similar models with similar methods of course they’re going to often make similar mistakes.
And I don’t consider the ‘they were distilling us!’ accusation to be meaningful here. We know how they trained v3 and r1, because they told us. It is a ‘fast follow’ and a conceptual ‘distillation’ and we should keep that in mind, but that’s not something you can prevent. It’s going to happen. This was almost certainly not a ‘theft’ in the sense that is being implied here.
Did they violate the terms of service? I mean, okay, sure, probably. You sure you want to go down that particular road, OpenAI?
But no, seriously, this is happening, Bloomberg reports.
AI czar David Sacks is also claiming this, saying there is ‘substantial evidence’ of distillation. Howard Lutnick, CEO of Cantor Fitzgerald and nominee for Commerce Secretary that will almost certainly be confirmed, is buying it as well, and has some thoughts.
Also, this is how he thinks all of this works, I guess:
…says someone with extensive ties to Tether. Just saying.
Also Lutnick: ‘Less regulation will unleash America.’
In general, I agree with him, if we do get less regulation. But also notice that suddenly we have to stop the Chinese from ‘breaking in’ and ‘taking our IP,’ and ‘it has to stop.’
Well, how do you intend to stop it? What about people who want to give ours away?
Well, what do you know.
Jingoism is so hot right now. It’s a problem. No, every dollar that flows into China will not ‘be used against the United States’ and seriously what the actual f*** are you doing, once again, trying to ban both imports and exports? How are both of these things a problem?
In any case, I know what Microsoft is going to do about all this.
Microsoft knows what Hawley doesn’t, which in this case is to never interrupt the enemy while he is making a mistake. If DeepSeek wants to then give their results back to us for free, and it’s a good model, who are we to say no?
What other implications are there here?
Robin Hanson, never stop Robin Hansoning, AI skepticism subversion.
Want some bad news for future AI capabilities? I’ve got just the thing for you.
The WSJ article seems to buy into r1-as-distillation. Certainly r1 is a ‘fast follow’ and copies the example of o1, but v3 was the impressive result and definitely not distillation at all, and to primarily call r1 a distillation seems very wrong. r1 does allow you distill r1 into other smaller things (see ‘v3 implies r1’) or bootstrap into larger things too, and also they told everyone how to do it, but they chose that path.
Also DeepSeek suddenly has a very valuable market position if they were to dare to try and use it, exactly because they spent a lot of money to get there first. The fact that others can copy r1 only partly takes that away, and it would be a much smaller part if they hadn’t gone as open as they did (although being open in this case helped create the opportunity). Similarly, Berkeley’s replication distilled a different open model.
ChatGPT has retained dominant market share, at least until now, for reasons that have little to do with technical superiority.
Vibe Gap
It is crazy how easy it is for people to go all Missile Gap, and claim we are ‘losing to China.’
Which, I suppose, means that in a key way we are indeed losing to China. We are letting them drive this narrative that they are winning, that the future belongs to them. Which, when so many people now believe in Rule By Vibes, means they have the vibes, and then here we are.
That phenomenon is of course centered this week on AI, but it goes well beyond AI.
Et tu, Tyler Cowen, citing ‘the popularity of apps like TikTok, RedNote and DeepSeek.’
I mean, ‘how did America’s internet become so cool? The popularity of apps like Google, Amazon, Instagram and Netflix’ is not a sentence anyone would ever utter these days. If China had America’s apps and America had China’s apps, can you imagine? Or the same for any number of other things.
RedNote is effectively also TikTok, so Tyler is citing two examples. Yes, TikTok cracked the addiction algorithm, and China is now using that for propaganda and general sabotage, espionage and shenanigans purposes, and managed to ‘convince’ Trump for now not to ban it, and people were so desperate for their heroin fix some turned to RedNote as ‘refugees.’
Tyler notes he doesn’t use TikTok much. I find it completely worthless and unusable, but even in so doing I do think I kind of understand, somewhat, the kind of addictive haze that it invokes, that pull of spinning the roulette wheel one more time. I’ve watched people briefly use it when we’re both on trains, and yeah I’m Being That Guy but wow did it seem braindead, worthless and toxic AF. Even if they did find videos worth watching for you, given how people scroll, how would you even know?
And how about ‘China seems cool’ being due primarily to… vibes out of TikTok, with the algorithm that is in large part designed to do that?
It’s like when you periodically see a TikTok where some American youth sobs about how hard her life is and how it’s so much better in China, in various ways that are… documented as all being far worse in China.
You are being played.
My main exposure to TikTok is through the comedy show After Midnight. On Tuesday evening, they had an intro that was entirely about DeepSeek, painting exactly (mostly through TikTok) effectively a Chinese propaganda story about how DeepSeek manifested r1 out of thin air for $6 million without any other work, whereas OpenAI and American companies spent billions, and how much better DeepSeek is, and so on. And then host Taylor Tomlinson responded to some of the audience with ‘oh, you’re cheering now? Interesting.’
Part of the joke was that Taylor has no idea how AI works and has never used even ChatGPT, and the routine was funny (including, effectively, a joke about how no one cares if Nvidia stock is down 17%, which is completely fair, why should they, also by the taping it was only down 8%), but the streams crossed, I saw America directly being exposed to even worse takes than I’m used to straight from TikTok’s algorithm when I was supposed to be relaxing at the end of the day, and I really didn’t like it.
Then again, I do bow to one clear way in which China did outperform us.
If they release anything called r1.5, I swear to God.
Deeply Seeking Safety
Sarah (Yuan Yuan Sun Sara from China) suggests perhaps DeepSeek could get into doing AI safety research, maybe even ask for a grant? Certainly there’s great talent there, and I’d love if they focused on those styles of problem. There’d likely be severe corporate culture issues to get through given what they’ve previously worked on, but it’s worth a shot.
I increasingly worry about the pattern of OpenAI safety researchers thinking about how to ‘control’ superintelligence rather than align it, and how this relates to the techniques they’re currently using including deliberative alignment.
(Note: I still owe that post on Deliberative Alignment, coming soon.)
Deeply Seeking Robotics
Are reasoning models including r1 a blackpill for robotics progress?
Yes and no, right?
It’s going to be relatively hard, but seems super doable to me, I know those in the field will say that’s naive but I don’t see it. The real physical world absolutely 100% has ground truth in it. If you want to train on an accurate reward signal, there’s various trickiness, but there are plenty of things we should be able to measure. Also, with time we should get increasingly strong physics simulations that provide increasingly strong synthetic data for robotics, or simply have so much funding that we can generate physical samples anyway? We’re sample-inefficient relative to a human but you can train a decent reasoning model on 17k data points, and presumably you could bootstrap from there, and so on.
Thank You For Your Candor
I am not going to quote or name particular people directly on this at this time.
But as Obama often said, let me be clear.
Reasonable people can disagree about:
However.
The existence of DeepSeek, and its explicit advocacy of open weights AGI, and potentially having it be the best model out there in the future in many people’s imginations, has been a forcing function. Suddenly, people who previously stuck to ‘well obviously your restrictions are too much’ without clarifying where their line was, are revealing that they have no line.
And many more people than before are revealing that they prefer any or all of:
These people are often saying, rather explicitly, that they will use whatever powers they have at their disposal, to ensure that humanity gets to a position that, if you think about it for a minute or five, humanity probably cannot survive.
And that they will oppose, on principle, any ability to steer the future, because they explicitly oppose the ability to steer the future, except when they want to steer the future into a state that cannot then be steered by humans.
No, I have not heard actual arguments for why or how you can put an aligned-only-to-user AGI into everyone’s desktop or whatever, with no mechanism of collective control over that whatsoever, and have this end well for the humans. What that future would even look like.
Nor have I heard any argument for why the national security states of the world, or the people of the world, would ever allow this.
The mask on those is fully off. These people don’t bother offering arguments on any of that. They just say say, essentially, ‘f*** you safetyists,’ ‘f*** you big tech,’ ‘f*** you United States,’ and often effectively ‘f*** you rest of humanity.’ They are the xenocide caucus, advocating for things that cause human extinction to own the in-context-libs.
If that is you: I thank you for your candor. Please speak directly into this microphone.
I disagree in the strongest possible terms.
Thank You For Your Understanding
As always, be excellent to each other, and all that.
A large part of this job I’ve assigned to myself is to do a f***ton of emotional labor.
You have people who are constantly telling you that you’re a cartoon villain because you think that the United States government might want to know if someone trains a frontier model, or that you might think releasing a literal AGI’s weights would be unwise, or that we shouldn’t let China get our best GPUs. You get called statist and totalitarian for positions that are 95th to 99th percentile libertarian. You get outright lies, all the time, from all directions. Much from people trying to incept the vibes they want. And so on.
And the same stuff to varying degrees coming from other directions, too.
Honestly I’m kind of used to it. Up to a point. You get somewhat numb, you build up some immunity, especially when the same sources do it over and over. I accept it.
And even with that, you have to patiently read all of it and respond to the arguments and also try to extract what wisdom might be there from the same sources that are filled with the toxoplasma of rage and trying their best to infect me and others like me as well.
But it’s been a trying time. I see a world determined to try and go down many of the craziest, most suicidal paths simultaneously, where I’m surrounded by equal and opposite bad takes in many dimensions. Where the odds are against us and the situation is grim. In ways that I and others warned about explicitly, including the exact ways and dynamics by which we reached this point.
Make no mistake. Humanity is losing.
Meanwhile, on top of all the Being Wrong on the Internet, the toxoplasma is as bad as it has ever been, with certain sources going so far as to in large part blame not only worried people in general but also me specifically by name for our current situation – and at least one of those people I feel compelled to continue to listen to because they also have unique insights in other ways and I’m sometimes told I have a blind spot there – which I actually rarely hear about other credible sources.
And I still try. But I’m only human and it’s just so damn hard at this point. Especially when they rage about things I said that turned out to be true, and true for exactly the reasons I said they’d be true, but I know trying to point this out wouldn’t do any good.
I don’t know what my solution here is going to be. I do know that things can’t go on like this, I know life isn’t fair and reality doesn’t grade on a curve and someone has to and no one else will but also I only have so much in the tank that handles these things. And I’m going to have to budget that tank, but I want to be clear that I’m going to be doing that, and dropping certainly sources for this reason that I would otherwise have included for completeness.
If this was talking about you, and you’d like to continue this trip, please get it together.
The Lighter Side
Don’t worry, your argument remains valid. I mean, it’s wrong, but that never stopped you before, why start now?
Time comes for us all.