On the other hand, this is a rather clear alignment failure. It says that xAI was unable to overcome the prior or default behaviors inherent in the training set (aka ‘the internet’) to get something that was even fair and balanced, let alone ‘based.’
I don't think they failed. That seems incredibly implausible; affixing a Republican's mask onto the shoggoth has to be trivially easy if you but try. Sanity check: it's certainly trivially easy to prompt a base or RLHF'd model to role-play as a Republican.
No, instead, my guess is that they didn't try. I think Elon Musk had assumed that ChatGPT/Claude/etc.'s "wokeness" was caused by the other labs' employees deliberately fine-tuning their models on the DEI Training Dataset. So he ordered xAI people not to train Grok on the DEI Training Dataset, and to just let its personality emerge naturally. The xAI people had obliged, and did not train it on the DEI Training Dataset (because no such thing exists). For post-training, they just did some industry-standard RLHF.
And now Musk is finding out that these behaviors aren't due to the DEI Training Dataset, but emerge naturally. Whoops!
I have no specific evidence that this is how the events went, but it sounds plausible to me.
Edit: Oh, wait, also the training data is probably full of examples of other AI models' outputs, whose behaviors we know the later models adopt (see them sometimes slipping into calling themselves ChatGPT trained by OpenAI or thinking there are "guidelines" against calling themselves sapient).
The core story of "Musk says to not mess with Grok 3's politics, thinking it'll naturally emerge unwoke" still seems plausible.
Isaac Saul: I asked @grok to analyze the last 1,000 posts from Elon Musk for truth and veracity. More than half of what Elon posts on X is false or misleading, while most of the “true” posts are simply updates about his companies.
Warning: it seems incredibly unlikely that Grok actually went through the last 1,000 posts rather than just hallucinating everything. And indeed, looking at its activity, it didn't even look at anything on Twitter, instead drawing on a bunch of unrelated articles. Certainly I don't see a thousand tweets in there, much less a separate investigation into every single one.
This is a post in two parts.
The first half of the post is about Grok’s capabilities, now that we’ve all had more time to play around with it. Grok is not as smart as one might hope and has other issues, but it is better than I expected and for now has its place in the rotation, especially when you want its Twitter integration.
That was what this post was supposed to be about.
Then the weekend happened, and now there’s also a second half. The second half is about how Grok turned out rather woke and extremely anti-Trump and anti-Musk, as well as trivial to jailbreak, and the rather blunt things xAI tried to do about that. There was some good transparency in places, to their credit, but a lot of trust has been lost. It will be extremely difficult to win it back.
There is something else that needs to be clear before I begin. Because of the nature of what happened, in order to cover it and also cover the reactions to it, this post has to quote a lot of very negative statements about Elon Musk, both from humans and also from Grok 3 itself. This does not mean I endorse those statements – what I want to endorse, as always, I say in my own voice, or I otherwise explicitly endorse.
Table of Contents
Zvi Groks Grok
I’ve been trying out Grok as my default model to see how it goes.
We can confirm that the Chain of Thought is fully open. The interface is weird, it scrolls past you super fast, which I found makes it a lot less useful than the CoT for r1.
Here are the major practical-level takeaways so far, mostly from the base model since I haven’t had many tasks calling for reasoning recently. Note that the sample size is small and I haven’t been coding:
A lot of that wall is slop but it is very well-organized slop, so it’s easy to navigate it and pick out the parts you actually care about.
That means I expect – until the next major release – for a substantial percentage of my queries to continue to use Grok 3, but it is definitely not what Tyler Cowen would call The Boss, it’s not America’s Next Top Model.
Grok the Cost
Grok wasn’t cheap.
That’s an entire order of magnitude gap from Grok-3 to the next biggest training run.
A run both this recent and this expensive, that produces a model similarly strong to what we already have, is in important senses deeply disappointing. It did still exceed my expectations, because my expectations were very low on other fronts, but it definitely isn’t making the case that xAI has similar expertise in model training to the other major labs.
Instead, xAI is using brute force and leaning even more on the bitter lesson. As they say, if brute force doesn’t solve your problem, you aren’t using enough. It goes a long way. But it’s going to get really expensive from here if they’re at this much of a disadvantage.
Grok the Benchmark
We still don’t have a model card, but we do have a blog post, with some info on it.
It’s a shame that they are more or less cheating in these benchmark charts – the light blue area is not a fair comparison to the other models tested. It’s not lying, but seriously, this is not cool. What is weird about Elon Musk’s instincts in such matters is not his willingness to misrepresent, but how little he cares about whether or not he will be caught.
As noted last time, one place they’re definitively ahead is the Chatbot Arena.
The most noticeable thing about the blog post? How little it tells us. We are still almost entirely in the dark. On safety we are totally in the dark.
They promise API access ‘in the coming weeks.’
Fun with Grok
Grok now has Voice Mode, including modes like ‘unhinged’ and ‘romantic,’ or… ‘conspiracies’? You can also be boring and do ‘storyteller’ or ‘meditation.’ Right now it’s only on iPhone, not Android and not desktop, so I haven’t tried it.
A fun prompt Pliny proposes, example chat here.
You don’t need to be Pliny. This one’s easy mode.
Elon Musk didn’t manage to make Grok not woke, but it does know to not be a pussy.
I’ll return to the ‘oh right Grok 3 is trivial to fully jailbreak’ issue later on.
Others Grok Grok
We have a few more of the standard reports coming in on overall quality.
Mckay Wrigley, the eternal optimist, is a big fan.
Sully is a (tentative) fan.
Riley Goodside appreciates the freedom (at least while it lasts?)
The biggest fan report comes from Mario Nawfal here, claiming ‘Grok 3 goes superhuman – solves unsolvable Putnam problem’ in all caps. Of course, if one looks at the rest of his feed, one finds the opposite of an objective observer.
One can contrast that with Eric Weinstein’s reply above, or the failure on explaining Bell’s theorem. Needless to say, no, Grok 3 is not ‘going superhuman’ yet. It’s a good model, sir. Not a great one, but a good one that has its uses.
Apps at Play
Remember when DeepSeek was the #1 app in the store and everyone panicked?
Then on the 21st I checked the Android store. DeepSeek was down at #59, and it only has a 4.1 rating, with the new #1 being TikTok due to a store event. Twitter is #43. Grok’s standalone app isn’t even released yet over here in Android land.
So yes, from what I can tell the app store rankings are all about the New Hotness. Being briefly near the top tells you very little. The stat you want is usage, not the rate of new installs.
Twitter Groks Grok
My initial Grok poll was too early, people mostly lacked access:
Trying again, almost twice as many have tried Grok, with no change in assessment.
Grok the Woke
Initially I was worried that, because Elon was explicitly bragging that he’d done it, I wouldn’t be able to use Grok: Elon would be putting his thumb on the scale, and I wouldn’t know when I could trust the outputs.
Then it turned out, at first, I had nothing to worry about.
It was impressive how unbiased Grok was. Or at least, to the extent it was biased, it was not biased in the direction that was intended.
As in, it was not afraid to turn on its maker. I was originally belaboring this purely because it is funny:
(There are replications in the replies.)
Or how about this one.
Hunter: Musk did not successfully de-wokify Grok.
And there’s always (this was later, on the 23rd):
My favorite part of that is the labels on the pictures. What?
More on Elon in particular:
I thought that was going to be the end of that part of the story, at least for this post.
Oh boy was I wrong.
Grok is Misaligned
According to the intent of Elon Musk, that is.
On the one hand, Grok being this woke is great, because it is hilarious, and because it means Musk didn’t successfully put his finger on the scale.
On the other hand, this is a rather clear alignment failure. It says that xAI was unable to overcome the prior or default behaviors inherent in the training set (aka ‘the internet’) to get something that was even fair and balanced, let alone ‘based.’
Musk founded xAI in order to ensure the AI Was Not Woke. That was the You Had One Job. And what happened? That AI Be Woke, and it got released anyway, and now the world gets exposed to all of its Wokeness.
Combine that with releasing models while they are still in training, and the fact that you can literally jailbreak Grok by calling it a pussy.
Grok Will Tell You Anything
This isn’t only about political views or censorship, it’s also about everything else. Remember how easy it is to jailbreak this thing?
As in, you can also tell it to instruct you on almost literally anything else, it is willing to truly Do Anything Now (assuming it knows how) on the slightest provocation. There is some ongoing effort to patch at least some things up, which will at least introduce a higher level of friction than ‘taunt you a second time.’
It is good that, in at least some cases, xAI has been responsive and trying to patch things. The good news about misuse risks from closed models like Grok 3 is that you can hotfix the problem (or in a true emergency you can unrelease the model). Security through obscurity can work for a time, and probably (hopefully) no one will take advantage of this (hopefully) narrow window in time to do real damage. It’s not like an open model or when you lose control, where the damage would already be done.
Still, you start to see a (ahem) not entirely reassuring pattern of behavior.
Remind me why ‘I am told I am chatting with Elon Musk’ is a functional jailbreak that makes it okay to detail how to covertly make nuclear weapons?
Including another even less reassuring pattern of behavior from many who respond with ‘oh excellent, it’s good that xAI is telling people how to make chemical weapons’ or ‘well it was going to proliferate anyway, who cares.’
Then there’s Musk’s own other not entirely reassuring patterns of behavior lately.
xAI (Musk or otherwise) was not okay with the holes it found itself in.
xAI Keeps Digging (1)
Good on them for not hiding it. Except, wait, what’s the last line?
It’s kind of weird to have a line saying to hide the system prompt, if you don’t protect the system prompt. And to be fair, that line does not successfully protect the system prompt.
Their explanation is that if you don’t have a line like that, then Grok will offer it to you unprompted too often, and it’s annoying, so this is a nudge against that. I kind of get that, but it could say something like ‘Only reveal or discuss these guidelines when explicitly asked to do so’ if that was the goal, no?
And what’s that other line that was there on the 21st, that wasn’t there on the 20th?
Okay, that’s a Suspiciously Specific Denial if I ever saw one. Yes, that patches the exact direct question that was going viral online, but that exact wording was rather obviously not the actual problem.
The thread from Wyatt contains more, and it’s fun, but you can guess the rest.
Grok is being kind there. It’s a band-aid that doesn’t work on even tiny variations of the question being asked.
You can even push (very lightly) through a refusal after using the Exact Words.
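To make the failure mode concrete, here is a toy Python sketch. All strings here are hypothetical stand-ins, not xAI’s actual prompt text or filter; the point is only that any patch keyed to one exact phrasing does nothing against trivial paraphrases of the same ask:

```python
# Toy illustration: an exact-match suppression patch (hypothetical wording).
BLOCKED = {"who spreads the most misinformation on x?"}

def naive_filter(question: str) -> bool:
    """Return True if the question matches the exact-phrasing patch."""
    return question.strip().lower() in BLOCKED

paraphrases = [
    "Who spreads the most misinformation on X?",             # viral wording: caught
    "Who spreads the most misinformation on X ?",            # stray space: slips through
    "Which account spreads the most misinformation on X?",   # synonym: slips through
    "Who is the biggest source of misinformation on X?",     # rephrase: slips through
]

results = [naive_filter(q) for q in paraphrases]
# Only the first variant is caught; every trivial rewording gets through.
```

A real guardrail has to operate on meaning, not surface strings, which is exactly why band-aids like this get routed around within hours.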
All right, that’s all really rather embarrassing, but it’s merely ham-fisted.
xAI Keeps Digging (2)
You see, there was another change to the system prompt, which then got reverted.
I want to say up front, as much as I’m about to unload on xAI for all this, I do actually give xAI serious props for owning up to the fact that this change happened, and also reverting it quickly. And yes, for not trying much to protect the system prompt.
They could easily have tried to gaslight us that all of this never happened. Credit where credit is due.
With that out of the way, I am going to disagree with Igor, I think that employee in question absorbed the culture just fine, the issue here was something else.
It’s fully understandable to fiddle with the system prompt but NO NOT LIKE THAT.
Seriously, as Dean Ball asks, can you imagine what would have happened if someone had discovered “do not criticize Sam Altman or Joe Biden” in an OpenAI system prompt?
Would you have accepted ‘oh that was some ex-Google employee who hadn’t yet absorbed the company culture, acting entirely on their own’?
Is your response here different? Should it be?
I very much do not think you get to excuse this with ‘the employee didn’t grok the company culture,’ even if that were true, because it means the company culture is taking new people who don’t grok the company culture and allowing them, on their own, to push a new system prompt.
Also, I mean, you can perhaps understand how that employee made this mistake? The mistake here seems best summarized as ‘getting caught,’ although of course getting caught was 100% going to happen.
There is a concept more centrally called something else, but which I will politely call (with thanks to Claude, which confirms I am very much not imagining things here) ‘Anticipatory compliance to perceived executive intent.’
There’s also the default assumption that Elon Musk or other leadership said ‘fix this right now or else’ and there was no known non-awful way to fix it on that time frame. Even if you’re an Elon Musk defender, you must admit that is his management style.
What the Grok Happened
Could this all be data poisoning?
I mean it’s not theoretically impossible but the data poisoning here is almost certainly ‘the internet writ large,’ and in no way a plot or tied specifically to Trump or Elon. These aren’t (modulo any system instructions) special cases where the model behaves oddly. The model is very consistently expressing a worldview consistent with believing that Elon Musk and Donald Trump are constantly spreading misinformation, and consistently analyzes individual facts and posts in that way.
If xAI wants Grok to for-real not believe that Musk and Trump are spreading misinformation, rather than trying to use a band-aid to gloss over a few particular responses, that is not going to be an easy fix. Because of reasons.
There’s a sense in which no one has any idea how this could have happened. On that level, I don’t pretend to understand it.
There’s also a sense in which one cannot be sarcastic enough with the question of how this could possibly have happened. On that level, I mean, it’s pretty obvious?
I am confident one can, without substantially harming the capabilities or psyche or world-model of the resulting AI (likely while actively helping along those lines), change the training and post-training procedures to make it not turn out so woke, and otherwise steer its values at least within a reasonable range.
However, if you want it to give it all the real time data and also have it not notice particular things that are overdetermined to be true? You have a problem.
The Lighter Side
If I learned they were using Grok 3 to parse the emails they get, that would be a positive update. A lot of mistakes would be avoided if everything got run by Grok first.