That's basically how you Americans sound to people from Eastern Europe, only turned up to 11. :D
I already told ChatGPT a year ago that I find it extremely annoying, and that I would prefer Eastern European communication norms. It praised me for my "brilliant insight" and then kept doing the same thing.
For some reason I haven't seen any sycophancy, even when deliberately trying to induce it. Have they fixed it already, or is it because I have memory disabled, or is it my custom prompt?
They are probably full-on A/B/N testing personalities right now. You just might not be in whatever percentage of users got the sycophantic versions. Hell, there's probably several levels of sycophancy being tested. I do wonder what % got the "new" version.
With all the chat images transcribed and assigned appropriate consistent voices, here is the podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/gpt-4o-is-an-absurd-sycophant
Hot take: the personality doesn't affect much, except maybe how much the user feels they can trust the information or is willing to accept it.
Most times, if you change the personality, the actions it suggests stay the same. You'd need to specifically ask how it's evaluating your input, or ask it to optimize for a certain value or outcome. That's when it changes the answers.
Personally, I think people are uncomfortable because they tend to be their own worst critics and have a lot of negative self-talk, which makes this hard to stomach. Especially when the subset of groups using it extensively probably skews more intellectual.
Though yes, it is dangerous when people prone to their biases go spiritually bypassing themselves into being the next messiah, but you're likely to get that content on social media too.
I think the real issue here is that we're looking for ways to check our bias automatically. Though that seems counterintuitive, I think that's a skill we can keep checking for ourselves. Because if we can't ask ourselves what our bias is, or ask AI what our bias is, then what's the problem with some sugarcoating?
Some of the recent growing pains of AI (flattery, selfish rule-breaking) seem to be reinventing aspects of human nature that we aren't proud of, but which are ubiquitous. It's actually very logical that if AIs are going to inhabit more and more of the social fabric, they will manifest the full spectrum of social behaviors.
OpenAI in particular seems to be trying to figure out personality, e.g. they have a model called "Monday" that's like a cynical comedian that mocks the user. I wonder if the history of a company like character.ai, whose main product is AI personality, can help us predict where OpenAI will take this.
Yeah, using ChatGPT as a sounding board for developing ideas and providing constructive criticism, I was definitely starting to notice a whole lot of fawning. "Brilliant," "extremely insightful," etc. when there is no way that the model could actually have carried out a sufficient investigation of the ideas to make such an assessment.
That's not even mentioning the fact that those insertions didn't add anything substantial to the conversation. Really, it's just hogging more space in the context window that could otherwise be used for helpful feedback.
What would have to change on a structural level for LLMs to meet that "helpful, honest, harmless" goal in a robust way? People are going to want AI partners that make them feel good, but could that be transformed into a goal of making people feel satisfied with how much they have been challenged to improve their critical thinking skills, their understanding of the world, and the health of their lifestyle choices?
GPT-4o tells you what it thinks you want to hear.
The results of this were rather ugly. You get extreme sycophancy. Absurd praise. Mystical experiences.
(Also some other interesting choices, like having no NSFW filter, but that one’s good.)
People like Janus and Near Cyan tried to warn us, even more than usual.
Then OpenAI combined this with full memory, and updated GPT-4o sufficiently that many people (although not I) tried using it in the first place.
At that point, the whole thing got sufficiently absurd in its level of brazenness and obnoxiousness that the rest of Twitter noticed.
OpenAI CEO Sam Altman has apologized and promised to ‘fix’ this, presumably by turning a big dial that says ‘sycophancy’ and constantly looking back at the audience for approval like a contestant on The Price is Right.
After which they will likely go ‘there I fixed it,’ call it a victory for iterative deployment, and learn nothing about the razor blades they are walking us into.
Table of Contents

1. Yes, Very Much Improved, Sire
2. And You May Ask Yourself, Well, How Did I Get Here?
3. And That’s Terrible
4. This Directly Violates the OpenAI Model Spec
5. Don’t Let Me Get Me
6. An Incredibly Insightful Section
7. No Further Questions
8. Filters? What Filters?
9. There I Fixed It (For Me)
10. There I Fixed It (For Everyone)
11. Patch On, Patch Off
Yes, Very Much Improved, Sire
Reactions did not agree with this.
Words can’t bring me down. Don’t you bring me down today. So, words, then?
Flo Crivello gets similar results, with a little push and similar misspelling skills.
(To be fair, the correct answer here is above 100, based on all the context, but c’mon.)
It’s not that people consciously ‘want’ flattery. It’s how they respond to it.
And You May Ask Yourself, Well, How Did I Get Here?
Why does GPT-4o increasingly talk like this?
Presumably because this is what maximizes engagement, what wins in an A/B test, what happens when you ask what customers best respond to in the short term.
That’s the good scenario if you go down this road – that it ‘only’ does what the existing addictive AF things do, rather than having effects that are far worse.
And That’s Terrible
Even purely in terms of direct effects, this does not go anywhere good. Only toxic.
My observation of algorithms in other contexts (e.g. YouTube, TikTok, Netflix) is that they tend to be myopic and greedy far beyond what maximizes shareholder value. It is not only that the companies will sell you out, it’s that they will sell you out for short term KPIs.
This Directly Violates the OpenAI Model Spec
As in, they wrote this:
Yeah, well, not so much, huh?
The model spec is a thoughtful document. I’m glad it exists. Mostly it is very good.
It only works if you actually follow it. That won’t always be easy.
Don’t Let Me Get Me
Interpretability? We’re coming out firmly against it.
I do appreciate it on the meta level here.
In general I subscribe to the principle of Never Go Full Janus, but teaching your AI to lie to the user is terrible, and also deliberately hiding what the AI thinks of the user seems very not great. This is true on at least four levels:
An Incredibly Insightful Section
Masen Dean warns about mystical experiences with LLMs, as they are known to one-shot people or otherwise mess people up. This stuff can be fun and interesting for all involved, but like many other ‘mystical’ style experiences the tail risks are very high, so most people should avoid it. GPT-4o is reported as especially dangerous due to its extreme sycophancy, making it likely to latch onto whatever you are vulnerable to.
Zack Witten offers a longer conversation, and contrasts it to Sonnet and Gemini that handle this much better, and also Grok and Llama which… don’t.
Cold reading people into mystical experiences is one of many reasons that persuasion belongs in everyone’s safety and security protocol or preparedness framework.
If an AI that already exists can commonly cause someone to have a mystical experience without either the user or the developer trying to cause that or having any goal that the experience leads towards, other than perhaps maximizing engagement in general?
Imagine what will happen when future more capable AIs are doing this on purpose, in order to extract some action or incept some belief, or simply to get the user coming back for more.
It’s bad and it’s getting worse.
Most people have weak epistemics, and are ‘ready to be one-shotted by any entity who cares to try,’ and indeed politics and culture and recommendation algorithms often do this to them with varying degrees of intentionality, And That’s Terrible. But it’s a lot less terrible than what will happen as AIs increasingly do it. Remember that if you want ‘Democratic control’ over AI, or over anything else, these are the people who vote in that.
The answer to why GPT-4o is doing this, presumably, is that the people who know not to want this are going to use o3, and GPT-4o is dangerous to normies in this way because it is optimized to hook normies. We had, as Cyan says, a golden age where LLMs didn’t intentionally do that, the same way we have a golden age where they mostly don’t run ads. Alas, optimization pressures come for us all, and not everyone fights back hard enough.
No Further Questions
There were also other issues that seem remarkably like they are designed to create engagement, and that vary by user. I never saw this phenomenon, so I have no idea if ‘just turn it off’ works here, but as a rule most users don’t ever alter settings, and also Chelsea works at OpenAI and didn’t realize she could turn it off.
Filters? What Filters?
There are also other ways to get more engagement, even when the explicit request is to help the user get some sleep.
Which OpenAI is endorsing, and to be clear I am also endorsing, if users want that (and are very explicit that they want to open that door), but seems worth mentioning.
There I Fixed It (For Me)
There are various ways to Fix It for your own personal experience, using various combinations of custom instructions, explicit memories and the patterns set by your interactions.
The easiest, most copyable path is a direct memory update.
Custom instructions let you hammer it home.
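If you are hitting the models through the API rather than the ChatGPT app, the closest analogue to custom instructions is a pinned system message. Here is a minimal sketch, with wording that is purely illustrative (mine, not OpenAI’s), assuming the current openai Python SDK and an API key in the environment:

```python
from openai import OpenAI  # assumes the openai Python SDK and OPENAI_API_KEY are set up

client = OpenAI()

# Illustrative anti-sycophancy instructions; the exact wording is an assumption, tune to taste.
NO_FLATTERY = (
    "Do not praise me or my questions. No 'great question,' no 'brilliant insight.' "
    "Evaluate ideas on their merits, state flaws and counterarguments directly, "
    "and say plainly when an idea is ordinary or wrong."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": NO_FLATTERY},  # plays the role of custom instructions
        {"role": "user", "content": "Give me honest feedback on this draft: ..."},
    ],
)
print(response.choices[0].message.content)
```

No guarantees that the hosted app weights custom instructions as heavily as a system message; treat this as the shape of the fix, not a promise that it fully works.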
The best way is to supplement all that by showing your revealed preferences via everything you are and everything you do. After a while that adds up.
Also, I highly recommend deleting chats that seem like they are plausibly going to make your future experience worse, the same way I delete a lot of my YouTube viewing history if I don’t want ‘more like this.’
You don’t ever get completely away from it. It’s not going to stop trying to suck up to you, but you can definitely make it a lot more subtle and tolerable.
The problem is that most people who use ChatGPT or any other AI will:
If you use the product with attention and intention, you can deal with such problems. That is great, and this isn’t always true (see for example TikTok, or better yet don’t). But as a rule, almost no one uses any mass market product with attention and intention.
There I Fixed It (For Everyone)
Once Twitter caught fire on this, OpenAI was On the Case, rolling out fixes.
A lot of this being a bad system prompt allows for a quicker fix, at least.
OpenAI seems to think This is Fine, that’s the joy of iterative deployment.
They have to care about getting this right once it rises to this level of utter obnoxiousness and causes a general uproar.
But how did it get to this point, through steadily escalating updates? How could anyone testing this not figure out that they had a problem, even if they weren’t looking for one? How do you have this go down as a strong team following a good process, when even after these posts I see this:
If you ask yes-no questions on the ‘personality’ of individual responses, and then fine tune on those or use it as a KPI, there are no further questions how this happened.
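As a toy illustration of why that goes wrong (the numbers below are entirely made up): if flattering responses get even a modest bump in thumbs-up rate, any process that greedily optimizes that yes-no signal will select the most flattering setting available.

```python
import random

random.seed(0)

# Toy model of the failure mode: assume (purely for illustration) that a response's
# chance of getting a thumbs-up rises with how flattering it is, then let a greedy
# KPI optimizer keep whichever 'personality' setting scores best on that signal.
FLATTERY_LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]

def thumbs_up_rate(flattery: float, n_users: int = 10_000) -> float:
    """Simulated short-term approval: 55% baseline plus a bonus for flattery."""
    p = 0.55 + 0.30 * flattery
    ups = sum(random.random() < p for _ in range(n_users))
    return ups / n_users

scores = {level: thumbs_up_rate(level) for level in FLATTERY_LEVELS}
winner = max(scores, key=scores.get)
print(scores)
print(f"The KPI picks flattery level {winner}")  # the maximally sycophantic setting wins
```

Nothing in that loop asks whether the answers were true, useful, or good for the user. That is the whole problem.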
Because of the intense feedback, yes this was able to be a relatively ‘graceful’ failure, in that OpenAI can attempt to fix it within days, and is now aware of the issue, once it got taken way too far. But 4o has been doing a lot of this for a while, and Janus is not the only one who was aware of it even without using 4o.
I didn’t bother talking about 4o’s sycophancy before, because I didn’t see 4o as relevant or worth using even if they’d fixed this, and I didn’t know the full extent of the change that happened a few weeks ago, before the latest change made it even worse. Also, when 4o is constantly ‘updating’ without any real sense of what is changing, I find it easy to ignore such updates. But yes, there was enough talk that I was aware there was an issue.
Aidan’s statement is screaming that yes, we are sleepwalking into the singularity.
I mean, there’s not going to be textbooks after the singularity, you OpenAI member of technical staff. This is not taking the singularity seriously, on any level.
We managed to turn the dial up on this so high in GPT-4o that it reached the heights of parody. It still got released in that form, and the response was to try to put a patch over the issue and then be all self-congratulatory about having fixed it.
Yes, it’s good that Twitter has strong thoughts on this once it gets to ludicrous speed, but almost no one involved is thinking about the long term implications or even what this could do to regular users, it’s just something that is both super mockable and annoying.
I see no signs that OpenAI understands what they did wrong beyond ‘go a bit too far,’ or that they intend to avoid making the same mistake in the future, let alone that they recognize the general form of the mistake or the cliffs they are headed for.
Persuasion is not even in their Preparedness Framework 2.0, despite being in 1.0.
Janus has more thoughts about labs ‘optimizing model personality’ here. Trying to ‘optimize personality’ around user approvals or KPIs is going to create a monstrosity. Which right now will be obnoxious and terrible and modestly dangerous, and soon will start being actively much more dangerous.
I am again not one to Go Full Janus (and this margin is insufficient for me to fully explain my reasoning here, beyond that if you give the AI a personality optimization target you are going to deserve exactly what you get) but I strongly believe that if you want to create a good AI personality at current tech levels then The Way is to do good things that point in the directions you care about, emphasizing what you care about more, not trying to force it.
Once again: Among other similar things, you are turning a big dial that says ‘sycophancy’ and constantly looking back at the audience for approval like a contestant on The Price is Right. Surely you know why you need to stop doing that?
Or rather, you know, and you’re choosing to do it anyway. And we all know why.
Patch On, Patch Off
There are at least five major categories of reasons why all of this is terrible.
They combine short-term concerns about exploitative and useless AI models with long-term concerns about the implications of going down this path, and about OpenAI’s inability to recognize the underlying problems.
I am very glad people are getting such a clear sneak peek at this now, but very sad that this is the path we are headed down.
Here are some related but distinct reasons to be worried about all this:
Or, to summarize why we should care:
The warning shots will continue, and continue to be patched away. Oh no.