First, I didn't say it wasn't communicating anything. But since you bring it up: it communicated exactly what jefftk had already said in the post describing the scene. And whatever it did communicate that he didn't say cannot be trusted at all. As jefftk notes, 4o, in doing style transfer, makes many large, heavily biased changes to the scene, going well beyond mere artifacts like fingers. If you don't believe that the people in that room had 3 arms, or that the room looked totally different (I will safely assume that the room was not, in fact, lit in tasteful cat-urine yellow in the 4o house style), why believe anything else it conveys? If it doesn't matter what those small details were, then why 'communicate' a fake version of them at all? And if it does matter what those small details were, surely it's bad to communicate a fake, wrong version? (It is odd to take this blasé attitude of 'it is important to communicate, and what is communicated is of no importance'.)
Second, this doesn't rebut my point at all. Whatever true or false things it does or does not communicate, the image is ugly and unaesthetic: the longer you look at it, the worse it gets, as the more bland, stereotypical, and strewn with errors and laziness you understand it to be. It is AI slop. (I would personally be ashamed to post, even to IRC, never mind in my posts, an image which embodies such low standards and disrespects my viewers that much, and which says, "I value your time and attention so little that I will not lift a finger to do a decent job when I add a big attention-grabbing image that you will spend time looking at.") Even 5 seconds to try to inpaint the most blatant artifacts, or to tell ChatGPT, "please try again, but without the yellow palette that you overuse in every image"*, would have made it better.
* incidentally, I've been asking people here if they notice how every ChatGPT 4o-generated image is by default yellow. Invariably, they have not. One or two of them have contacted me later to express the sentiment that 'what has been seen cannot be unseen'. This is a major obstacle to image editing in 4o, because every time you inpaint, the image will mutate a decent bit, and will tend to turn a bit more yellow. (If you iterate to a fixed point, a 4o image turns into all yellow with sickly blobs, often faces, in the top left. It is certainly an odd generative model.)
Seems similar to the "anti-examples" prompting trick I've been trying: taking the edits elicited from a chatbot, and reversing them to serve as few-shot anti-examples of what not to do. (This would tend to pick up X-isms.)
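To make the trick concrete, here is a minimal Python sketch; the example edits and the helper function are made up purely for illustration, and in practice you would diff your original text against the chatbot's rewrite to extract the pairs:

```python
# A minimal sketch of the "anti-examples" trick: collect edits a chatbot made to
# your text, then reverse them into few-shot examples of what NOT to do.
# `chatbot_edits` is hypothetical data: (your original, the chatbot's rewrite).

chatbot_edits = [
    ("The results were surprising.", "The results were truly groundbreaking."),
    ("This method is faster.", "This method is a game-changer in terms of speed."),
]

def build_anti_example_prompt(edits, task_text):
    """Build a prompt that presents the chatbot's own edits as negative examples."""
    lines = ["When editing, do NOT make changes like the following:"]
    for original, chatbot_version in edits:
        # Reversed: the chatbot's output is shown as the bad version to avoid.
        lines.append(f'BAD:  "{chatbot_version}"')
        lines.append(f'GOOD: "{original}"')
    lines.append("")
    lines.append("Now edit the following text, avoiding the bad patterns above:")
    lines.append(task_text)
    return "\n".join(lines)

print(build_anti_example_prompt(chatbot_edits, "Our benchmark shows a 3% improvement."))
```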
One obvious reason to get upset is how low the standards of people posting them are. Let's take jefftk's post. It takes less than 5 seconds to spot how lazy, sloppy, and bad the hands and arms are, and how incoherent and uninformative the picture is. (Look at the fiddler's arms; or the woman going under 2 arms that make zero sense; or the weird doors; or the table, which seems to be somehow floating; or the dubious overall composition, where it is unclear where the yellow fairy and the non-fairy are even going; or the fact that the image is the stereotypical cat-urine yellow of all 4o images.) Why should you not feel disrespected and insulted that he was so careless and lazy as to put in such a lousy, generic image?
I think that's exactly how it goes, yeah. Just free association: what token arbitrarily comes to mind? Like if you stare at some static noise, you will see some sort of lumpiness or pattern, which won't be the same as what someone else sees. There's no explaining that at the conscious level. It's closer to a hash function than any kind of 'thinking'. You don't ask what SHA is 'thinking' when you put in some text and it spits out some random numbers & letters. (You would see the same thing if you did an MLP or CNN on MNIST, say. The randomly initialized NN does not produce a uniform output across all digits, for all inputs, and that is the entire point of randomly initializing. As the AI koan goes...)
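A toy illustration of that last point, not the actual MNIST setup, just a randomly initialized MLP fed random inputs, whose "preferences" are nothing but accidents of initialization:

```python
# A randomly initialized MLP applied to random inputs does not produce a uniform
# distribution over its 10 "digit" outputs; the arbitrary initialization already
# biases which class it "prefers", with no 'thinking' behind the preference.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (784, 128))   # random first-layer weights (MNIST-sized input)
W2 = rng.normal(0, 0.1, (128, 10))    # random output layer, 10 classes

x = rng.normal(0, 1, (10_000, 784))   # random "images"
logits = np.maximum(x @ W1, 0) @ W2   # ReLU MLP forward pass
preds = logits.argmax(axis=1)

print(np.bincount(preds, minlength=10))  # the counts are far from uniform
```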
It is not clear how the models are able to self-coordinate. It seems likely that they are simply giving what they believe would be the most common answer, the same way a group of humans might. However, it is possible the models are engaging in more sophisticated introspection, focusing on how they specifically would answer. Follow-up investigations could capture models' chain of thought as well as tweak the prompt to indicate that the model should strive to be consistent with an answer a human might give or another company's AI model might give. Circuit-tracing[6] might be a useful tool for future research into what is actually happening when a model self-coordinates.
One possibility not mentioned here is that they are exploiting essentially arbitrary details of their initialization. (I'm not sure what to call this sort of a priori, acausal coordination.) Any NN is going to have undertrained associations, which are due largely to its random initialization, because it is difficult to be exactly uncorrelated and 0.000... etc. when you are a big complicated neural network being forced to generate big complicated high-dimensional outputs. This would be similar to glitch tokens. In this case, mechanistic interpretability will struggle to find anything meaningful (because nothing meaningful really exists; it's diffuse trends across all the weights adding up nonlinearly to a slight numerical imbalance, etc.), and the inner-monologues are probably going to be highly misleading or total confabulations (because there is no explanation, and so no inner-monologue can be faithful).
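A toy sketch of the sort of 'coordination' I mean, arising purely from shared initialization; the networks and the "prompt" here are made up, and the point is only that exact copies of the same arbitrarily-weighted function agree on arbitrary choices without any training, communication, or meaningful feature:

```python
# Two copies of the same randomly initialized network agree on an arbitrary pick
# (think "name a random number from 0-9") simply because they share weights,
# while a differently seeded network tends to pick something else.
import numpy as np

def random_net(seed):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (64, 32))
    W2 = rng.normal(0, 0.1, (32, 10))
    return lambda x: np.maximum(x @ W1, 0) @ W2

prompt = np.random.default_rng(123).normal(0, 1, 64)  # a fixed "underdetermined prompt"

model_a = random_net(seed=0)   # two deployed copies of the "same model"
model_b = random_net(seed=0)
model_c = random_net(seed=1)   # an unrelated model

print(model_a(prompt).argmax(), model_b(prompt).argmax())  # identical arbitrary pick
print(model_c(prompt).argmax())                            # typically different
```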
(This is not quite what you usually think of with steganography or non-robust features, but of course, if you can start with a set of arbitrary associations of everything with everything, then it is a great way to create both of those and get emergent steganography: the more LLMs engage in self-coordination, the more they create a genuine signal in future training data to bootstrap the initial random associations into a true set of regularities, which can be exploited as non-robust features and then turn into an explicit steganographic code.)
EDIT: the apparent arbitrariness and uninterpretability of the approximations subsequently reported in https://www.lesswrong.com/posts/qHudHZNLCiFrygRiy/emergent-misalignment-on-a-budget seem consistent with the predictions of the acausal coordination interpretation, rather than the Waluigi or truesight interpretation (and maybe the steganographic interpretation too).
Yes, a NN can definitely do something like know whether it recognizes a datapoint, but it has no access to the backward step per se. Take my crashing example: how, while thinking in the forward pass, can it 'know' there will be a backward pass, when there might be no backward pass (e.g. because there was a hardware fault)? The forward pass that is followed by a backward pass appears identical in every way to the forward pass whose backward pass never happens because of a crash. At best, it seems a NN can do no more than some sort of probabilistic gradient hacking: compute in such a way that, if a backward pass does follow, that backward pass will do something odd.
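A trivial PyTorch sketch of that point, for whatever the toy case is worth:

```python
# The forward computation is numerically identical whether or not a backward pass
# ever follows, so nothing computed during the forward pass can "detect" the
# upcoming (or absent) backward pass.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 1)
x = torch.randn(4, 16)

out1 = model(x)                      # forward pass that WILL be followed by backward
out1.sum().backward()                # backward pass happens (gradients get computed)

out2 = model(x)                      # identical forward pass, but we "crash" here:
                                     # no backward pass is ever run on out2

print(torch.equal(out1, out2))       # True: the forward outputs are bit-identical
```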
I don't think this is impossible in principle, based on meta-learning examples or higher-order gradients (see e.g. my "input-free NN" esoteric NN architecture proposal), but it's clearly a very difficult, fragile, strange situation, where it's not at all obvious that a regular LLM would be able to do it, or would choose to do so when there are so many other kinds of leakage, situated awareness, or steganography possible.
You can look this up on knowyourmeme and confirm it, and I've done an interview on the topic as well. Now I don't know much about "improving public discourse" but I have a long string of related celebrity hoaxes and other such nonsense which often crosses over into a "War of the Worlds" effect in which it is taken quite seriously...I have had some people tell me that I'm doing what you're calling "degrading the public discourse," but that couldn't be farther from the truth. It's literature of a very particular kind, in fact. Are these stories misinterpreted willfully, just for the chance to send friends a shocking or humorous link? Sure. You can caption the bell curve and label the far ends with "this story is completely true" and the midwits with "I'm so mad you're degrading public discourse." But only the intelligent people are really finding it humorous. And I am sure that what has actually happened is that the American sense of humor has become horribly degraded, which I think is the truly morbid symptom more than anything else, as humor is a very critical component to discernment...But even more than those really truly sad examples, there's a sadder humorlessness in America where people are apparently no longer surprised or amused by anything.
This seems like a good explanation of how you have degraded the public discourse.
There are some use-cases where quick and precise inference is vital: for example, many agentic tasks (like playing most MOBAs or solving a physical Rubik's cube; debatably most non-trivial physical tasks) require quick, effective, and multi-step reasoning.
Yeah, diffusion LLMs could be important not for being better at predicting what action to take, but for hitting real-time latency constraints, because they intrinsically amortize their computation more cleanly over steps. This is part of why people were exploring diffusion models in RL: a regular bidirectional or unidirectional LLM tends to be all-or-nothing in terms of the forward pass, so even if you are doing the usual optimization tricks, it's heavyweight. A diffusion model lets you stop in the middle of diffusing, use a diffusion step to improve other parts of the output, or pivot to a new output entirely.
A diffusion LLM can, in theory, do something like plan a sequence of future actions+states in addition to the token about to be executed, so each emitted token can be the result of a bunch of diffusion steps spread over many earlier timesteps. This lets a small fast model make good use of 'easy' timesteps to refine its next action: it simply spends the compute to keep refining its model of the future and what it ought to do next, so that at the next timestep the action is 'already predicted' (if things were going according to plan). If something goes wrong, the existing sequence may still be an efficient starting point compared to a blank slate, and can be quickly updated to compensate. And this is quite natural compared to trying to bolt on something involving MoEs or speculative decoding.
So your robot diffusion LLM can be diffusing a big context of thousands of tokens, representing its plan and its predicted environment observations over the next couple of seconds. Each timestep, it does a little more thinking and tweaks each token a little bit; despite each step being only a few milliseconds of thinking by a small model, the buffer eventually turns into a highly capable robot model's output, and each action-token is ready by the time it's needed. (Even if it's not fully done, at least something is there to be executed: a low-quality action choice is often better than blowing the deadline and doing some default action like a no-op.) You could do the same thing with a big classic GPT-style LLM, but the equivalent-quality forward pass might take 100ms, and now it's not fast enough for good robotics (without spending a lot on expensive hardware or optimization).
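A schematic sketch of that receding-horizon loop; the `refine` function is a dummy placeholder standing in for a few denoising steps of a real diffusion model, and the horizon and budget numbers are made up:

```python
# Keep a buffer of future action tokens that gets a few cheap refinement
# ("denoising") steps every control tick, and always emit whatever the buffer
# currently holds for the next step when the deadline arrives.
import time

HORIZON = 64          # planned future action tokens (a couple of seconds of control)
TICK_BUDGET_S = 0.005 # e.g. a 5 ms thinking budget per control tick

def refine(plan, observation):
    """Placeholder for a few diffusion denoising steps over the whole plan."""
    return [f"action(t+{i}|obs={observation})" for i in range(len(plan))]

plan = ["noop"] * HORIZON  # start from a blank/default plan

for t in range(10):                       # 10 control ticks
    observation = f"obs_{t}"
    deadline = time.monotonic() + TICK_BUDGET_S
    while time.monotonic() < deadline:    # spend whatever time the tick allows...
        plan = refine(plan, observation)  # ...refining the *entire* future plan
    next_action = plan.pop(0)             # emit the head of the plan, even if the
    plan.append("noop")                   # refinement isn't "finished"
    # execute(next_action)  # hypothetical robot actuation call
```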
This post is an example of my method. Over the last 1-2 years, I’ve made heavy use of AIs, lately DeepSeek and Claude. I do the same with them: present my ideas, deal with their criticisms and objections—whether to correct them or take correction myself—until we’re agreed or the AI starts looping or hallucinating. So, when I say I have yet to hear, after all this time, credible, convincing arguments to the contrary, it’s after having spent the time and done the work that most people don’t even attempt.
Or, to put it less flatteringly, "I harangue the most sycophantic and new-agey LLMs I can find until they finally agree with me, in the absence of any objective feedback or empirical evidence, about something I'm already certain of, and I think this is intellectually valid work which deserves the name of 'findings' and is an 'investigation' far superior to whatever it is 'most people' do, rather than deserving the name 'intellectual masturbation'."
I have yet to hear, after all this time, credible, convincing arguments to the contrary.
You don't say.
One idea might be to pair debates with Delphi panels: run the usual Delphi method to get a consensus report beforehand, and then have the panelists explain & debate whatever is left over as non-consensus (or possibly, if some experts disagree hotly with the consensus report, bring them on for a debate with the original panel).