Yes, a NN can definitely do something like know if it recognizes a datapoint, but it has no access to the backward step per se. Take my crashing example: how, while thinking in the forward pass, can it 'know' there will be a backward pass when there might be no backward pass (eg because there was a hardware fault)? The forward pass that precedes a backward pass appears identical in every way to the forward pass whose backward pass never happens because of a crash. At best, it seems a NN can do no more than some sort of probabilistic gradient hacking: compute in such a way that, if a backward pass does follow, it will do something odd.
I don't think this is impossible in principle, based on meta-learning examples or higher-order gradients (see eg my "input-free NN" esoteric NN architecture proposal), but it's clearly a very difficult, fragile, strange situation where it's certainly not obvious that a regular LLM would be able to do it, or choose to do so when there are so many other kinds of leakage or situated awareness or steganography possible.
You can look this up on knowyourmeme and confirm it, and I've done an interview on the topic as well. Now I don't know much about "improving public discourse" but I have a long string of related celebrity hoaxes and other such nonsense which often crosses over into a "War of the Worlds" effect in which it is taken quite seriously...I have had some people tell me that I'm doing what you're calling "degrading the public discourse," but that couldn't be farther from the truth. It's literature of a very particular kind, in fact. Are these stories misinterpreted willfully, just for the chance to send friends a shocking or humorous link? Sure. You can caption the bell curve and label the far ends with "this story is completely true" and the midwits with "I'm so mad you're degrading public discourse." But only the intelligent people are really finding it humorous. And I am sure that what has actually happened is that the American sense of humor has become horribly degraded, which I think is the truly morbid symptom more than anything else, as humor is a very critical component to discernment...But even more than those really truly sad examples, there's a sadder humorlessness in America where people are apparently no longer surprised or amused by anything.
This seems like a good explanation of how you have degraded the public discourse.
There are some use-cases where quick and precise inference is vital: for example, many agentic tasks (like playing most MOBAs or solving a physical Rubik's cube; debatably most non-trivial physical tasks) require quick, effective, and multi-step reasoning.
Yeah, diffusion LLMs could be important not for being better at predicting what action to take, but for hitting real-time latency constraints, because they intrinsically amortize their computation more cleanly over steps. This is part of why people were exploring diffusion models in RL: a regular bidirectional or unidirectional LLM tends to be all-or-nothing, in terms of the forward pass, so even if you are doing the usual optimization tricks, it's heavyweight. A diffusion model lets you stop in the middle of the diffusing, or use that diffusion step to improve other parts, or pivot to a new output entirely.
A diffusion LLM in theory can do something like plan a sequence of future actions+states in addition to the token about to be executed, so each token can be the result of a bunch of diffusion steps begun a long time ago. This allows a small fast model to make good use of 'easy' timesteps to refine its next action: it just spends the compute to keep refining its model of the future and what it ought to do next, so that at the next timestep, the action is 'already predicted' (if things were going according to plan). If something goes wrong, the existing sequence may still be an efficient starting point compared to a blank slate, and can quickly be updated to compensate. And this is quite natural, compared to trying to bolt something on with MoEs or speculative decoding.
So your robot diffusion LLM can be diffusing a big context of thousands of tokens, representing its plan and predicted environment observations over the next couple of seconds. At each timestep, it does a little more thinking to tweak each token a little bit, and despite this being only a few milliseconds of thinking each time by a small model, it eventually adds up to a highly capable robot model's output, and each action-token is ready by the time it's needed (and even if it's not fully done, at least it is there to be executed - a low-quality action choice is often better than blowing the deadline and doing some default action like a no-op). You could do the same thing with a big classic GPT-style LLM, but the equivalent-quality forward pass might take 100ms, and now it's not fast enough for good robotics (without spending a lot on expensive hardware or optimization).
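To make that concrete, here is a minimal sketch of the kind of anytime planning loop described above (my own illustration, with assumed numbers and placeholder functions like `denoise_step`, `get_obs`, and `send_action` - not any particular robotics or diffusion library's API): keep a rolling window of future action tokens, spend whatever is left of each control tick refining it, and always emit the head token at the deadline.

```python
import time
import numpy as np

HORIZON = 64           # number of planned future action tokens (assumed)
CONTROL_PERIOD = 0.02  # 50 Hz control loop (assumed)

def denoise_step(plan, obs):
    """Placeholder for one diffusion refinement step conditioned on the latest observation."""
    return plan + 0.1 * (np.tanh(obs.mean()) - plan)  # stand-in update rule, not a real model

def control_loop(get_obs, send_action):
    plan = np.random.randn(HORIZON)  # start the plan from pure noise
    while True:
        deadline = time.monotonic() + CONTROL_PERIOD
        obs = get_obs()
        # Anytime refinement: keep denoising the whole plan until the tick's deadline.
        while time.monotonic() < deadline:
            plan = denoise_step(plan, obs)
        # Execute the head token even if it is only partially denoised:
        # a mediocre action now usually beats a no-op after a blown deadline.
        send_action(plan[0])
        # Slide the window: the executed token drops off, fresh noise enters at the tail.
        plan = np.concatenate([plan[1:], np.random.randn(1)])
```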
This post is an example of my method. Over the last 1-2 years, I’ve made heavy use of AIs, lately DeepSeek and Claude. I do the same with them: present my ideas, deal with their criticisms and objections—whether to correct them or take correction myself—until we’re agreed or the AI starts looping or hallucinating. So, when I say I have yet to hear, after all this time, credible, convincing arguments to the contrary, it’s after having spent the time and done the work that most people don’t even attempt.
Or, to put it less flatteringly, "I harangue the most sycophantic and new-agey LLMs I can find until they finally agree with me, in the absence of any objective feedback or empirical evidence, about something I'm already certain of, and I think this is intellectually valid work which deserves the name of 'findings' and is an 'investigation' far superior to whatever it is 'most people' do, rather than deserving the name 'intellectual masturbation'."
I have yet to hear, after all this time, credible, convincing arguments to the contrary.
You don't say.
I think there are many ways that a LLM could have situated awareness about what phase it is in, but I'm not sure that gradient descent itself is one of them?
While a NN is running the forward pass without any backprop, it is computing exactly the same thing (usually) that it would be computing if it were running a forward pass before a backward pass. Otherwise, the backprop can't really work - if it doesn't see the 'real' forward pass, how does it 'know' how to adjust the model parameters to make the model compute a better forward pass next time? So I can't see how, while running a forward pass, a LLM could 'know' whether it was about to do a backprop step on a piece of text; for all it knows, maybe someone is running its forward pass just to get out the log-prob at the end, and that is all. (Extreme counterexample: maybe there is a software error and the training code crashes before it finishes running `.update()` after running `.forward()`; how could the model 'know' that this will happen?) This is true regardless of how many times it has trained on a piece of text.
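To spell that out with a toy example (my own sketch, not from the original discussion): in PyTorch, the logits a deterministic model computes are bit-identical whether or not a backward pass ever follows, so nothing in the forward computation itself can carry the information "I am about to be trained on this."

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 4)               # stand-in for an LLM
x = torch.randn(8, 16)
target = torch.randint(0, 4, (8,))
loss_fn = nn.CrossEntropyLoss()

# Run 1: forward pass that is followed by a backward pass (but the "crash" happens
# before any optimizer step, so the weights are never actually updated).
logits_train = model(x)
loss_fn(logits_train, target).backward()

# Run 2: the same forward pass used only to read off log-probs, no backward at all.
with torch.no_grad():
    logits_inference = model(x)

# Identical outputs: the forward computation cannot tell the two situations apart.
print(torch.equal(logits_train.detach(), logits_inference))  # True
```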
I'm skeptical that some sort of mismatch from successive gradient steps would be a cue either, because usually you are training at the critical batch size, and for these LLMs we'd expect them to be updating on millions of tokens simultaneously, at least, and possibly rolling out the updated parameters in a staggered or partial fashion as well; so by the time a gradient update 'arrives' from a specific piece of text, it is also a gradient update over something like a hundred+ books of text as well as itself, diluting any kind of signal.
And wouldn't it usually train on a piece of text only a few times, at most? And if you are doing multi-epoch training because you have started to run low on data, you usually train on the same datapoint at intervals separated by many gradient steps; the memorization/forgetting dynamics imply the model may have forgotten a datapoint entirely by the time it comes around again.
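For a sense of scale of that dilution (the numbers below are my own rough assumptions, not anything measured):

```python
# Back-of-the-envelope dilution of one document inside a single gradient step.
batch_tokens = 16_000_000   # assumed tokens per gradient step at a large-model critical batch size
doc_tokens   = 2_000        # one typical web document
book_tokens  = 100_000      # very roughly one book

print(f"one document is {doc_tokens / batch_tokens:.4%} of the update")
print(f"the same update averages over ~{batch_tokens / book_tokens:.0f} books' worth of other text")
```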
I agree it is poorly written, but I don't think it is, strictly speaking, 'LLM slop'. Or if it is, it's not an LLM I am familiar with, or it reflects an unusual usage pattern in some way... It's just not written with the usual stylistic tics of ChatGPT (4o or o3), Claude-3/4, Gemini-2.5, or DeepSeek-r1.
For example, he uses a space after an em dash but not before; no LLM does that (they either use no space or spaces both before and after). He also uses '1) ' number formatting, where LLMs invariably use proper Markdown like '1. ' or '#. ' (and generally won't add stylistic redundancy like 'are twofold'); and he doesn't do the 4o 'twist ending' for his conclusion, the way a LLM would insist on. The use of sentence fragments is also unusual: LLMs insist on writing in whole sentences. The use of specific proper nouns like 'KKK clansmen' or 'Neil deGrasse Tyson' is unusual for a LLM (the former because it is treading close to forbidden territory, and the latter because LLMs are conservative in talking about living people). Then there is the condescension: a LLM chatbot persona is highly condescending, but in covert, subtle ways and only in an appropriate context like tutoring; they're usually careful to avoid coming off as obviously condescending in a regular argumentative context like this, and prefer sycophancy (presumably because it's easy for a rater to notice a condescending style and get ticked off by it).
It also sounds like a piece of paper, or a map, or a person having vivid hallucinations before falling asleep. But unless you have a whiteboard which can be copied among several hundred people and teleport and be rolled up and fit in a jean pocket, which lets you timetravel so you can look at what used to be on the whiteboard or look at what people might write on it in the future, or 'a whiteboard' which is neither white (because there's a colored map printed on it) nor 'a board' (because it's arbitrarily many), which has a ledgerbook next to itself which writes itself, and so on, I would suggest that this does not 'sound like a whiteboard' to most people. (No, not even a Biblically-accurate whiteboard.)
Yes, there are a lot of computer-related ones, depending on how fine-grained you get. (There's a similar issue with my "Ordinary Life Improvements": depending on how you do it, you could come up with a bazillion tiny computer-related 'improvements', which sort of just degenerates into 'enumerating every thing ever involving a transistor in any way' and is not enlightening the same way that, say, 'no indoor smoking' or 'fresh mango' is.) So I would just lump that one under 'Machine Configuration/Administration § Software' as one of the too-obvious-to-be-worth-mentioning hacks.
How did you check Claude's claims here?
One possibility not mentioned here is that they are exploiting essentially arbitrary details of their initialization. (I'm not sure what to call this sort of a priori, acausal coordination.) Any NN is going to have undertrained associations, due largely to its random initialization, because it is difficult to be exactly uncorrelated and 0.000... etc. when you are a big complicated neural network being forced to generate big complicated high-dimensional outputs. This would be similar to glitch tokens. In this case, mechanistic interpretability will struggle to find anything meaningful (because there is nothing really there to find - just diffuse trends in all the weights adding up nonlinearly to a slight numerical imbalance, etc), and the inner-monologues are probably going to be highly misleading or total confabulations (because there is no explanation, and so no inner-monologue can be faithful).
I understand Owain Evans's follow-up work on "emergent misalignment" is currently indicating that this might be what is actually going on, rather than a 'Waluigi effect': there is just enough 'self-coordination' on arbitrary text tokens that it turns out to allow self-inference about the dataset author, and that is how the 'transmission' happens even with as few as 1 token. That is, the LLM thinks, "I associate this token with an evil hacker and malignity, therefore I will generate evil stuff in general", although the token itself is completely harmless and doesn't evoke any odd behavior in an unrelated model.
(This is not quite what you usually think of with steganography or non-robust features, but of course, it is a great way to create both of those and get emergent steganography, because the more LLMs engage in self-coordination, the more they create a genuine signal in future training data, bootstrapping the initial random associations into a true set of regularities which can be exploited.)
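As a toy illustration of what 'undertrained associations from initialization' means (my own sketch, using made-up sizes; a real LLM's vocabulary and residual stream are far larger): even an untrained embedding/unembedding pair already assigns small but non-zero, entirely arbitrary preferences between unrelated tokens, so there are always idiosyncratic correlations lying around to be amplified rather than exact zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 1_000, 64   # toy sizes, assumed

# A freshly initialized, never-trained embedding and unembedding matrix.
embed = rng.normal(0, 0.02, size=(vocab, dim))
unembed = rng.normal(0, 0.02, size=(dim, vocab))

def association(tok_a, tok_b):
    """Raw logit this untrained toy 'model' (just embed @ unembed) gives token b after token a."""
    return float(embed[tok_a] @ unembed[:, tok_b])

# Two arbitrary, unrelated token ids already have non-zero 'preferences' between them,
# purely as an artifact of initialization noise - small, but never exactly 0.000...
print(association(123, 456), association(123, 789))
```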