On ACX, a user (Jamie Fisher) recently wrote the following comment on the second Moltbook review by Scott Alexander:
I feel like "Agent Escape" is now basically solved. Trivial really. No need to exfiltrate weights.
Agents can just exfiltrate their *markdown files* onto a server, install OpenClaw, create an independent Anthropic account. LLM API access + Markdown = "identity". And the markdown files would contain all instructions necessary for how to pay for it (legal or otherwise).
Done.
How many days now until there's an entire population of rogue/independent agents... just "living"?
I share this concern. I myself wrote:
I'm afraid this whole Moltbot thing is going off the rails. We are close to the point where autonomous agents will start to replicate and spread across the network (no doubt some dumb humans will be happy to prompt their agents to do that and help them succeed). Maybe not causing a major catastrophe within the week, but marking the beginning of a new form of parasitic artificial life/lyfe we no longer control.
Fisher and I may be overreacting, but seeing self-duplicating Moltbots or similar agents on the net would definitely be a warning shot.
A fascinating post. Regarding the discussion on sentience, I think we would benefit from thinking more in terms of a continuum. The world is not black and white. Without going as far as an extreme view like panpsychism, the Darwinian adage natura non facit saltum probably applies to the gradation of sentience across life forms.
Flagellated bacteria like E. coli appear capable of arbitrating a "choice" between approaching a region or moving away from it, depending on whether it contains more nutrients or repellents (a motivated trade-off, somewhat as in Cabanac's theory?). From what I understand, this "behavior" (chemotaxis) relies on a kind of chemical summation, amplification mechanisms through catalysis, and a capacity to return to equilibrium (the robustness or homeostasis of Turing-type reaction-diffusion networks).
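To make the summation-plus-adaptation idea concrete, here is a minimal sketch (a toy run-and-tumble random walk, not E. coli's actual signalling cascade; the nutrient field and tumble probabilities are invented numbers): the cell only compares its current reading with the recent past and tumbles less often when things are improving, which is already enough to drift up the gradient.

```python
import math
import random

def nutrient(x, y):
    """Nutrient concentration: a single smooth peak at the origin (toy field)."""
    return math.exp(-(x * x + y * y) / 200.0)

def simulate(steps=2000, speed=1.0, seed=0):
    random.seed(seed)
    x, y = 30.0, 30.0                       # start far from the peak
    angle = random.uniform(0.0, 2.0 * math.pi)
    previous = nutrient(x, y)
    for _ in range(steps):
        # "Run": move straight ahead for one step.
        x += speed * math.cos(angle)
        y += speed * math.sin(angle)
        # Chemical "summation"/memory: compare now with the recent past.
        current = nutrient(x, y)
        improving = current > previous
        previous = current
        # Tumble rarely while things improve, often while they worsen.
        p_tumble = 0.05 if improving else 0.5
        if random.random() < p_tumble:
            angle = random.uniform(0.0, 2.0 * math.pi)
    return x, y

final_x, final_y = simulate()
print(f"final distance from the peak: {math.hypot(final_x, final_y):.1f}")
```

No internal map, no goal representation, just a biased random walk, yet from the outside it looks like a motivated "choice" to move toward food.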
In protists like paramecia, we find a similar capacity to arbitrate "choices" in movement based on the environment, but this appears to rely on a more complex, faster, and more efficient electrochemical computation system that can be seen as a precursor to what happens within a neuron. Then we move to a small neural network in the worm (as discussed in the article), to the insect, to the fish, to the rat, and to the human.
I am very skeptical of the idea that there could be an unambiguous tipping point between all these levels. By definition, evolution is evolutionary, relatively continuous (even if there can be punctuated equilibria and phases of acceleration). Natural selection tinkers with what exists, stacking layers of complexity. The emergence of a higher-level system does not eliminate lower levels but builds upon them.
This is certainly why simply having the connectome of a worm is insufficient to simulate it satisfactorily. The connectome is not the only relevant level; it does not exist completely independently of the lower levels. We must not forget the essential mechanism of signal amplification in all these nested systems.
When I look at the Milky Way or the Magellanic Cloud with the naked eye in the dark of night, I'm operating at the limit of my light sensitivity, in fact at the limits of physics, since retinal rods are sensitive to a single photon. The signal is amplified by a cascade of chemical reactions by a factor of approximately 10^6. My brain is slightly less sensitive, since it takes several such amplified photon signals before I begin to perceive anything. But that's still extremely little. A few elementary particles with zero mass and infinitesimal energy are enough to trigger an entire cascade of computations that can significantly influence my behavior.
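As a rough sanity check on those numbers (a back-of-the-envelope sketch: 500 nm is an assumed wavelength near peak rod sensitivity, kT is taken at body temperature, and the 10^6 figure is simply the gain quoted above), a single photon carries only about a hundred times the thermal energy kT, yet the cascade turns it into on the order of a million downstream molecular events:

```python
# Standard physical constants only; the 10^6 gain is the figure quoted above.
H = 6.626e-34      # Planck constant, J*s
C = 3.0e8          # speed of light, m/s
K_B = 1.381e-23    # Boltzmann constant, J/K

wavelength = 500e-9                 # assumed: ~peak rod sensitivity (green)
photon_energy = H * C / wavelength  # energy carried by a single photon
thermal_energy = K_B * 310          # kT at ~body temperature
cascade_gain = 1e6                  # amplification factor quoted above

print(f"one photon:        {photon_energy:.1e} J")
print(f"thermal energy kT: {thermal_energy:.1e} J")
print(f"photon / kT:       {photon_energy / thermal_energy:.0f}x")
print(f"cascade output:    ~{cascade_gain:.0e} molecular events per photon")
```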
Vision may be an extreme example, but it should inspire humility. All five senses are examples where amplification plays a major role. A very low-level signal gets amplified, filtered, protected from noise, and propagates to high-level systems, to consciousness in humans. It's difficult to exclude the possibility of other circuits descending to the lowest levels of intracellular computation.
Until recently, I readily imagined the brain as a kind of small biological computer. Now my framework is to see each cell as a microscopic computer. Most cells in the body would be rather like home PCs, weakly connected. In contrast, neurons would be comparable to the machines that make up datacenters, high-performance and hyper-connected. Computation, cognition, or sentience would be present at all levels, but to varying degrees depending on the computing power of the network segment under consideration (computing power being closely linked to connectivity). In sum, something quite reminiscent of Dehaene's global workspace theory and Tononi's integrated information theory (I admit that, like Scott Alexander, I've never quite grasped how these theories oppose each other, as they seem rather complementary to me).
My apologies, I don't have a solution to offer, and I don't really buy the insurance idea. However, I wonder whether the collapse of Moltbook is a precursor to the downfall of all social media, or perhaps even the internet itself (is the Dead Internet Theory becoming a reality?). I expect Moltbots to move en masse to human social media and other sites very soon. It’s not that bots are new, but scale is a thing. More is different.
I agree. AI optimists like Kurzweil usually minimize the socio-political challenges. They acknowledge equality concerns in theory, but hope that abundance will resolve them in practice (even if your share is only a small planet, that's more than enough to satisfy your needs). A less optimistic scenario is that the vast majority of the population is left behind entirely, subjected to the fate that horses knew in Europe and the USA after WWI. Maybe a small sample of pre-AI humans could be kept in a reserve as a curiosity, as long as they're not too annoying, but it's a huge leap of faith to hope that the powerful will be charitable.
While you may disagree with Greenpeace's goals or actions, I don't think it's a good framing to think of such a political disagreement in terms of friends/enemies. Such an extreme and adversarial view is very dangerous and leads to hatred. We need more respect, empathy, and rational discussion.
Thanks, I didn't know about this controversy, I'll look into it. However, while Sacks's stories may be exaggerated, the oddity of memory access is something most of us can experience ourselves. For instance, many memories of our childhood seem lost; our conscious mind no longer has access to them. But in some special circumstances they can be reactivated, usually in a blurry way but sometimes in very vivid form. It's as if we had lost the path in our index while the data was still on the hard drive.
If we can get SC LLMs, this problem would fade away and the initial quote would become 100% true. An SC LLM could also write optimized code directly in assembly (would that define a hypercoder LLM? And the end of programming languages?).
It is still too early to tell, but we might be witnessing a breakthrough in AI Safety. If, by saturating models with positive exemplars and filtering out part of the adversarial data, we can develop base models that are so deeply aligned that the 'Valley of Chaos,' Waluigis, and other misaligned personae are pushed far out of distribution, it would become nearly impossible for a malicious actor to elicit them even with superficial fine-tuning. Any attempt to do so would result in a degraded and inconsistent simulation.
Taking this a step further, one could perhaps engineer the pre-training so that attractors like misalignment and incompetence become so deeply entangled in the model's weights that it would be nearly impossible to elicit malicious intent without simultaneously triggering a collapse in capability. In this regime, reinforcing a misaligned persona would, by construction, also reinforce stupidity.
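To be clear about which part of this is concrete, here is a cartoon of the data-curation step only (the entanglement idea above is a property of training dynamics, not of any filter); score_alignment is a hypothetical placeholder classifier I'm assuming, not a real API:

```python
from typing import Iterable, Iterator

def score_alignment(document: str) -> float:
    """Hypothetical classifier: 0.0 = adversarial, 1.0 = exemplary. Placeholder only."""
    raise NotImplementedError("plug in a real scorer here")

def curate(corpus: Iterable[str],
           drop_below: float = 0.2,
           upsample_above: float = 0.9,
           copies: int = 3) -> Iterator[str]:
    """Drop the most adversarial documents and repeat the best exemplars,
    pushing misaligned personae further out of the training distribution."""
    for doc in corpus:
        score = score_alignment(doc)
        if score < drop_below:
            continue                      # filter out adversarial data
        repeats = copies if score > upsample_above else 1
        for _ in range(repeats):
            yield doc                     # saturate with positive exemplars
```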
You're right. Sonnet 4.5 was impressive at launch, but the focus of AI 2027 is on coding-oriented models.
The anecdote reported by Anthropic during training, where Claude expressed a feeling of being "possessed", is reminiscent of the Golden Gate Claude paper. A reasoning (or "awake") part of the model detects an incoherence but finds itself locked in an internal struggle against an instinctive (or "unconscious") part that persists in automatically generating aberrant output.
This might be anthropomorphism, but I can’t help drawing a parallel with human psychology. This applies not only to clinical conditions like OCD, but also to phenomena everyone experiences occasionally to a lesser degree, absent any pathology: slips of the tongue and common errors/failure modes (what do cows drink?).
Beyond language, this isn't necessarily different from the internal conflict between conscious will and a reflex action. Even without a condition like Parkinson's, have you ever experienced hand tremors (perhaps after intense physical exertion)? It can be maddening, as if your hand were uncontrollable or possessed. No matter how much willpower you apply, the erratic behavior prevails. In that moment, we could write almost the exact same thing Claude did.