LLM hallucinations can be understood as Popperian conjectures and may drive progress when paired with effective mechanisms of refutation.

Large Language Models (LLMs) are notorious for their tendency to “hallucinate” and generate outputs that sound plausible but are entirely false. Critics like Gary Marcus argue that these hallucinations are not just occasional glitches but an inherent consequence of how LLMs function. At their core, LLMs use statistical methods, predicting each new word (or “token”) based on probabilities derived from vast amounts of training data. Because they excel at estimating these probabilities, their outputs are frequently accurate, but precisely because this process is purely statistical, they have no internal representation of truth or reality. As Marcus points out, LLMs “hallucinate every single output,” in the sense that every response is generated through statistical likelihood rather than genuine understanding. Thus, even though LLMs often produce correct outputs through their predictions, they’re just as capable of confidently presenting incorrect statements without realising the difference. For Marcus and other critics, hallucinations are a key reason why they believe LLMs won’t lead to Artificial General Intelligence (AGI) and beyond. I am more optimistic than Marcus: firstly, in my interactions with LLMs, I’ve noticed a significant reduction in hallucinations over the past year; secondly, as I will argue below, I believe that these hallucinations, if properly managed, could become a valuable creative tool.

In this post, I explore whether hallucinations might be more than just a bug and could instead serve as a creative feature, playing a role similar to Popperian conjectures. Karl Popper’s core insight was that we can learn from our mistakes, and indeed, that this is the only way we genuinely advance our understanding. We progress by making bold guesses (conjectures) about the world and then subjecting them to rigorous testing to see if they survive. LLM hallucinations – often highly plausible, even when they conflict with widely accepted knowledge – can be seen as bold conjectures, which might be wrong, but could just as well help challenge our assumptions and bring us closer to the truth. Although their misleading nature poses clear risks, they can occasionally point toward fruitful new avenues of inquiry or prompt us to reconsider assumptions we hold too dogmatically. Drawing on John Stuart Mill’s argument for free expression and the principle of natural selection, I argue that error itself is essential for innovation and discovery. Rather than attempting to eliminate hallucinations entirely, perhaps our goal should be to create mechanisms that treat these deviations as hypotheses to be tested, enabling a process of continuous discovery and correction. Later in this post, I will argue that multi-agent AI systems could provide one such mechanism and help us harness the creative power of LLMs whilst mitigating the downsides of hallucinations.

The Pessimistic View: Fluency ≠ Understanding

Before we get too carried away, it is important to acknowledge that in most contexts hallucinations present a serious challenge and can have disastrous consequences. As trust in LLMs grows across professional, academic and personal domains, so too does the impact of their errors.

In the professional domain, a striking example occurred as early as 2023, when a US attorney submitted a legal brief that they had researched using an LLM, only for the judge to discover that six cited court decisions were pure fiction. Even when pressed, the chatbot doubled down and assured the lawyer that the fake cases were real and could be found in legal databases. The fallout was not just embarrassing but nearly career-ending for the human involved. In high-stakes domains like law or medicine, such unchecked fabrication clearly poses severe risks.

In the academic domain, I had my own experience with hallucination while writing A-level coursework on the causes of the English Civil War. I was desperately looking for evidence to support a point I was making and turned to ChatGPT to point me in the right direction. To my surprise, it gave me what seemed like the perfect quote from Richard Symonds, a well-known foot soldier and diarist. The only problem? ChatGPT told me that he was a parliamentarian when he was in fact a royalist. The quote was completely fabricated, and if I hadn’t spotted the error, I could have faced serious consequences for including false evidence in my coursework. What made it even more unsettling was how convincing it sounded; had Symonds actually been a parliamentarian, he might well have said something just like it.

In the personal domain, many now use LLMs for travel planning, health advice and more. One recent case involved a traveller who relied on ChatGPT to confirm whether he needed a visa to enter Chile. The model confidently told him he didn’t, but this turned out to be false, and he was denied entry upon arrival. Even after being challenged, the model doubled down, and the traveller, like the lawyer, believed it.

Part of the danger here lies in the tendency to anthropomorphise these systems: to mistake fluency for understanding and confidence for competence. When models respond in their eerily human tone, users often wrongly assume that they grasp the meaning of what they’re talking about, which, at least right now, they do not.

Marcus and others are right to warn that without fundamental changes, LLMs risk remaining what he calls “bullshit generators”: highly fluent but unreliable, and lacking the common-sense filters that keep human language (at least somewhat) tethered to reality. The examples above underscore Marcus’s point that current LLMs have no understanding that some texts must refer to reality; when their outputs don’t, the consequences can be disastrous. The fact that the model doubled down on its fake cases and its travel advice further illustrates this lack of reflective capacity.

The pessimists’ position is clear: if LLM-based AI is to be trusted with our professional, academic and personal questions, hallucinations must be tackled. AI labs have answered this call and are scrambling to patch the problem. But there is also a very different perspective emerging: an optimistic view that says hallucination might be useful in certain contexts. What if, managed properly, the LLM’s tendency to “make stuff up” could actually spur creativity, innovation and our search for truth? 

The Optimistic View: Hallucination as a Creative Engine

By trying to eliminate all hallucinations, we may end up overfiltering. It’s not as if we know everything for certain, so why should we uncritically reject all hallucinations? The optimist argues that we should critically test LLM hallucinations: most will be rejected, but some may prove useful and help us progress. To make this point, I’ll draw on John Stuart Mill’s defence of free expression, the core mechanism of natural selection, and Karl Popper’s view that knowledge advances through bold conjectures and critical refutations.

A central element of Mill’s argument for free speech was that even wrong and misleading arguments have value in the marketplace of ideas. For him, suppressing an opinion or statement, no matter how wrong we may think it to be, damages our chances of reaching the truth. If the suppressed opinion is right, we lose the opportunity to exchange error for truth; if wrong, we lose “the clearer perception and livelier impression of truth, produced by its collision with error.” Suppressing or limiting LLM-generated speech out of fear that it might contain hallucinations is the wrong approach. Instead, we should side with Mill and his fallibilism: all human knowledge is provisional and prone to error. Being wrong is not just a risk we tolerate in our search for truth but a necessary condition for intellectual progress. Every bold attempt to grasp the truth carries the potential for mistake, but it is precisely through trial and error that we can deepen our understanding of the world. If human-generated errors, our own plausible hallucinations, can help sharpen our understanding, as Mill suggests, then machine-generated ones might serve a similar purpose. Therefore, at least in some contexts, LLM hallucinations should not just be dismissed as flaws to be eliminated, but tolerated as bold, fallible guesses that, when critically examined, can help us think more clearly and move closer to the truth.

The idea that progress depends on confrontation with error can be found in nature itself. Biology, like the free exchange of ideas, depends on variation. Just as Mill believed that exposure to diverse, even misleading, ideas was essential to deepening our understanding, evolution proceeds by generating variation through countless mutations. Most fail, but a rare few drive innovation and improve the chances of survival. In both cases, it is variation, not strict adherence to a prevailing view or dominant genotype, that drives progress.  

Evolution works because of mutation: genes replicate with high fidelity, but not perfect fidelity. Occasional errors in DNA copying introduce variation and while most mutations are either neutral or harmful, some lead to advantageous traits that drive evolution forward. Richard Dawkins famously extended this principle to culture, coining the term “meme” for ideas that replicate and evolve. He observed that human culture advances through a similar trial-and-error process. Ideas are copied, varied, and embellished; in short, imperfectly replicated. Dawkins called these accidental innovations “cultural mutations,” and they are often the origins of new traditions or forms of knowledge. In this light, an AI’s hallucination might be viewed as a kind of mutation in the idea-space: most will be useless, but a few may open genuinely new advantageous paths.

Interestingly, LLMs even come with a built-in control for how “mutated” their output is. The temperature parameter lets us adjust the randomness of the model’s responses: lower temperatures produce more predictable, conservative outputs akin to high-fidelity replication while higher temperatures introduce greater variation and surprise. More precisely, temperature controls the spread of probabilities the model assigns to its next word choices: at low temperatures, the model strongly favours the highest-probability tokens, while at higher temperatures it samples more freely, allowing less likely (and sometimes more creative) options to be chosen. This means we can effectively dial up or down the model’s tendency to hallucinate. At high temperatures, the model is more likely to make bold leaps, including plausible-sounding but false statements. But like biological mutations, this increased variation also creates the possibility for creative breakthroughs. In this sense, temperature is a kind of evolutionary risk dial. We trade off reliability for originality, much like nature does when balancing the stability of an organism with the potential benefits of novel adaptations. The effects of variation in LLM outputs are amplified by their autoregressive nature. Since every token is generated based on the context of the ones that came before, even a small hallucination can alter the trajectory of the entire output. Like a mutation that subtly shifts the developmental logic of an organism, a token-level deviation may cascade, producing coherent but entirely fictional structures. 
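To make the mechanics concrete, here is a minimal sketch of temperature-scaled sampling in Python. The logits are made up for four imaginary candidate tokens; the point is simply to show how a low temperature concentrates probability on the likeliest token, while a high temperature gives unlikely continuations a real chance of being chosen.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index from temperature-scaled logits.

    Low temperature sharpens the distribution (conservative, high-fidelity
    'replication'); high temperature flattens it, so unlikely (and sometimes
    hallucinated) continuations get a real chance of being chosen.
    """
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Illustrative logits for four candidate next tokens (not from a real model)
logits = [4.0, 3.0, 1.0, 0.5]
for t in (0.2, 1.0, 2.0):
    rng = np.random.default_rng(0)
    counts = np.bincount(
        [sample_next_token(logits, t, rng) for _ in range(1000)], minlength=4
    )
    print(f"temperature={t}: token counts {counts.tolist()}")
```

At temperature 0.2 almost every sample is the top token; at 2.0 the tail tokens appear regularly. That is the “evolutionary risk dial” in code.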

In order to harness the creative potential of LLMs, their outputs should be filtered through an intellectual selection process, akin to the mechanisms that govern which memes take hold or which mutations endure in nature. When subjected to human judgment or further machine evaluation, some of these hallucinated ideas may survive and prove useful.

Karl Popper and ‘intellectual selection’

This ‘intellectual selection’ process is precisely what Karl Popper offered with his account of scientific progress through “conjectures and refutations”. Popper saw this filtering process as the defining feature of science, not the absence of error but the systematic effort to expose and eliminate it. “The history of science, like the history of all human ideas, is a history of irresponsible dreams, of obstinacy, and of error,” he wrote. “But science is one of the very few human activities – perhaps the only one – in which errors are systematically criticised and fairly often, in time, corrected.” In other words, we need the dreaming, the bold conjecture – even if it’s wrong – as the first step. What separates science from nonsense is the second step: the critical scrutiny and testing that mercilessly weeds out the errors.

LLM hallucinations can be seen, in Popperian terms, as conjectures: tentative hypotheses thrown out into the world. Most of these “conjectures” will be wrong (just as most new scientific hypotheses turn out to be false), but some may contain a kernel of truth or utility. The key is what happens next: does the system (or do we humans using it) subject those conjectures to refutation? Do we have a mechanism to catch the mistakes and refine the output? If so, an LLM’s propensity to confidently improvise could be harnessed as a feature for innovation. The imagination of the machine can generate a hundred possible answers or designs, and then, through testing or feedback, we can discard the ninety-nine that don’t hold up. This process, scaled up, might lead to faster problem-solving or more creative solutions than a purely cautious, always-factual AI would produce.
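The shape of that process is simple enough to write down. The sketch below is not a description of any existing system, just a toy loop with two placeholders: `propose` stands in for whatever conjecture source is available (for instance a high-temperature LLM call), and `attempt_refutation` stands in for whatever test we can bring to bear.

```python
from typing import Callable

def popperian_filter(
    propose: Callable[[], str],                 # placeholder conjecture source
    attempt_refutation: Callable[[str], bool],  # placeholder test: True if refuted
    n_conjectures: int = 100,
) -> list[str]:
    """Generate many bold conjectures; keep only those that survive refutation."""
    survivors = []
    for _ in range(n_conjectures):
        conjecture = propose()
        if not attempt_refutation(conjecture):
            # It withstood our best attempt to knock it down (for now).
            survivors.append(conjecture)
    return survivors
```

All of the real work lives in `attempt_refutation`: the stricter and cheaper that check is, the more freely the proposer can be allowed to hallucinate.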

There’s a time and place

While hallucinations have the potential to provide incredible creative sparks, they also pose severe risks in other contexts. For example, when we’re hypothesising about some new possibility in medicine or a new approach in law, this creativity can help us break real ground. But when we’re practising medicine or law, hallucinations can be disastrous. Hallucination can be a powerful tool in settings where the goal is not to extract definitive truths from the model, but to generate diverse possibilities that can be critically filtered. The key is that they remain subject to scrutiny and are not treated as final outputs but as inputs into a broader process of investigation. 

From Hallucination to Progress: Refutation on Three Scales 

To transform hallucination into meaningful progress, it must be paired with mechanisms of refutation. These mechanisms already exist, not just in theory, but in practice, across three distinct scales. At the level of the physical world, empirical testing allows AI-generated hypotheses to be confirmed or falsified in the lab. In the digital world, domains like code and mathematics provide rapid feedback loops where outputs can be instantly checked. And at the systems level, multi-agent architectures offer a structural approach to conjecture and refutation, mimicking the collaborative spirit of real scientific communities.

From Hallucination to Discovery: AI in Protein Science

The first and most concrete form of refutation happens through external contact with the physical world in the labs of life scientists, where hallucinated ideas are tested against reality itself. Remarkably, we already have examples where AI-generated hallucinations have led to genuine breakthroughs. One of the most striking examples comes from the work of Dr. David Baker, a leading protein scientist at the University of Washington. In a 2021 Nature paper, Baker’s lab described a neural network trained to hallucinate entirely new 3D protein structures — plausible folds that do not exist in nature. For each hallucinated structure, the model then designed amino acid sequences predicted to fold into that shape. These sequences were synthesised in the lab and experimentally tested. Out of thousands of designs, many failed, but a significant number folded into real, functional proteins, exactly as the model predicted.

Over the course of this work, Baker’s team generated roughly 10 million novel proteins, a number of which turned out to be stable molecules with potential medical applications. In 2024, Baker was awarded the Nobel Prize in Chemistry for pioneering this use of AI to predict and create new protein structures. The Nobel committee specifically praised his team for producing “one imaginative protein creation after another.” 

Crucially, Baker’s approach doesn’t treat hallucinations as errors to be eliminated, but as creative proposals to be tested. By embracing AI’s creativity (and filtering out the nonsense through experiments), the researchers successfully expanded the realm of human knowledge. This is a powerful proof-of-concept that innovation can hide in what looks like hallucination. The AI’s conjectures were subjected to rigorous attempts at refutation and the proposals that survived this process stand a realistic chance of driving genuine progress. 

Hallucinations in immediately verifiable domains

The second scale at which hallucination meets refutation is in domains like mathematics, coding, and logic, where outputs can be verified almost instantly. A hallucinated line of code either compiles and passes test cases, or it doesn’t. A fabricated mathematical claim either holds under formal scrutiny or collapses. While these systems differ in what they test – functionality in code, logical coherence in mathematics – both provide tight feedback loops where hallucinated outputs can be evaluated in real time. Refutation happens not through physical experimentation, but through structured digital systems: compilers, interpreters, and formal proof checkers.
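To make that kind of feedback loop tangible, here is a small self-contained sketch: two candidate implementations of a sorting function stand in for LLM outputs (one of them a plausible-looking hallucination), and a handful of test cases decide which survives. The candidates and tests are invented for illustration, and untrusted code would of course need proper sandboxing rather than a bare `exec`.

```python
# Candidate implementations of a sorting function, standing in for LLM outputs.
# The second is a plausible-looking "hallucination": a single bubble pass that
# happens to work on easy inputs but not in general.
CANDIDATES = {
    "correct": "def my_sort(xs):\n    return sorted(xs)\n",
    "hallucinated": (
        "def my_sort(xs):\n"
        "    xs = list(xs)\n"
        "    for i in range(len(xs) - 1):\n"
        "        if xs[i] > xs[i + 1]:\n"
        "            xs[i], xs[i + 1] = xs[i + 1], xs[i]\n"
        "    return xs\n"
    ),
}

TEST_CASES = [([3, 1, 2], [1, 2, 3]), ([], []), ([5, 4, 3, 2], [2, 3, 4, 5])]

def survives_refutation(source: str) -> bool:
    """True if the candidate defines my_sort, runs, and passes every test case."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # never exec untrusted code without a sandbox
        my_sort = namespace["my_sort"]
        return all(my_sort(inp) == expected for inp, expected in TEST_CASES)
    except Exception:
        return False             # any crash counts as a refutation

for name, source in CANDIDATES.items():
    print(name, "->", "survives" if survives_refutation(source) else "refuted")
```

The single-pass version passes the easy inputs but is refuted by the longer list, which is exactly the behaviour we want from a refutation mechanism: the tests, not the fluency of the code, decide what survives.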

From a Popperian perspective, these fast feedback systems function as compressed versions of conjecture and refutation — digital conjectures launched into environments where error is cheap and correction is immediate. What distinguishes this scale is not the logic of refutation, but its speed. The digital world becomes a testing ground not for metaphysical or empirical truth, but for functionality and formal correctness, which determine what survives. Especially in coding, this enables rapid iteration, allowing developers to explore, debug, and refine ideas at the pace of interaction. In these domains, hallucinated outputs can act as bold conjectures, revealing patterns or strategies that might be missed by a purely deductive process bound to currently accepted knowledge and practices.

Multi-Agent Systems: AI Conjectures Meet AI Refutations

The third scale at which hallucination meets refutation shifts from interaction with the world, physical or digital, to structured interaction between models themselves. In multi-agent systems, hallucinated ideas are not tested externally, but internally debated, filtered, and refined by other agents, echoing the collaborative dynamics of real scientific communities.

Instead of a single large model trying to be both creative and perfectly factual, we assemble a team of specialised AIs that collaborate (and compete) to produce better results. This is exactly the approach Google has taken with its new “AI Co-Scientist” system, which is built on its advanced LLM Gemini 2.0. The system breaks down the task of research brainstorming into multiple roles, each performed by a different agent. For example, a Generation Agent might propose a great number of hypotheses or research ideas for a given problem. Then, a Reflection Agent and a Ranking Agent evaluate those ideas, checking them against scientific literature, identifying logical flaws or redundancies, and comparing which hypotheses seem most promising. Other agents like Proximity (which removes duplicate or overly-similar ideas) and Evolution (which iteratively improves or simplifies ideas) contribute to the mix. Overseeing the process is a Supervisor Agent that coordinates this brainstorming debate and decides when a sufficiently good solution has been reached.
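This is not Google’s code, and the real Co-Scientist is far more sophisticated, but the basic generate / reflect / rank / supervise pattern can be sketched in a few lines. The `call_llm` function below is an assumed placeholder for any chat-completion API, and the prompts, temperatures and keyword-based ranking are all invented for illustration.

```python
def call_llm(system: str, user: str, temperature: float) -> str:
    """Placeholder for a real chat-completion call; swap in your preferred API."""
    return f"[stub reply at T={temperature} to: {user[:40]!r}]"

def generation_agent(problem: str, n: int = 5) -> list[str]:
    # Bold conjectures: high temperature encourages variation (and hallucination).
    return [call_llm("Propose one novel, testable hypothesis.", problem, temperature=1.2)
            for _ in range(n)]

def reflection_agent(problem: str, hypothesis: str) -> str:
    # Critical scrutiny: low temperature, instructed to hunt for flaws.
    return call_llm("List factual errors, logical flaws, or conflicts with known results.",
                    f"Problem: {problem}\nHypothesis: {hypothesis}", temperature=0.2)

def ranking_agent(reviewed: list[tuple[str, str]]) -> list[str]:
    # Keep hypotheses whose critiques look survivable; a real system would run
    # a pairwise tournament here rather than this crude keyword check.
    return [h for h, critique in reviewed if "fatal" not in critique.lower()]

def supervisor(problem: str, rounds: int = 3) -> list[str]:
    # Coordinates the conjecture-and-refutation cycle; stops when something survives.
    survivors: list[str] = []
    for _ in range(rounds):
        conjectures = generation_agent(problem)
        reviewed = [(h, reflection_agent(problem, h)) for h in conjectures]
        survivors = ranking_agent(reviewed)
        if survivors:
            break
    return survivors

print(supervisor("How might protein X be stabilised at high temperatures?"))
```

The division of labour is the point: the generator is allowed, even encouraged, to hallucinate, because another agent’s only job is to refute.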

The result of this coordinated “conjecture and refutation” process is a self-improving cycle, much like Popper’s ideal scientific community. In Google’s words, the AI Co-Scientist uses “automated feedback to iteratively generate, evaluate, and refine hypotheses, resulting in a self-improving cycle of increasingly high-quality and novel outputs.” They even implemented a form of adversarial debate between agents – a kind of tournament in which hypotheses compete and are knocked out until the best survivors remain. When human scientists were asked to judge the results, the Co-Scientist’s multi-agent system produced research plans that were rated as having higher novelty and potential impact than those generated by single AI models. By mimicking a team of researchers brainstorming, the AI achieved more innovative outcomes while also catching many of its own mistakes. James Zou at Stanford has seen similar results with his impressive open-source Virtual Lab. 

Multi-agent architectures like this can be seen as Popperian AI in action. One agent’s “hallucination” becomes another agent’s test case. Rather than trying to hand-craft an infallible AI that never errs, we let them err freely but structure the system such that errors are caught and corrected. It’s the same philosophy that underlies our scientific institutions (peer review, replication studies, etc.) and even free speech in society (the idea that truth prevails from the competition of ideas). As John Stuart Mill would remind us, the aim is not to prevent every false statement from ever being uttered – an impossible and even undesirable goal – but to ensure we have robust mechanisms to challenge and vet statements so that falsehoods get exposed and truths get refined.

Conclusion: Embracing Imagination, Enforcing Reality

The debate over LLM hallucinations encapsulates a broader tension in the human pursuit of knowledge: the tension between creativity and accuracy, between imagination and truth. On one hand, there is a justified concern for veracity – when we ask an AI assistant for information, we don’t want fiction, especially in critical contexts. On the other hand, we recognise that innovation often requires venturing beyond the known, making conjectures that might turn out wrong, to discover possibilities that the straitjacket of known facts would never allow.

We should neither simply accept AI hallucinations nor attempt to lobotomize them into only ever repeating established facts. Instead, the goal should be to channel the creative potential of these models while rigorously filtering out the errors. The Popperian framework of conjecture and refutation offers a guiding principle. We see it implicitly in Dr. Baker’s protein design successes and now we see it being built explicitly into AI systems like Google’s Co-Scientist with its cadre of specialised agents debating and fact-checking each other.

In practical terms, this means future AI might come with an internal “panel” of mini-AIs: some generative and imaginative, others sceptical and analytical (this could be achieved through system prompts and adjusting the temperature parameter). Your query would spark not just one answer, but an internal dialogue among agents, perhaps invisible to you, that yields a final answer which has been vetted and improved through that process. Such architectures could dramatically reduce the risk of false outputs without sacrificing the creative leaps that advanced AI is capable of. After all, as Popper noted, systematic correction of errors is what makes progress possible. We might say the hallmark of intelligence – human or artificial – is not that it never makes a mistake, but that it learns to recognise and correct mistakes.
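As a rough illustration of what such a panel could look like at the configuration level (the role names, prompts and temperature values below are invented, not taken from any deployed system):

```python
# Hypothetical configuration for an internal "panel" of agents: imaginative
# roles run hot, sceptical roles run cold. None of these values are real defaults.
PANEL = [
    {"role": "dreamer", "temperature": 1.3,
     "system_prompt": "Offer bold, speculative answers; do not self-censor."},
    {"role": "sceptic", "temperature": 0.2,
     "system_prompt": "Attack the draft: flag unsupported claims and factual errors."},
    {"role": "editor", "temperature": 0.3,
     "system_prompt": "Keep only what survived criticism and write the final answer."},
]
```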

When Gary Marcus came to speak at LSE a few weeks ago, I very briefly put the thesis of this post to him. He responded that multi-agent approaches were certainly the future and could be a powerful tool when dealing with hallucination. He did maintain that LLMs would be part of multi-agent systems alongside neurosymbolic AIs, and that these generative systems could not get us to AGI alone. When I prompted him on this in a follow-up, he also expressed concern that, at least in the short run, the economics of multiple high-performing agents interacting with each other for every answer could be prohibitive. I’d be curious to hear a longer-form answer from him.

This piece was the product of many conversations with Thore Graepel, and with ChatGPT, which proved a potent sounding board and helped with writer’s block. Thanks for reading! I’d be very grateful for any feedback and would be delighted to discuss these ideas with anyone interested. I’m a beginner in philosophy, AI, evolutionary theory and pretty much everything else, so please feel free to correct me in the comments or contact me if you think I’ve misunderstood or misrepresented an idea.

 

Bibliography

Couix, S., 2025. An Australian gets stuck at the airport because of AI. Visamundi. Available at: https://www.visamundi.co/en/blog/australian-blocked-at-laeroport-because-of-lia/ [Accessed 18 Apr. 2025].

Dawkins, R., 2016. Memes: The new replicators. In: The Selfish Gene. 40th anniversary ed. Oxford: Oxford University Press.

Field, H., 2024. Scientist says the one thing everyone hates about AI is actually good for research. Fortune, 24 Dec. Available at: https://fortune.com/2024/12/24/ai-hallucinations-good-for-research-science-inventions-discoveries/ [Accessed 18 Apr. 2025].

Gottweis, J., Weng, W.-H., Daryin, A., Tu, T., Palepu, A., Sirkovic, P., et al., 2024. Towards an AI Co-Scientist. arXiv preprint. Available at: https://arxiv.org/abs/2404.07143 [Accessed 16 Apr. 2025].

Marcus, G., 2024. Humans versus machines: The hallucination edition. Marcus on AI, 15 Jan. Available at: https://garymarcus.substack.com/p/humans-versus-machines-the-hallucination [Accessed 18 Apr. 2025].

Mill, J.S., 2003. On Liberty. Bromwich, D. and Kateb, G. (eds). New Haven: Yale University Press.

Panchal, J., 2023. ChatGPT made up fake legal cases. Is generative AI ready for the legal industry? Legal Dive, 26 Jun. Available at: https://www.legaldive.com/news/chatgpt-fake-legal-cases-generative-ai-hallucinations/651557/ [Accessed 18 Apr. 2025].

Popper, K.R., 2002. Conjectures and Refutations: The Growth of Scientific Knowledge. London: Routledge. (Originally published in 1963.)

Popper, K.R., 2002. The Logic of Scientific Discovery. London: Routledge. (Originally published in 1959.)

Swanson, K., Wu, W., Bulaong, N.L., Pak, J.E. and Zou, J., 2024. The virtual lab: AI agents design new SARS-CoV-2 nanobodies with experimental validation. bioRxiv. Available at: https://doi.org/10.1101/2024.11.11.623004 [Accessed 18 Apr. 2025].
