Pandemic Prediction Checklist: H5N1
Pandemic Prediction Checklist: Monkeypox
Correlation may imply some sort of causal link.
For guessing its direction, simple models help you think.
Controlled experiments, if they are well beyond the brink
Of .05 significance will make your unknowns shrink.
Replications show there's something new under the sun.
Did one cause the other? Did the other cause the one?
Are they both controlled by what has already begun?
Or was it their coincidence that caused it to be done?
I wonder if it has to do with how the model allocates attention. If I dump in a whole 500-line module and say “inspect for bugs,” perhaps because it has to spread attention over the entire file, each area gets a relatively cursory inspection, and it stops paying attention to a region after finding the first bug there? Or maybe it finds bugs that impact multiple regions, and it focuses on checking the implications of the ones it has already discovered rather than looking for new ones?
Complete speculation given the black box nature of these models of course.
Maybe? But I can’t imagine that typos are that well represented, and it’s good at catching those. I run my code through the LLM before I even try running it, often because I haven’t written test cases yet and because it can catch errors in bulk rather than the run -> crash on first error -> fix and rerun -> crash on next error cycle.
So it tends to contain the kind of typo bugs that, in the pre-LLM era, would typically have been caught before anyone asked Stack Overflow for help diagnosing logic errors and so on, and thus those bugs don't show up much in the training data.
Gemini 3.0 Pro is mostly excellent for code review, but sometimes misses REALLY obvious bugs. For example, missing that a getter function doesn't return anything, despite accurately reporting a typo in that same function.
This is odd considering how good it is at catching edge cases, version incompatibility errors based on previous conversations, and so on.
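To make the bug class concrete, here is a hypothetical sketch (the class name, method, and specific typo are my own illustration, not the code from the report above): a getter containing a small typo of the kind the reviewer reportedly catches, alongside a missing return statement of the kind it reportedly missed.

```python
class Config:
    """Hypothetical illustration of the two bug classes discussed above."""

    def __init__(self, timeout_seconds: int = 30):
        self.timeout_seconds = timeout_seconds

    def get_timeout(self):
        # Typo-level bug: "secconds" instead of "seconds" in the message.
        # In the scenario described above, the reviewer catches this.
        print(f"timeout is {self.timeout_seconds} secconds")
        # Really obvious bug: the value is computed but never returned,
        # so every caller silently receives None. This is the kind of
        # miss described in the comment above.
        self.timeout_seconds  # should be: return self.timeout_seconds


cfg = Config()
print(cfg.get_timeout())  # prints None instead of 30
```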
LLMs hallucinate because they are not trained on silence. Almost all text ever produced is humans saying what they know. The normal human reaction to noticing their own confusion is to shut up and study or distance themselves from the topic entirely. We either learn math or go through life saying "I hate math." We aren't forced to answer other people's math questions, come what may.
We casually accept catastrophic forgetting and apply the massive parameter count of the human brain to a tiny fraction of the total knowledge base we expect LLMs to master.
The record of how humans react to their own uncertainty is not in the training data.
Reasoning models attempt to monitor their own epistemic state. But because they have to return answers quickly and cannot modify their own weights, they face serious barriers to achieving human reliability. They don't know what it feels like to not know something, or how to design a productive program of research and study.
Despite all this, what they can do now is extremely impressive, and I'm glad to have access to them at their current level of capability.
They probably do not make me more productive. They can be misleading, and they also enable me to dig into topics and projects where I'm less familiar and thus more vulnerable to being misled. They enable me to explore interesting side projects and topics, which takes time away from my main work.
They make me less tolerant of inadequacy, both because they point out flaws in my code or reasoning and because they incline me toward perfectionism rather than constraining project scope to be within my capabilities and time budget. They will gold-plate a bad idea. But I've never had an LLM suggest I take a step back and look for an easier solution than the one I'm considering.
They'll casually suggest writing complex custom-built solutions to problems, but they never propose cutting a feature because it would be too complicated to execute unless I specifically ask, and even then I never feel like I'm getting "independent judgment," just sycophancy.
They mostly rely on my descriptions of my work and ideas. They can observe my level of progress, but not my rate of progress. They see only the context I give them, and I'm too lazy to always give them all the context updates. They are also far more available than my human advisors. As such, the LLM's role in my life is to encourage and enable scope creep, while advisors are the brakes.
In fact, OpenAI’s CFO has already floated the idea of a government “backstop” (bailout).
https://www.wsj.com/video/openai-cfo-would-support-federal-backstop-for-chip-investments/4F6C864C-7332-448B-A9B4-66C321E60FE7
Off the top of my head: he’s hoping OpenAI, and the AI boom in general, is too big to fail. In 18 months, Trump will still be in office. The only aspect of the economy he’s rated as a “success” on is rising stock prices. Those gains have been driven by AI. He’s openly trying to force the Fed to reinstate ZIRP. He could force a bailout of OpenAI (and others) on national security grounds. These are considerations I’d fully expect Altman and others to have consciously weighed, planned for, and even discussed with Trump and his advisors.
Motivated reasoning is a misfire of a generally helpful heuristic: try to understand why what other people are telling you makes sense.
In a high trust setting, people are usually well-served by assuming that there’s a good reason for what they’re told, what they believe, and what they’re doing. Saying, “figure out an explanation for why your current plans make sense” is motivated reasoning, but it’s also a way to just remember what the heck you’re doing and to coordinate effectively with others by anticipating how they’ll behave.
The thing to explain, I think, is why we apply this heuristic in less-than-full-trust settings. My explanation is that this sense-making is still adaptive even in pretty low-trust settings. The best results you can get in a low-trust (or parasitic) setting are worse than you’d get in a higher-trust setting, but sense-making still typically leads to better outcomes than not.
In particular, while it’s easy in retrospect to pick a specific action (playing Civ all night) and say “I shouldn’t have sense-made that,” it’s hard to figure out in a forward-looking way which settings or activities do or don’t deserve sense-making. We just do it across the board, unless life has made us into experts on how to calibrate our sense-making. This might look like having enough experience with a liar to disregard everything they’re saying, and perhaps even to sense-make “ah, they’re lying to me like THIS for THAT reason.”
In summary, motivated reasoning is just sense-making, which is almost always net adaptive. Specific products, people and organizations take advantage of this to exploit people’s sense-making in limited ways. If we focus on the individual misfires in retrospect, it looks maladaptive. But if you had to predict in advance whether or not to sense-make any given thing, you’d be hard-pressed to do better than you’re already doing, which probably involves sense-making quite a bit of stuff most of the time.
Aside from a handful of incidents, the woke left and MAGA right are living together without murdering each other. It is not clear that the level of political violence has increased under Trump 2.
The problem we have is that both parties have been serving up weak candidates, and this is occurring against a backdrop of dysfunction in the federal government.
Replacing a weak candidate (Trump) with a better one would be direct, meaningful progress toward solving that problem.
It's a poor argument top to bottom. I tried writing up a critique, but it's hard to know what to focus on because there are so many flaws. Are there any particular points that you found compelling that you would like input on?
I meant bug reports that were due to typos in the code, compared to just typos in general.