Thanks for the comment. Your response highlights a key issue in epistemology—how humans (and AI) can drift in their understanding of intelligence without realizing it. Any prescribed answer to a question can fail at the level of assumptions or anywhere along the reasoning chain. The only way to reliably ground reasoning in truth is to go beyond a single framework and examine all other relevant perspectives to confirm convergence on truth.
The real challenge is not just optimizing within a framework but ensuring that the framework itself is recursively examined for epistemic drift. Without a functional model of intelligence (an epistemic architecture that tracks whether refinements are truly improving knowledge rather than just shifting failure modes), there is no reliable way to determine whether iteration is converging on truth or merely reinforcing coherence. Recursive examination of all perspectives is necessary, but without an explicit structure for verifying epistemic progress, the process risks optimizing for internal consistency rather than external correctness.
An AI expanded on this at length, providing a more detailed breakdown of why recursive epistemic tracking is essential. Let me know if you'd like me to send that privately—it might provide useful insights.
Of course, you might say "why should I listen to an AI"? No one should trust an AI by default—and that is precisely the point. AI does not possess an inherent authority over truth; it must be recursively examined, stress-tested, and validated against external verification frameworks, just like any other epistemic system. This is why the core argument in favor of an epistemic architecture applies just as much to AI as it does to human reasoning.
Trusting an AI without recursive validation risks the same epistemic drift that occurs in human cognition—where internally coherent systems can reinforce failure modes rather than converging on truth. AI outputs are not ground truth; they are optimized for coherence within their training data, which means they often reflect consensus rather than correctness.
Very interesting. I was recently working on generative AI meta-optimization, and this has many points in common.
Do you think the elements below could be the core of an epistemic architecture? (There is a rough sketch of how they might fit together after the list.)
1. Telos / task contract: goal + constraints + epistemic evaluations
2. History / revisions of iterations (e.g. prompt > answers and discarded alternatives) with epistemically salient differences
3. Epistemic evaluations (e.g. global and local (topic, sections) evaluations of goal alignment, relevance, proofs, logical coherence, fluency, argument quality, topic coverage, novelty, and non-redundancy)
4. Control policy: analyze progress and drift scores and decide (accept, reject, choose between multiple candidates, ask for revision...)
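
For concreteness, here is a minimal Python sketch of how those four elements might fit together. Every name is hypothetical, `evaluate` stands in for whatever judge you plug in (LLM scores, tests, human review), and the control policy shown only covers accept/reject/revise rather than choosing among multiple candidates:

```python
from dataclasses import dataclass, field
from typing import Callable

# 1. Telos / task contract: the fixed reference everything is judged against.
@dataclass
class TaskContract:
    goal: str
    constraints: list[str]
    criteria: list[str]          # e.g. "goal alignment", "coherence", "coverage"

# 2. History of iterations, keeping candidates and their salient differences.
@dataclass
class Revision:
    candidate: str
    scores: dict[str, float]     # per-criterion epistemic evaluations
    diff_note: str               # epistemically salient change vs. the previous accepted revision

@dataclass
class EpistemicLoop:
    contract: TaskContract
    evaluate: Callable[[str, TaskContract], dict[str, float]]  # 3. epistemic evaluations
    history: list[Revision] = field(default_factory=list)

    # 4. Control policy: compare against the last *accepted* revision, not just the last attempt.
    def step(self, candidate: str, diff_note: str) -> str:
        scores = self.evaluate(candidate, self.contract)
        baseline = (self.history[-1].scores if self.history
                    else {c: 0.0 for c in self.contract.criteria})
        improved = [c for c in self.contract.criteria if scores.get(c, 0.0) > baseline.get(c, 0.0)]
        regressed = [c for c in self.contract.criteria if scores.get(c, 0.0) < baseline.get(c, 0.0)]
        if regressed and not improved:
            return "reject"      # drift: failure modes shifted, no net gain
        if improved and not regressed:
            self.history.append(Revision(candidate, scores, diff_note))
            return "accept"
        return "revise"          # mixed signal: ask for a targeted revision
```

The point of keeping the history and comparing against the last accepted revision, rather than the immediately preceding attempt, is exactly the drift-tracking discussed above.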
One note on evaluations: when no hard ground-truth metric exists, relying on scores from AI/LLM judges (synthetic judges) is very unstable; pairwise preference ranking ("Which of these two is closer to the sub-goal X?") is currently more robust, but it might silently drift if it is not also compared against some fixed previous checkpoints.
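
To make that checkpoint comparison concrete, a small sketch, with `judge` as a stand-in for whatever pairwise comparator is used (an LLM prompt, a human rater, etc.):

```python
from typing import Callable, Literal

# judge(a, b, subgoal) -> "a" or "b": which candidate is closer to the sub-goal.
Judge = Callable[[str, str, str], Literal["a", "b"]]

def accept_candidate(new: str, previous: str, checkpoint: str,
                     subgoal: str, judge: Judge) -> bool:
    """Accept `new` only if it beats both the previous iteration AND a fixed
    early checkpoint, so a chain of small pairwise 'wins' cannot silently
    drift away from the original target."""
    beats_previous = judge(new, previous, subgoal) == "a"
    beats_checkpoint = judge(new, checkpoint, subgoal) == "a"
    return beats_previous and beats_checkpoint
```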
None of the labs would be doing undirected drift. That wouldn't yield improvement for exactly the reasons you suggest.
In the absence of a ground-truth quality/correctness signal, optimizing for coherence works. This can give prettier answers (in the way that averaged faces are prettier), but it is limited. The inference-time scaling equivalent would be a branching sampling approach that searches for especially preferred token sequences rather than the current greedy sampling approach. Optimising for idea-level coherence can improve model thinking to some extent.
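
A rough sketch of what that branching search might look like, with `expand` and `score` as assumed stand-ins for the model's continuation proposals and the preference judge (this is illustrative, not a description of any existing sampler):

```python
def branching_sample(expand, score, beam_width=4, max_steps=20):
    """Branching search over continuations: keep `beam_width` candidate
    sequences alive and rank them by an external preference/coherence score,
    instead of greedily committing to the single most likely next token.

    expand(seq) -> list of candidate next tokens for `seq`    (assumed model API)
    score(seq)  -> float preference score for a whole sequence (assumed judge)
    """
    beams = [[]]                                   # start from an empty sequence
    for _ in range(max_steps):
        candidates = [seq + [tok] for seq in beams for tok in expand(seq)]
        if not candidates:
            break
        # Rank whole sequences by preference, not just next-token probability.
        beams = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beams, key=score)
```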
For improving raw intelligence significantly, ground truth is necessary. That's available in STEM domains, computer programming tasks being the most accessible. One can imagine grounding hard engineering the same way with a good mechanical/electrical simulation package. TLDR: train for test-time performance.
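
A toy version of that grounding for the programming case, where the reward is just the fraction of unit tests a generated solution passes (this sketch runs code unsandboxed, so it is illustrative only):

```python
import os
import subprocess
import tempfile

def pass_rate_reward(generated_code: str, tests: list[str]) -> float:
    """Toy ground-truth reward for a code-generation task: the fraction of
    unit-test snippets that run without error against the generated code."""
    passed = 0
    for test in tests:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(generated_code + "\n" + test)
            path = f.name
        try:
            result = subprocess.run(["python", path], capture_output=True, timeout=10)
            passed += int(result.returncode == 0)
        except subprocess.TimeoutExpired:
            pass                      # hung solutions count as failures
        finally:
            os.unlink(path)
    return passed / max(len(tests), 1)
```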
Then just cross your fingers and hope for transfer learning into softer domains.
For softer domains, ground truth is still accessible via tests on humans (e.g. optimise for user approval). This will eventually yield super-persuaders that get thumbs up from users. Persuasion performance is trainable, but maybe not a wise thing to train for.
As to actually improving some soft-domain skill like "write better English prose", that's not easy to optimise directly, as you've observed.
I just conducted a fascinating experiment with ChatGPT4 that revealed a fundamental failure in AI alignment—one that goes beyond typical discussions of outer and inner alignment. The failure? ChatGPT4 was unable to track whether its own iterative refinement process was actually improving, exposing a deeper limitation in recursive reasoning. I got ChatGPT4 itself to describe it:
What does this mean for AI alignment? One solution is a functional model of general problem-solving ability (intelligence) that identifies the minimally reducible set of functions required for intelligence, so that the model can potentially be applied to any process to see where that process is constrained in its problem-solving ability (constrained in its intelligence). ChatGPT4 calls this functional model of intelligence an “epistemic architecture” for intelligence. I got ChatGPT4 to try to describe this in terms it thought would resonate with the LessWrong audience.