Zvi covered education in a series of roundups ("Childhood and Education Roundup #N"). In the two most recent ones he concludes that the American educational system is in a crisis and that the entire educational ‘expert’ class is very obviously engaged in enemy action, a topic to which Zvi devoted an entire day.
The ARC-AGI-1 performance of the newest Gemini 3 Flash and the older Grok 4 Fast implies a potential cluster of maximal capabilities for models with ~100B params/token. Unfortunately, no company has tried to create more models of this class that could confirm or refute the potential cluster.
I had RMP try to roast my post about evidence against CoT-based supercoders. The post itself is here. RMP's fact check managed to claim that I thought OpenBrain was a real company (which I never did; what I did was quote a piece of the AI-2027 scenario relevant to the authors' idea of solving alignment) and, worse, that the AI-2027 slowdown ending involved INTERNATIONAL coordination. The fallacy check claimed that GPT-5 and Grok 4 don't exist. Does this mean that the tool should double-check claims related to new models?
Me too. It's METR who has yet to reveal anything about models other than Claude Sonnet 4.5, aside from the evidence extracted by Jurkovic (and GPT-5.1 Codex Max, but you didn't mention it; Claude Sonnet 4.5 was never SOTA to begin with and could be unusable for the graph, while GPT-5.1 Codex Max had someone add its data point to the AI-2027 graph and Kokotajlo notice the likely return of the 7-month doubling trend). But I doubt that "this kind of extensive work can hardly keep up with the release of new models providing new data", since updating the parameters would likely require mere days, if not minutes, of thinking per data point. See, e.g., Greenblatt's quick take about the GPT-5-related forecast and my two comments there, or my post on a worrisome trend which could have been invalidated by new models.
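To illustrate why updating the trend per data point is cheap, here is a minimal sketch of refitting the doubling-time estimate once a new model's 50% time horizon arrives. The dates and horizons below are invented placeholders, not METR's actual measurements:

```python
import numpy as np

# Placeholder (release date in years, 50% time horizon in minutes) points.
# These numbers are made up for illustration; real values would come from
# METR's published measurements.
points = [
    (2024.25, 8.0),
    (2024.75, 18.0),
    (2025.25, 40.0),
    (2025.70, 85.0),  # the newly released model's data point
]

dates = np.array([d for d, _ in points])
log_horizons = np.log2([h for _, h in points])

# Linear fit in log2 space: log2(horizon) = slope * year + intercept,
# so the horizon doubles every 1 / slope years.
slope, intercept = np.polyfit(dates, log_horizons, 1)
print(f"Estimated doubling time: {12.0 / slope:.1f} months")
```

Refitting the line is a one-liner; the genuinely slow part is running the benchmark to obtain each horizon, not updating the forecast.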
Thank you for covering the issue of optimization for virality in far more detail than my comment did! My worry is a different facet: what if such content distorts users' brains, with problematic results?
As for the Bleeding Mind persona, it turned out that there exists a Russian short story, written back in 2017, which Claude Opus 4.5 found rather similar. Additionally, I have a nitpick related to a phrase:
The nitpick
Self-Other Overlap (SOO), perhaps the only alignment approach which is "Not obviously stupid" according to Eliezer.
I would rather rephrase it as "the only alignment approach not from MIRI that Eliezer has bothered to read and didn't rule out on sight", which implies that such approaches (e.g. this one) are highly likely to be slop, not that Eliezer has read all such approaches and deemed them stupid. For example, if Max Harms' idea of CAST and measuring empowerment were discovered or quasi-reformulated by an outsider, this wouldn't mean that Eliezer considers the rediscovered approach stupid.
thoughtful libertarian-leaning neo-Cathars
I suspect that the moral intuitions you mention are unpopular not just because of people's ignorance, but because these ideas reflect only a facet of the ground truth (which, in my opinion, should be derived more from first principles, e.g. by claiming that the world itself is a big training environment for increasingly large-scale coordination).
As for claims like "maybe we shouldn't design AGI or ASI to absolutely refuse to seek power", I think that they conflate two different issues:
UPD: I have also prepared an interesting dialogue with Claude Opus 4.5.
The stationary bandit theory lets the government outright evolve from those criminals who are smart enough to think of their long-term interests.
An incomplete list of fiascos: letting a rival state take over without facing consequences as dire as possible; having a wildly mismanaged economy. In more modern settings, states could also fail to develop tech (e.g. tech useful for warfare) and/or to educate the workers; nowadays the whole of mankind would collectively experience a failure mode if anyone created a misaligned ASI without an aligned counterpart.
However, Tim Hua did claim that he would allow such homeschoolers to exist. What I don't understand is whether he would let such homeschoolers propagate actual falsehoods, and not just a different value set.
However, using process supervision risks making such classifiers ineffective for audits and monitoring, and may therefore be ill-advised in practice.
It's not just ill-advised: if I am not mistaken, it's The Most Forbidden Technique.
will, I believe, usher in a golden age of creativity and experimentation.
I think that it has already had an entirely different result, but I can't find the related research.
In the historical environment, memes would generally evolve by being retold from one individual to another, or would be preserved for a long time in the form of a book, painting, or object. Unlike short-form anecdotes and rumors, the mere creation or retelling of a long-form story or a piece of art took a long time and a process of reflection. As a result, the memetic environment historically required surviving pieces of information to be remembered for a long time and deemed worthy of transmission, rather than to be superstimulating and viral.
A more modern environment also subjected memes and artifacts to censorship, and the rise of large-scale reproduction of newspapers and of broadcasting mechanisms allowed the memetic environment to be influenced by companies (e.g. to advertise goods). While conservatives could also point out that companies have incentives to try to outcompete each other with misaligned stimuli like violence or eroticism, governments had the option to keep the competition in check.
As you suggest, it all changed with the rise of the Internet. The loss of barriers means that content is not just created by hordes of people and with less investment, but is optimized[1] for virality, including virality among niche readers, far more strongly than it was historically.
Additionally, I expect that content optimized for virality influences the average reader's cultural taste and brings related changes in the reader's[2] capabilities or alignment, with the potential to create feedback loops or outright echo chambers. One example is porn inducing erectile dysfunction or problems with relationships. Another is content explicitly called brainrot, with corresponding results.
However, content could also end up optimized this way through oversaturation of the market, or as a result of the related genre becoming popular. I suspect that this is what happened with harem manga, light novels, and web novels.
This also includes the influence on the psyches of those who create similar content and fail to become famous, as happens with fan fiction writers.
This makes me think of the previous model with the biggest 50%/80% time horizon ratio, Grok 4. It had funny failures at 2-second-, 2-minute-, and 2-hour-long tasks. Might an alternate-universe Claude that, like GPT-5.1-Codex-Max, succeeded at ALL tasks shorter than a minute have achieved a far bigger 80% time horizon? And what if GPT-5.2 and Gemini 3 Pro had their failures at less-than-a-minute-long tasks ironed out, as happened with GPT-5 vs Grok 4?
EDIT: in theory, the alternate Claude could also end up with a worse 50% time horizon. But the real Claude succeeded on a quarter of the 2-4 hour-long tasks and about half of the 4-16 hour-long tasks.
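To make the mechanism concrete, here is a minimal sketch in the spirit of METR-style logistic fits (the per-task numbers are invented, not any real model's results): success probability is modeled as a logistic function of log2(task length), and the 50% and 80% horizons are read off the fitted curve.

```python
import numpy as np

# Hypothetical per-task results: (task length in minutes, success 0/1).
# Real numbers would come from METR-style benchmark runs.
lengths = np.array([0.5, 1, 2, 4, 8, 15, 30, 60, 120, 240, 480, 960])
success = np.array([1,   1, 0, 1, 1, 1,  1,  1,  1,   0,   1,   0])

x = np.log2(lengths)

# Fit P(success) = sigmoid(a - b * log2(length)) by gradient descent
# on the logistic log-loss (convex, so this converges to the MLE).
a, b = 2.0, 0.3
for _ in range(20000):
    p = 1.0 / (1.0 + np.exp(-(a - b * x)))
    a -= 0.01 * np.mean(p - success)         # d(log-loss)/da
    b -= 0.01 * np.mean((p - success) * -x)  # d(log-loss)/db

def horizon(p_target):
    # Task length at which the fitted success probability equals p_target
    logit = np.log(p_target / (1 - p_target))
    return 2 ** ((a - logit) / b)

print(f"50% horizon: {horizon(0.5):.0f} min; 80% horizon: {horizon(0.8):.1f} min")
```

Sporadic failures at short tasks mostly flatten the fitted slope b, and a flatter curve widens the gap between the 50% and 80% crossing points; that is the shape of the Grok 4 pattern above, and ironing out sub-minute failures steepens the curve, lifting the 80% horizon specifically.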