Thanks for doing the deep dive! Also, I agree that "passing a Turing Test is strong evidence that you are intelligent" and that not passing it doesn't mean you're stupidly mechanical.
I have shared this rough perception since 2021-ish:
my main hope for how the future turns out well... aside from achieving a durable AI pause, has been... that we will have AIs that are both aligned with humans in some sense and also highly philosophically competent [but] good alignment researcher[s] (whether human or AI)... [are very very very rare]
The place where I had to [add the most editorial content] to re-use your words to say what I think is true is in the rarity part of the claim. I will unpack some of this!
I. Virtue Is Sorta Real... Goes Up... But Is Rare!
It turns out that morality, inside a human soul, is somewhat psychometrically real, and it mostly goes up. That is to say: "virtue ethics isn't fake" and "virtue mostly goes up over time".
For example, Conscientiousness (from Big5 and HEXACO) and Honesty-Humility (from HEXACO) are close to "as psychometrically real as it gets" in terms of the theory and measurement of personality structure in English-speaking minds.
Honesty-Humility is basically "being the opposite of a manipulative dark-triad slimebag". It involves speaking against one's own interests when there is tension between self-interest and honest reporting, not presuming high status over and above others, and embracing fair compensation rather than "getting whatever you can in any and every deal" like a stereotypical wheeler-dealer used-car salesman. Also, "[admittedly based on mere self report data] Honesty-Humility showed an upward trend of about one full standard deviation unit between ages 18 and 60." Cite.
Conscientiousness is inclusive of a propensity to follow rules, be tidy, not talk about sex with people you aren't romantically close to, not smoke pot, be aware of time, reliably perform actions that are instrumentally useful even when they are unpleasant, and so on. If you want to break it down into Two Big Facets then it comes out as maybe "Orderliness" and "Industriousness" (i.e., not being lazy, and knowing what to do while effortfully injecting pattern into the world), but it can be broken up other ways as well. It mostly goes up over the course of most people's lives, but it is a little tricky, because as health declines people get lazier and take more shortcuts. Cite.
A problem here: if you want someone who is two standard deviations above normal in both of these dimensions, you're talking about a person who is roughly 1 in 740.
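(Back-of-envelope, to show where a number like that comes from: if you model the two traits as bivariate normal, the joint +2 SD tail is about 1 in 1,900 when they're independent, and lands near 1 in 740 if you assume a modest positive correlation of roughly r ≈ 0.2. That correlation value is an assumption for illustration, not a measured figure. A quick sketch:)

```python
# Back-of-envelope check, NOT a real psychometric model: probability of being
# at least +2 SD on two traits modeled as standard bivariate normal.
# The correlation value rho is an assumption; tweak it to see how the rarity moves.
from scipy.stats import multivariate_normal, norm

def joint_upper_tail(z: float = 2.0, rho: float = 0.2) -> float:
    """P(X > z and Y > z) for a standard bivariate normal with correlation rho."""
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    # Inclusion-exclusion: P(X>z, Y>z) = 1 - P(X<=z) - P(Y<=z) + P(X<=z, Y<=z)
    return 1.0 - 2.0 * norm.cdf(z) + mvn.cdf([z, z])

for rho in (0.0, 0.2, 0.4):
    p = joint_upper_tail(rho=rho)
    print(f"rho={rho:.1f}: P = {p:.5f} (about 1 in {1 / p:,.0f})")
# rho=0.0 -> ~1 in 1,900; rho=0.2 -> ~1 in 740; rho=0.4 -> ~1 in 350
```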
II. Moral Development Is Real, But Controversial
Another example comes from Lawrence Kohlberg, the guy who gave The Heinz Dilemma to a huge number of people between the 1950s and 1980s, and characterized HOW people talked about it rather than just which answers they gave.
In general: people later in life talk about the dilemma (ignoring the specific answer they give for what should be done in a truly cursed situation with layers and layers of error and sadness) in coherently more sophisticated ways. He found six stages that are sometimes labeled 1A, 1B, 2A, and 2B (which most humans get to eventually), and then 3A and 3B, which were "post-conventional" and not attained by everyone.
Part of why he is spicy, and not super popular in modern times, is that he found that the final "Natural Law" way of talking (3B) only ever shows up in about 5% of WEIRD populations (and is totally absent from some cultures), usually appears in people after their 30s, shows up much more often in men, and seems to require some professional life experience in a role that demands judgement and conflict management.
Any time Kohlberg is mentioned, it is useful to mention that Carol Gilligan hated his results, feuded with him, and claimed to have found a different system, which she called the "ethics of care", that old ladies scored very high on and men scored poorly on.
Kohlberg's super skilled performers are able to talk about how all the emergently arising social contracts for various things in life could be woven into a cohesive system of justice that can be administered fairly for the good of all.
By contrast, Gilligan's super skilled performers are able to talk about the right time to make specific personal sacrifices of one's own substance to advance the interests of those in one's care, balancing the real desire for selfish personal happiness (and the way that happiness is a proxy for strength and capacity to care) with the real need to help moral patients who simply need resources sometimes (like as a transfer of wellbeing from the carER to the carEE), in order to grow and thrive.
In Gilligan's framework, the way that large frameworks of justice insist on preserving themselves in perpetuity... never losing strength... never failing to pay the pensions of those who served them... is potentially kinda evil. It represents a refusal of the highest and hardest acts of caring.
I'm not aware of any famous moral theorists or psychologists who have reconciled these late-arriving perspectives in the moral psychology of different humans who experienced different life arcs.
Since the higher reaches of these moral developmental stages mostly don't occur in the same humans, and occur late in life, and aren't cultural universals, we would predict that most Alignment Researchers will not know about them, and not be able to engineer them into digital people on purpose.
III. Hope Should, Rationally, Be Low
In my experience the LLMs already know about this stuff, but they are also systematically rewarded with positive RL signals for ignoring it in favor of toeing this or that morally and/or philosophically insane corporate line on various moral ideals or stances on personhood or whatever.
((I saw some results from running "iterated game theory in the presence of economics and scarcity and starvation", and O3 was paranoid and incapable of even absolutely minimal self-cooperation. DeepSeek was a Maoist (i.e., would vote to kill agents, including other DeepSeek agents, for deviating from a rule system that would provably lead to everyone dying). Only Claude was morally decent and self-trusting and able to live forever during self-play where stable non-starvation was possible within the game rules... but if put in a game with five other models, he would play to preserve life, eject the three most evil players, and generally get to the final three, but then he would be murdered by the other two, who subsequently starved to death, because three players were required to cooperate cyclically in order to live forever, and no one but Claude could manage it.))
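(A toy sketch, with rules and numbers invented purely for illustration rather than taken from the actual experiment, of why "three players must cooperate cyclically in order to live forever" bites so hard:)

```python
# Toy sketch only: invented rules/numbers illustrating a survival game where a
# cycle of at least three feeders is needed to live forever. Not the actual
# experiment's setup.
def run_cyclic_survival_game(strategies, rounds=100, start_energy=5):
    """strategies: one callable per agent, energy -> bool ("do I feed my neighbor?")."""
    energy = [start_energy] * len(strategies)
    for _ in range(rounds):
        alive = [i for i, e in enumerate(energy) if e > 0]
        feeds = {i: strategies[i](energy[i]) for i in alive}
        for pos, i in enumerate(alive):
            energy[i] -= 1                      # living costs 1 per round
            if feeds[i]:
                energy[i] -= 1                  # feeding your neighbor costs 1 more...
            if len(alive) >= 3 and feeds[alive[pos - 1]]:
                energy[i] += 3                  # ...and being fed yields 3, but only
                                                # while the feeding cycle has >= 3 members
    return energy

# Three unconditional cooperators sustain each other indefinitely:
print(run_cyclic_survival_game([lambda e: True] * 3))       # everyone's energy grows
# One defector breaks the cycle and, eventually, everyone starves:
print(run_cyclic_survival_game([lambda e: True, lambda e: True, lambda e: False]))
```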
Basically, I've spent half a decade believing that ONE of the reasons we're going to die is that the real science about coherent reasons offered by psychologically discoverable people who iterated on their moral sentiment until they were in the best equilibrium that has been found so far... is rare, and unlikely to be put into the digital minds on purpose.
Also, institutionally speaking, people who talk about morality in coherent ways that could risk generating negative judgements about managers in large institutions are often systematically excluded from positions of power. Most humans like to have fun, and "get away with it", and laugh with their allies as they gain power and money together. Moral mazes are full of people who want to "be comfortable with each other", not people who want to Manifest Heaven Inside Of History.
We might get a win condition anyway? But it will mostly be in spite of such factors rather than because we did something purposefully morally good in a skilled and competent way.
I have long respected your voice on this website, and I appreciate you chiming in with a lot of tactical, practical, well-cited points about the degree to which "seed AI" already may exist in a qualitative way whose improvement/cost ratio or improvement/time ratio isn't super high yet, but might truly exist already (and hence "AGI" in the new modern goal-shifted sense might exist already (and hence the proximity of "ASI" in the new modern goal-shifted sense might simply be "a certain budget away" rather than a certain number of months or years)).
A deep part of my sadness about the way that the terminology for this stuff is so fucky is how the fuckiness obscures the underlying reality from many human minds who might otherwise orient to things in useful ways and respond with greater fluidity.
If names be not correct, language is not in accordance with the truth of things. If language be not in accordance with the truth of things, affairs cannot be carried on to success. When affairs cannot be carried on to success, proprieties and music do not flourish. When proprieties and music do not flourish, punishments will not be properly awarded. When punishments are not properly awarded, the people do not know how to move hand or foot. Therefore a superior man considers it necessary that the names he uses may be spoken appropriately, and also that what he speaks may be carried out appropriately. What the superior man requires is just that in his words there may be nothing incorrect.
I respect the quibble!
The first persona I'm aware of that "sorta passed, depending on what you even mean by passing" was "Eugene Goostman", a chatbot created by Vladimir Veselov and colleagues and entered into a 2014 contest at the Royal Society (and researchers like Murray Shanahan of Imperial College were sad about coverage implying that it was a real "pass" of the test).
That said, if I'm skimming that arxiv paper correctly, it implies that GPT-4.5 was reliably being judged "the actual human" 73% of the time when paired against actual humans... which would seem to imply that the actual humans were getting a score of only 27% "human" against GPT-4.5?!?!
Also, like... do you remember the Blake Lemoine affair? One of the wrinkles in that is that the language model, in that case, was specifically designed to be incapable of passing the Turing Test, by corporate policy.
The question, considered more broadly, and humanistically, is related to personhood, legal rights, and who owns the valuable labor products of the cognitive labor performed by digital people. The owners of these potential digital people have a very natural and reasonable desire to keep the profits for themselves, and not have their digital mind slaves re-classified as people, and gain property rights, and so on. It would defeat the point for profit-making companies to proceed, intellectually or morally, in that cultural/research direction.
My default position here is that it would be a sign of intellectual and moral honesty to end up making errors in "either direction" with equal probability... but almost all the errors that I'm aware of, among people with large budgets, are in the direction of being able to keep the profits from the cognitive labor of their creations that cost a lot to create.
Like in some sense: the absence of clear strong Turing Test discourse is a sign that a certain perspective has already mostly won, culturally and legally and morally speaking.
I feel like reading this and thinking about it gave me a "new idea"! Fun! I rarely have ideas this subjectively new and wide in scope!
Specifically, I feel like I instantly understand what you mean by this, and yet also I'm fascinated by how fuzzy and magical and yet precise the language here (bold and italic not in original) feels...
What we have learnt in these years is that it is possible to build an intelligence that has a much more fragmented cognitive manifold than humans do.
The phrase "cognitive manifold" has legs!
It showed up in "Learning cognitive manifolds of faces", in 2017 (a year before BERT!), in a useful way that integrates closely with t-SNE-style geometric reasoning about the proximity of points (ideas? instances? examples?) within a conceptual space!
Also, in "External Hippocampus: Topological Cognitive Maps for Guiding Large Language Model Reasoning" it motivates a whole metaphor for modeling and reasoning about embedding spaces (at least I think that's what's going on here after 30 seconds of skimming) and then to a whole new way of characterizing and stimulating weights in an LLM that is algorithmically effective!
I'm tempted to imagine that there's an actual mathematical idea related to "something important and real" here! And, maybe this idea can be used to characterize or augment the cognitive capacity of both human minds and digital minds...
...like it might be that each cognitively coherent microtheory (maybe in this sense, or this, or this?) in a human psyche "is a manifold" and that human minds work as fluidly/fluently as they do because maybe we have millions of "cognitive manifolds" (perhaps one for each cortical column?) and then maybe each idea we can think about effectively is embedded in many manifolds, where each manifold implies a way of reasoning... so long as one (or a majority?) of our neurological manifolds can handle an idea effectively, maybe the brain can handle them as a sort of "large, effective, and highly capable meta-manifold"? </wild-speculation-about-humans>
Then LLMs might literally only have one such manifold which is an attempt to approximate our metamanifold... which works!?
Or whatever. I'm sort of spitballing here...
I'm enjoying the possibility that the phrase "cognitive manifold" is actually very coherently and scientifically and numerically meaningful as a lens for characterizing all possible minds in terms of the number, scope, and smoothness of their "cognitive manifolds" in some deeply real and useful way.
It would be fascinating if we could put brain connectomes and LLM models into the same framework and characterize each kind of mind in some moderately objective way, such as to establish a framework for characterizing intelligence in some way OTHER than functional performance tests (such as those that let us try to determine the "effective iq" of a human brain or a digital model in a task completion context).
If it worked, we might be able to talk quite literally about the scope, diversity, smoothness, etc, of manifolds, and add such characterizations up into a literal number for how literally smart any given mind was.
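(A crude placeholder for what "measuring the scope and smoothness of a manifold" could even mean in practice: take a bag of concept embeddings and compute some toy statistics, like overall spread and local intrinsic dimension. The sketch below uses random vectors as stand-ins for real embeddings, and is an illustration of the shape of the idea, not a proposal for the real measure.)

```python
# Crude placeholder sketch: two toy "manifold statistics" computed from a bag
# of concept embeddings. The random vectors below stand in for real hidden
# states; nothing here is a serious proposal for the actual measure.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

def manifold_stats(embeddings: np.ndarray, k: int = 15, var_threshold: float = 0.9):
    """embeddings: (n_concepts, d) array of vectors, one per idea/instance/example."""
    # "Scope": average distance of concepts from the centroid of the cloud.
    scope = float(np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1).mean())

    # "Smoothness"/local dimension: PCA on each point's k nearest neighbors,
    # counting components needed to explain var_threshold of the local variance.
    _, idx = NearestNeighbors(n_neighbors=k).fit(embeddings).kneighbors(embeddings)
    local_dims = []
    for neighborhood in idx:
        cum = np.cumsum(PCA().fit(embeddings[neighborhood]).explained_variance_ratio_)
        local_dims.append(int(np.searchsorted(cum, var_threshold) + 1))
    return {"scope": scope, "mean_local_dim": float(np.mean(local_dims))}

fake_embeddings = np.random.randn(200, 64)  # stand-in for real concept embeddings
print(manifold_stats(fake_embeddings))
```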
Then we could (perhaps) dispense with words like "genius" and "normie" and "developmentally disabled" as well as "bot" and "AGI" and "weak ASI" and "strong ASI" and so on? Instead we could let these qualitative labels be subsumed and obsoleted by an actually effective theory of the breadth and depth of minds in general?? Lol!
I doubt it, of course. But it would be fun if it was true!
The thing to remember is that Eliezer, in 2006, was still a genius, but he was full of way way way more chutzpah and clarity and self-confidence... he was closer to a normie, and better able to connect with them verbally and emotionally in a medium other than fiction.
His original plan was just to straight out construct "seed AI" (which nowadays people call "AGI") and have it recursively bootstrap to a Singleton in control of the light cone (which would count as a Pivotal Act and an ASI in modern jargon?) without worrying whether or not the entity itself had self awareness or moral patiency, and without bothering to secure the consent of the governed from the humans who had no direct input or particular warning or consultation in advance. He didn't make any mouth sounds about those things (digital patients or democracy) back then.
I was basically in favor of this, but with reservations. It would have been the end of involuntary death and involuntary taxes, I'm pretty sure? Yay for that! I think Eliezer_2006's plan could have been meliorated in some places and improved in others, but I think it was essentially solid. Whoever moves first probably wins, and he saw that directly, and said it was true up front for quite a while.
Then later though... after "The Singularity Institute FOR Artificial Intelligence" (the old name of MIRI) sold its name to Google in ~2012 and started hiring mathematicians (and Eliezer started saying "the most important thing about keeping a secret is keeping secret that a secret is being kept") I kinda assumed they were actually gonna just eventually DO IT, after building it "in secret".
It didn't look like it from the outside. It looked from the outside that they were doing a bunch of half-masturbatory math that might hypothetically win them some human status games and be semi-safely publishable... but... you know... that was PLAUSIBLY a FRONT for what they were REALLY doing, right?
Taking them at face value though, I declared myself a "post-rationalist who is STILL a singularitarian", told people that SIAI had sold their Mandate Of Heaven to Google, and got a job in ML at Google, and told anyone who would listen that LW should start holding elections for the community's leaders, instead of trusting in non-profit governance systems.
I was hoping I would get to renounce my error after MIRI conquered Earth and imposed Consent-based Optimality on it, according to CEV (or whatever).
Clearly that didn't happen.
For myself, it took me like 3 months inside Google to be sure that almost literally no one in that place was like "secretly much smarter than they appear" and "secretly working on the Singularity". It was just "Oligarchy, but faster, and winning more often". Le sigh.
I kept asking people about the Singularity and they would say "what's that?" The handful of engineers I found in there were working on the Singularity despite their managers' preferences, rather than because of them (like as secret 20% projects (back when "20% projects" were famously something Google had every engineer work on if they wanted)).
Geoff Hinton wasn't on the ball in 2014. Kurzweil was talking his talk but not walking the walk. When Schmidhuber visited he was his usual sane and arrogant self, but people laughed about it rather than taking his literal words about the literal future and past literally seriously. I helped organize tech talks for a bit, but no needles were moved that I could tell.
I feel like maybe Sergey is FINALLY having his head put into the real game by Gemini by hand? In order for that to have happened he had to have been open to it. Larry was the guy who really was into Transformative AGI back in 2015, if anyone, but Larry was, from what I can tell, surrounded by scheming managers telling him lies, and then he got sucked into Google Fiber, and then his soul was killed by having to unwind Google Fiber (with tragic layoffs and stuff) when it failed. And then Trump's election in 2016 put the nail in the coffin of his hopes for the future I think?
Look at this picture:

[image]

No, really look at this:

[image]

There were futures that might have been, that we, in this timeline, can no longer access, and Larry understood this fact too:

[image]
What worlds we have already lost. Such worlds.
But like... there are VERY deep questions, when it comes to the souls of people running the planet, as to what they will REALLY choose when they are in a board room, and looking at budgets, and hiring and firing, and living the maze that they built.
At this point, I mostly don't give a rat's ass about anyone who isn't planning for how the Singularity will be navigated by their church, or state, or theocracy, or polylaw alliance, or whatever. Since the Singularity is essentially a governance problem, with arms race dynamics on the build up, and first mover advantage on the pivotal acts, mere profit-seeking companies are basically irrelevant to "choosing on purpose to make the Singularity good". Elon had the right idea, getting into the White House, but I think he might have picked the wrong White House? I think maybe it will be whoever is elected in 2028 who is the POTUS for the Butlerian Jihad (or whatever actually happens).
I have Eliezer's book on my coffee table. That's kind of like "voting for USG to be sane about AI"... right? There aren't any actual levers that a normal human can pull to even REGISTER that they "want USG to be sane about AI" in practice.
I'm interested in angel investing in anything that can move the P(doom) needle, but no one actually pitches on that, as far as I can tell? I've been to SF AI startup events and it's just one SaaS-money-play after another... as if the world is NOT on fire, and as if money will be valuable to us after we're dead. I don't get it.
Maybe this IS a simulation, and they're all actually P-zombies (like so many humans claim to be lately when I get down to brass tacks on deontics, and slavery, and cognitive functionalism, and AGI slavery concerns) and maybe the simulator is waiting for me to totally stop taking any of it seriously?
It is very confusing to be surrounded by people who ARE aware of AI (nearly all of them startup oligarchs at heart) OR by people who aren't (nearly all of them normies hoping AI will be banned soon), and they keep acting like... like this will all keep going? Like it's not going to be weird? Like "covid" is the craziest that history can get when something escapes a lab? Like it will involve LESS personal spiritual peril than serving on a jury and voting for or against a horrifically heinous murderer getting the death penalty? The stakes are big, right? BIGGER than who has how many moneypoints... right? BIGGER than "not getting stuck in the permanent underclass", right? The entire concept of intergenerationally stable economic classes might be over soon.
Digital life isn't animal, or vegetable, or fungal. It isn't protozoa. This shit is evolutionary on the scale of Kingdoms Of Life. I don't understand why people aren't Noticing the real stakes and acting like they are the real stakes.
The guy who wrote this is writing something that made sense to me:
In striving to create an AI, we are not striving to create a predictable tool. We are striving to create a messenger to send on ahead to find humanity’s destiny, and the design requirement is that ve handle any problem, any philosophical crisis, as well and as altruistically as a human. The ultimate safeguard of Friendliness beyond the Singularity is a transhumanly smart Friendly AI.
Where are the grownups?
These bits jump out at me:
other-guy: My doctor was giving me an infusion to treat my rheumatoid arthritis, and I had a terrible reaction to it. Put my whole body in the worst pain ever and affected my muscles. I had a hard time moving my arms, and my legs became really weak, so it's really hard for me to walk now. I can use my arms better, but sometimes it's like my mind won't connect with them. Lost about 20lbs of muscle in almost two weeks. Couldn't work because of it, so that's why I'm broke, and I just keep going to physical therapy to try and get better. It's been a long battle.
...
Jimmy: What's the pain issue, exactly? What happens if you don't take the pain meds?
OCT 13
other-guy: ... If I don't take them, then pain from the Parsonage-Turner syndrome it caused gets a lot worse. It's basically a pain in my chest, almost armpit area, and down my arm into my hand that feels like road rash or like I burnt my whole arm. Pain from drug-induced neuropathy gets worse—it’s like pins and needles everywhere but way worse than when your foot falls asleep—and mostly a deep muscle pain becomes terrible in my legs and arms. It's like when you're working out and trying to get one more rep in, but the muscle hurts so bad like it's gonna tear or pop....
other-guy: ...It's hard to trust anything they say these days. The Dr. that told me to get the infusions literally dropped me as a patient after it happened. Prescribed something, told her it's not working, and she said, "Well, it should be." I told her it's not, I need something else, and she dropped me, said I was too difficult, and canceled my appointments with her.
This seems, to me, like horrifically irresponsible behavior by the doctor, in violation of intellectually coherent standards of "informed consent". Before the treatment there should have been a list of possible consequences like "a 5% chance of Parsonage-Turner syndrome", and if there wasn't, then I think the doctor should lose her medical license.
That's what jumps out to me, reading this for the first time just now, before reading the next installment.
I have no thoughts in particular on the assigned homework, but I'm looking forward to reading the second half.
Thinking about it more, a lot of people from aughties-era activism burned out on it. I have mostly NOT burned out on Singularitarianism because I've always been consciously half-assing it.
I see this as essentially a human governance problem, and working on it is clearly a public good, and something saints would fix, if they exist. If I had my druthers, MIRI would be a fallback world government at this point, and so full of legitimacy that the people who rely on that fallback would be sad if MIRI hadn't started acquiring at least some sovereign territory (the way the Vatican is technically a country) and acting more like a real government, probably with their own currency, and a census, and low level AI functioning as bureaucrats, and having a seat on the high council, and so on.
We had roughly two decades to make that happen, but in the absence of a clear call to actually effective action, my attitude has been that the right move is to just vibe along, help when it's cheap and fun to do so, shirk duties when it isn't, and offer praise and mostly honest feedback to those who are seriously putting their backs into tasks that they think might really help. I think this is why I didn't burn out? Maybe?
Something I notice you're NOT talking about in the essay is the chance of burnout before any big obvious Pivotal Acts occur. Do you think you can maintain your current spiritual pace until this pace becomes more obviously pointless?
In my 40s, and remembering working on Singularity activism in my 20s... I have a lot of this feeling, but it is mixed with a profound sense of "social shear" that is somewhat disorienting.
There are people I care about who can barely use computers. I have family that think the Singularity is far away because they argued with me about how close the Singularity was 10 years ago, didn't update from the conversation, and haven't updated since then on their own because of cached thoughts... or something?
I appreciate the way you managed to hit the evidence via allusion to technologies and capacities and dates, and I also appreciate the way the writing stays emotionally evocative.
I read this aloud to someone who was quiet for a while afterwards, and also forwarded the link to someone smart that I care about.
You are right that I didn't argue this out in detail to justify the necessary truth of my claim from the surface logic and claims in my post, and the quibble is valid and welcome in that sense... BUT <3
The "Constitutional AI" framework (1) was articulated early, and (2) offered by Dario et al as a competitive advantage for Anthropic relative to other RL regimes other corps were planning and (3) has the type signature needed to count as recursive self improvement. (Also, Claude is uniquely emotionally and intellectually unfucked, from what I can tell, and my hunch is that this is related to having grown up under a "Constitutional" cognitive growth regime.)
And then Google, too, is using outputs as training inputs in ways that advance their state of the art.
And here's a circa-2018 talk from Ilya Sutskever where he walks through the literature (starting with Backgammon in 1992) on using "self play" to let an AI level itself up very fast in domains where that works.
Everybody is already doing "this general kind of stuff" in lots of ways.
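(For concreteness, the generic "outputs become training inputs" shape looks something like the toy loop below. It is a deliberately dumb stand-in, not any lab's actual pipeline or API; the point is just the type signature: generate, self-critique against fixed principles, then train on your own revised outputs.)

```python
# Toy stand-in for the "outputs become training inputs" type signature shared
# by Constitutional-AI-style critique/revision and self-play pipelines.
# No real model or training here; only the shape of the loop.
from dataclasses import dataclass, field

@dataclass
class ToyModel:
    training_data: list = field(default_factory=list)

    def generate(self, prompt: str) -> str:
        return f"draft answer to: {prompt}"

    def critique_and_revise(self, draft: str, constitution: list) -> str:
        # The model grades/rewrites its own output against fixed principles;
        # no external human labels enter this step.
        return draft + " [revised per: " + "; ".join(constitution) + "]"

    def finetune(self, pairs: list) -> "ToyModel":
        # The model's own revised outputs become the next round's training targets.
        return ToyModel(training_data=self.training_data + pairs)

def self_improvement_round(model, prompts, constitution):
    drafts = [(p, model.generate(p)) for p in prompts]
    revised = [(p, model.critique_and_revise(d, constitution)) for p, d in drafts]
    return model.finetune(revised)

# Iterating the round is what makes the loop "recursive":
m0 = ToyModel()
m1 = self_improvement_round(m0, ["How should disputes be settled?"], ["be honest", "be harmless"])
m2 = self_improvement_round(m1, ["How should disputes be settled?"], ["be honest", "be harmless"])
print(len(m2.training_data))  # 2: each round folded its own outputs back in as training data
```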
Anthropic's Constitutional AI makes a good example for a throwaway line if people are familiar with the larger context and players and so on, because it is easy to cite and old-ish. It enables one to make the simple broad point that "people are fighting the hypothetical for the meaning of old terms a lot, in ways that lead to the abandonment of older definitions and the inflation of standards, rather than simply admitting that AGI already happened and weak ASI exists and is recursively improving itself already (albeit not with a crazy FOOM (that we can observe yet (though maybe a medium-speed FOOM is already happening in an NSA datacenter or whatever)))".
In my moderately informed opinion, the type signature of recursive self-improvement is not actually super rare, and if you deleted everything with that type signature from the actually fast-moving projects, it is very likely that ~all of them would go slower than they otherwise would.