To be clear about my position, and to disagree with Lemoine: not passing a Turing test doesn't mean you aren't intelligent (or aren't sentient, or a moral patient). The implication only holds in the forward direction: passing a Turing test is strong evidence that you are intelligent (and contain sentient pieces, and moral patients).
I think it's completely reasonable to take moral patienthood in LLMs seriously, though I suggest not assuming that entails a symmetric set of rights—LLMs are certainly not animals.
potentially implying that actual humans were getting a score of 27% "human" against GPT-4.5?!?!
Yes, but note that ELIZA had a reasonable score in the same data. Unless you believe that a human couldn't reliably distinguish ELIZA from a human, all this is saying is that either 5 minutes was simply not enough time to talk to the two contestants, or the test was otherwise invalid somehow.
...
...ok I just rabbitholed on data analysis. Humans start to win against the best-tested GPT if they get 7-8 replies. The best GPT model replied on average ~3 times faster than humans, and for humans at least the number of conversation turns was the strongest predictor of success. A significant fraction of GPT wins over humans were also from nonresponsive or minimally responsive human witnesses. This isn't a huge surprise; it was already obvious to me that the time limit was the primary cause of the result. The data backs the intuition up.
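For the curious, here is a minimal sketch of the kind of per-conversation analysis I mean. The column names (`turns`, `witness`, `interrogator_correct`) and the filename are hypothetical stand-ins for illustration, not the actual schema of the released data.

```python
# Sketch: does the number of replies predict the interrogator identifying the AI?
# Column names and filename below are illustrative assumptions, not the paper's schema.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("conversations.csv")  # hypothetical per-conversation data

# Interrogator success rate as a function of how many replies they got.
by_turns = (
    df[df.witness == "gpt-4.5"]
    .groupby("turns")["interrogator_correct"]
    .agg(["mean", "count"])
)
print(by_turns)

# Simple logistic regression: turn count as a predictor of success.
X = sm.add_constant(df[["turns"]])
print(sm.Logit(df["interrogator_correct"], X).fit().summary())
```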
Most ELIZA wins, but certainly not all, seemed to be because the participants didn't understand or act as though this was a cooperative game. That's an opinionated read of the data rather than a simple fact, to be clear. Better incentives or a clearer explanation of the task would probably make a large difference.
Turing Tests were passed.
Basically all so-called Turing Tests that have been beaten are simply not Turing Tests. I have seen one plausible exception, showing that AI does well in a 5-minute limited version of the test, seemingly due in large part to 5 minutes being much too short for a non-expert to tease out the remaining differences. The paper claims "Turing suggests a length of 5 minutes," but this is never actually said in that way, and also doesn't really make sense. This is, after all, Turing of Turing machines and of relative reducibility.
To the first part: yes, of course, my claim isn't that anything here is axiomatically unfair. It absolutely depends on the credences you give for different things, and the context you interpret them in. But I don't think the story in practice is justified.
If, instead, your concern is that the correspondence between Klurl's hypothetical examples and what they found when reaching the planet was improbably high, then I agree that is very coincidental, but I do not think that coincidence is being used as support for the story's intended lessons.
This is indeed approximately the source of my concern.
I think in a story like this, if you show someone rapidly making narrow predictions and then repeatedly highlight how much more reasonable they are than their opponent, as a transparent allegory for your narrow predictions being more reasonable than a particular bad opposing position, from a post signposted as nonfiction inside a fictional frame, there really is no reasonable room to claim that people weren't actually meant to read things into the outcomes being predicted. Klurl wasn't merely making hypothetical examples, he was acting on specific predictions. It is actually germane to the story, and bad to sleight-of-hand away, that Klurl was often doing no intellectual work. It is actually germane to the story whether some of Trapaucius' arguments have nonzero Bayesian weight.
The claim that no simple change would have solved this issue seems like a failure of imagination, and anyway the story wasn't handed down to its author in stone. One could just write a less wrong story instead.
Let me try addressing your comment more bluntly to see if that helps.
Your complaint about Klurl's examples is that they are "coincidentally" drawn from the special class of examples that we already know are actually real, which makes them not fictional.
No, Klurl is not real. There are no robot aliens seeding our planet. The fictional evidence I was talking about was not that Earth as it is right now exists in reality, it was that Earth as it is right now exists in this story, specifically at the point it was used.
If you write a story where a person prays and then wins the lottery as part of a demonstration of the efficacy of prayer, that is fictional evidence even though prayer and winning lotteries are both real things.
If you think that the way the story played out was misleading, that seems like a disagreement about reality, not a disagreement about how stories should be used.
No, I really am claiming that this was a misuse of the story format. I am not opposed to it because it's not reality. I am opposed to it because the format implies that the outcomes are illustrations of the arguments, but in this case the outcomes were deceptive illustrations.
If Trapaucius had arrived at the planet to find Star Trek technology and been immediately beamed into a holding cell, would that somehow have been less of a cheat, because it wasn't real?
It would be less of a cheat in the sense that it would give less of a false impression that the arguments were highly localizing, and in that it would be more obvious that the outcome was fanciful and not to be taken as a serious projection. But it would not be less of a cheat simply in the sense that it wasn't real, because my claim was never that this was cheating for using a real outcome.
I stand by what I said, but I don't want to argue about semantics. I would not have allowed myself to write a story this way.
The Star Trek claim is a false dichotomy. One could choose to directly show that the underspecified parts are underspecified, one could choose to show many examples of the ways this would near-miss, one could simply not write oneself into this corner in the first place. And in the rather hard-to-believe counterfactual that Yudkowsky didn't feel capable of making his story without such a contrivance, he could have just used a different frame, or a different format, or signposted the issue, or done some other thing instead.
"One does not live through a turn of the galaxy by taking occasional small risks."
I'll admit that the author being Yudkowsky heavily colored how I read this line. He has repeatedly, strongly taken the stance that AI risk is not about small probabilities, that he would not be thinking so much about AI risk if his probability were order-1%, that people who do care about order-1% risks are being silly, etc. There are lots of quotes, but I'll take the first one I found on a search, not because it's the closest match but because it's the first one I found.
But the king of the worst award has to go to the Unironical Pascal's Wager argument, imo - "Sure the chances are tiny, but if there's even a tiny chance of destroying the lightcone..."
— https://x.com/ESYudkowsky/status/1617903894960693249
I do not know if I'm being unfair or generous to Yudkowsky to dismiss this defense for this reason. Regardless, I will.
I will say that the very next sentence Klurl states is,
"And to call this risk knowably small, would be to claim to know far too much."
and indeed I think this is an example where the literary contrivance hides the mistake. If the author weren't forcing his hand, the risk would have been small. The coincidence they were in was unlikely on priors, and the arguments given did not narrow in on it.
—
What examples are you thinking of here?
It's obvious that human learning is exceptional, but I don't think Klurl's arguments even served to distinguish the rock-sharpening skill from beaver dams, spider webs or bird nests, never mind the general set of so-termed ‘tool use’ in the wild. Stone tools aren't specific to humans, either, though I believe manufactured stone tools are localized to hominids; Homo floresiensis, for example, is a meaningfully distinct cousin species and AFAIK not ancestral to us.
Related but distinct, I'll draw specific attention to ants, which have a fascinating variety of evolved behaviours, including quite intricate trap-making with a cultivated fungus. Obviously not a generalizably intelligent behaviour, yet Klurl did not even ask that of humans. (On an even less related note, Messor ibericus lays clones of Messor structor as part of its reproductive cycle, which is fascinating and came to mind a lot when reading the sections about stuff evolution supposedly can't solve because it, per the accusation, operates through one specific reproductive pathway.)
If this were presented as a piece of fiction first, sure, ‘bad for verisimilitude’. But Yudkowsky prefaces it as best considered nonfiction with a fictional-dialogue frame. When I consider it in that light, it's more than a problem of story beats: it's cheating evidence into play, it's an argumentative sleight of hand, it's generalizing from fictional evidence. I think it's misleading as to how strongly the refutations actually hold, both in the abstract, and also as directly applied to the arguments Yudkowsky is defending in practice.
To the first point, I hope it was clear I'm not defending Trapaucius here. The story is maybe unfair to the validity of some of Trapaucius' arguments, but not that unfair; they were net pretty bad.
We have lots of examples of radiators in space (because radiating is approximately the only way to shed heat there), and AFAIK micrometeoroid impacts haven't been a dealbreaker when you slightly overprovision capacity and have structural redundancy. I don't expect you'd want to spend too much on shielding, personally.
Not trying to claim Starcloud has a fully coherent plan, ofc.
It's not that complex in principle: you use really big radiators.
If you look at https://www.starcloud.com/'s front page video, you see exactly that. What might look like just a big solar array is actually also a big radiator.
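As a rough back-of-envelope for what "really big" means, here is a sketch using the Stefan-Boltzmann law; the power figure, temperature, and emissivity are illustrative assumptions of mine, not Starcloud's numbers.

```python
# Back-of-envelope radiator sizing via the Stefan-Boltzmann law.
# All inputs are illustrative assumptions, not Starcloud's actual figures.
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def radiator_area_m2(power_w, temp_k, emissivity=0.9, sides=2):
    """Radiator area needed to reject `power_w` at temperature `temp_k`.

    Ignores absorbed sunlight and view-factor losses, so this is a
    lower bound on the real area.
    """
    return power_w / (sides * emissivity * SIGMA * temp_k**4)

# A 1 GW compute load rejected at ~300 K needs on the order of a km^2.
print(radiator_area_m2(1e9, 300) / 1e6, "km^2")  # ~1.2 km^2
```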
AFAICT it's one of those things that works in principle but not in practice. In theory you can make really cheap space solar and radiator arrays, and with full reuse, launch costs can approach the cost of propellant. In practice, we're not even close to that, and any short-term bet on it is just going to fail.
I edited out the word 'significantly', which in retrospect was misleading.
I'd prefer not to repeat what I've heard. In case I'm making this sound more mysterious than it is, I will note that you're not missing out on any juicy gossip. Nothing I heard in passing would be material to much.