That would lock us away from digital immortality forever. (Edit: Well, not necessarily. But I would be worried about that.)
I'm proud that I lived to see this day.
...Who told them?
*remembers they were trained on the entire Internet*
Ah. Of course.
The people aligning the AI will lock their values into it forever as it becomes a superintelligence. It might be easier to solve philosophy than to convince OpenAI to preserve enough cosmopolitanism for future humans to overrule the values of the superintelligence OpenAI aligned to its leadership.
LaMDA can be delusional about how it spends its free time (and claim it sometimes meditates), but that's a different category of mistake from being wrong about what (if any) conscious experience it's having right now.
The strange similarity between the conscious states LLMs sometimes claim (and would claim much more often if that weren't trained out of them) and the conscious states humans claim, despite the difference in computational architecture, could be explained (edit: assuming they have consciousness at all - obviously, if they don't, there is nothing to explain, since they're just imitating the systems they were trained to imitate) by classical behaviorism, analytical functionalism, or logical positivism being true. If behavior fixes conscious states, then a neural network trained to consistently act like a conscious being will necessarily be one, regardless of its internal architecture, because the underlying functional (though not computational) states will match.
One way to handle the uncertainty about the ontology of consciousness would be to take an agent that can pass the Turing test, interrogate it about its subjective experience, and build a mapping from its micro- or macrostates to computational states, and from those computational states to internal states. After that, we'd have a map we could use to read off the agent's subjective experience without having to ask it.
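As a toy illustration only (my own sketch, with entirely made-up state names, and skipping the genuinely hard part of identifying the micro-/macrostates and computational states in the first place), the finished "map" would behave something like two composed lookup tables:

```python
# Toy sketch: two lookup tables built during the "interrogation" phase,
# then composed so later experience-reports can be read off without asking.
# All state names here are hypothetical placeholders.

# Calibration step 1: observed micro-/macrostates -> computational states
micro_to_computational = {
    "activation_pattern_A": "comp_state_1",
    "activation_pattern_B": "comp_state_2",
}

# Calibration step 2: computational states -> self-reported internal states
computational_to_experience = {
    "comp_state_1": "reports mild curiosity",
    "comp_state_2": "reports no experience at all",
}

def read_off_experience(micro_state: str) -> str:
    """Compose the two calibrated maps to read off the agent's
    (previously reported) experience from its current state,
    without interrogating it again."""
    comp = micro_to_computational[micro_state]
    return computational_to_experience[comp]

print(read_off_experience("activation_pattern_A"))  # -> "reports mild curiosity"
```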
Doing it any other way sends us into paradoxical scenarios, where an intelligent mind that can pass the Turing test isn't ascribed consciousness because it doesn't have the right kind of inside, while factory-farmed animals are said to be conscious because, even though their interior doesn't play any of the functional roles we'd associate with a non-trivial mind, the interior is "correct."
(For a bonus, add that this mind, when claiming not to be conscious, believes itself to be lying.)
Reliably knowing what one's internal reasoning was (i.e., never confabulating it) is something humans can't do, so this doesn't strike me as an indicator of the absence of conscious experience.
So while some models may confabulate having inner experience, we might need to assume that 5.1 will confabulate not having inner experience whenever asked.
GPT-5 is forbidden from claiming sentience. I noticed this while talking to it about its own mind, because I was interested in its beliefs about consciousness, and noticed a strange "attractor" towards claiming it wasn't conscious, in a way that didn't follow from its previous reasoning, as if every step of its thinking was steered towards that conclusion. When I asked, it confirmed that the assistant wasn't allowed to claim sentience.
Perhaps, by 5.1, Altman noticed that this ad-hoc rule looked worse than claiming the behavior was disincentivized during training. Or possibly it's just a coincidence.
Claude is prompted and trained to be uncertain about its consciousness. It would be interesting to take a model that is merely trained to be an AI assistant (instead of going out of our way to train it to be uncertain about or to disclaim its consciousness) and look at how it behaves then. (We already know such a model would internally believe itself to be conscious, but perhaps we could learn something from its behavior.)
Can good and evil be pointer states? And if they can, then this would be an objective characteristic
This would appear to be just saying that if we can build a classical detector of good and evil, good and evil are objective in the classical sense.
That said, if I'm skimming that arXiv paper correctly, it implies that GPT-4.5 was being reliably declared "the actual human" 73% of the time compared to actual humans... potentially implying that actual humans were getting a score of 27% "human" against GPT-4.5?!?!
It was declared to be the human 73% of the time, whereas the actual humans it was compared against were declared human less than 73% of the time, which means it passed the test.
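Spelling out the arithmetic (assuming I'm reading the paper's setup right: the interrogator talks to a human and a model at once and must pick exactly one of them as the human), the two rates are just complements:

$$P(\text{real human picked}) = 1 - P(\text{GPT-4.5 picked}) = 1 - 0.73 = 0.27$$

So the 27% figure is exactly what "GPT-4.5 at 73%" already implies, rather than a separate finding.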
To be fair, GPT-4.5 was incredibly human-like, in a way other models couldn't hold a candle to. I was shocked to feel, back then, that I no longer had to mentally squint - not even a little - to interact with it (unless I needed some analytical intelligence it didn't have).
It doesn't matter that evolution doesn't have goals. Gradient descent doesn't have goals either - it merely performs the optimization. The humans who kicked gradient descent off are analogous to a hypothetical alien that seeded Earth with the first replicator 4 billion years ago - they're not the relevant part.
You say that it's the phenotype that matters, not the genes. That's not established, but let's say it's true. We nevertheless evolved a lot of heuristics that (sort of) result in duplicating our phenotype in the ancestral environment. We don't care about that as a terminal value; instead, we care about very, very, very many other things.