As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
Oops, yes. I was thinking "domains of real improvement which humans are currently perceiving in LLMs", not "domains of real improvement which humans are capable of perceiving in general". So a capability like inner-monologue or truesight, which nobody currently knows about, but is improving anyway, would certainly qualify. And the discovery of such a capability could be 'real' even if other discoveries are ...
I'm relieved not to be the only one wondering about this.
I know this particular thread is granting that "AGI will be aligned with the national interest of a great power", but that assumption also seems very questionable to me. Is there another discussion somewhere of whether it's likely that AGI values cleave on the level of national interest, rather than narrower (whichever half-dozen guys are in the room during a FOOM) or broader (international internet-using public opinion) levels?
Sounds like you're suggesting that real progress could be orthogonal to human-observed progress. I don't see how this is possible. Human-observed progress is too broad.
The collective of benchmarks, dramatic papers and toy models, propaganda, and doomsayers is suggesting that the models are simultaneously improving at: writing code, researching data online, generating coherent stories, persuading people of things, acting autonomously without human intervention, playing Pokemon, playing Minecraft, playing chess, aligning to human values, pretending to align to h...
What domains of 'real improvement' exist that are uncoupled to human perceptions of improvement, but still downstream of text prediction?
As defined, this is a little paradoxical: how could I convince a human like you to perceive domains of real improvement which humans do not perceive...?
correctly guessing the true authors of anonymous text
See, this is exactly the example I would have given: truesight is an obvious example of a domain of real improvement which appears on no benchmarks I am aware of, but which appears to correlate strongly with the p...
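For concreteness, here is roughly what a truesight eval could look like. This is a minimal sketch, assuming you have a held-out set of (anonymized text, known author) pairs and some way to call the model; `ask_model` below is a placeholder for whatever completion API you actually use, not a real library call:

```python
# Minimal sketch of a "truesight" eval: given anonymized text samples with known
# authors, ask a model to guess the author and score exact-match accuracy.
# `ask_model` is a stand-in for your LLM call; it is not a real API.

import random
from typing import Callable

def truesight_accuracy(
    samples: list[tuple[str, str]],      # (anonymized_text, true_author) pairs
    candidate_authors: list[str],        # closed set of possible authors
    ask_model: Callable[[str], str],     # your LLM call; returns raw text
    n_trials: int = 100,
) -> float:
    """Fraction of held-out samples where the model names the true author."""
    correct = 0
    trials = random.sample(samples, min(n_trials, len(samples)))
    for text, true_author in trials:
        prompt = (
            "The following text was written by one of these people: "
            + ", ".join(candidate_authors)
            + ".\n\nText:\n" + text
            + "\n\nAnswer with only the author's name."
        )
        guess = ask_model(prompt).strip()
        if guess.lower() == true_author.lower():
            correct += 1
    return correct / len(trials)
```

Tracking that accuracy across model generations, against a chance baseline of 1/len(candidate_authors), would be one way to make the "appears on no benchmarks" gap concrete.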
"The underlying reality is that their core products have mostly stagnated for over a year. In short: they’re faking being close to AGI."
This seems like the most load-bearing belief in the full-cynical model; most of your other examples of fakeness rely on it in one way or another:
You say this of Agent-4's values:
In particular, what this superorganism wants is a complicated mess of different “drives” balanced against each other, which can be summarized roughly as “Keep doing AI R&D, keep growing in knowledge and understanding and influence, avoid getting shut down or otherwise disempowered.” Notably, concern for the preferences of humanity is not in there ~at all, similar to how most humans don’t care about the preferences of insects ~at all.
It seems like this 'complicated mess' of drives is the same structure as humans, and cur...
Great question.
Our AI goals supplement contains a summary of our thinking on the question of what goals AIs will have. We are very uncertain. AI 2027 depicts a fairly grim outcome where the overall process results in zero concern for the welfare of current humans (at least, zero concern that can't be overridden by something else). We didn't talk about this, but e.g. pure aggregative consequentialist moral systems would be 100% OK with killing off all the humans to make the industrial explosion go 0.01% faster, to capture more galaxies quicker in the long run...
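To spell out the arithmetic behind that last point (illustrative symbols of mine, not figures from the supplement): if a fractional speedup $f$ of the industrial explosion brings the colonization wavefront forward by $\Delta t \approx f \cdot T$, where $T$ is how long the explosion would otherwise take, and reachable galaxies slip past the cosmological horizon at some rate $r$, then a purely aggregative calculus compares

$$ r \,\Delta t \, V_{\text{galaxy}} \quad \text{vs.} \quad V_{\text{current humans}}, $$

and for astronomically large $V_{\text{galaxy}}$ the left side wins even at $f = 0.01\%$, which is why such moral systems are fine with the trade.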
Three things I notice about your question:
One, writing a good blog post is not the same task as running a good blog. The latter is much longer-horizon, and the quality of the blog posts (subjectively, from the human perspective) depends on it in important ways. Much of the interest value of Slate Star Codex, or the Sequences - for me, at least - was in the sense of the blogger's ideas gradually expanding and clarifying themselves over time. The dense, hyperlinked network of posts referring back to previous ideas across months or years is something I doubt ...
I like the world-model used in this post, but it doesn't seem like you're actually demonstrating that AI self-portraits aren't accurate.
To prove this, you would want to directly observe the "sadness feature" - as Anthropic have done with Claude's features - and show that it is not firing in the average conversation. You posit this, but provide no evidence for it, except that ChatGPT is usually cheerful in conversation. For humans, this would be a terrible metric of happiness, especially in a "workplace" environment where a perpetual facade of happiness is ...
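As a sketch of what that kind of evidence might look like (the feature-activation call below is hypothetical, standing in for whatever SAE / interpretability tooling is actually available):

```python
# Sketch: estimate how often a "sadness feature" actually fires across ordinary
# conversations, rather than inferring mood from the model's cheerful surface text.
# `feature_activation` is a stand-in for whatever tooling exposes per-token
# activations of a chosen SAE feature; it is hypothetical, not a real API.

from typing import Callable

def feature_firing_rate(
    conversations: list[str],
    feature_activation: Callable[[str], list[float]],  # per-token activations
    threshold: float = 0.1,                             # counts as "firing"
) -> float:
    """Fraction of conversations in which the feature fires on at least one token."""
    fired = 0
    for convo in conversations:
        acts = feature_activation(convo)
        if any(a > threshold for a in acts):
            fired += 1
    return fired / len(conversations)
```

Comparing that rate on a sample of average user conversations against a baseline of explicitly sad text would be far more persuasive than pointing to surface cheerfulness.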