says little about the intelligence of Claude
It says that it lacks the intelligence to play zero-shot, and someone has to compensate for the intelligence deficit with an exocortex.
It's like we can track progress by measuring "performance per exocortex complexity", where the complexity drops from "here's a bunch of buttons to press in sequence to win" all the way down to "".
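A toy version of that metric, just to make the idea concrete (the `progress_score` function and the string-length proxy for "exocortex complexity" are my own illustrative assumptions, not anything from the post):

```python
# Toy "performance per exocortex complexity" metric: score a system by
# its win rate, penalized by the size of the hand-holding it needed.
# Complexity here is crudely proxied by the length of the hint string.

def progress_score(win_rate, exocortex_hint):
    """Higher is better; a zero-shot system (empty hint) keeps its full win rate."""
    return win_rate / (1 + len(exocortex_hint))

# Heavy hand-holding: same win rate, but most of the credit goes to the exocortex.
print(progress_score(0.9, "press A, then B, then Start"))
# Zero-shot: the hint has shrunk to "", so the system gets full credit.
print(progress_score(0.9, ""))
```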
AIs (probably scaffolded LLMs or similar)
That was a good start, but then you appear to hyper-focus on the "LLM" part of a "blogging system". In a strict sense the titular question is like asking "when will cerebellums become human-level athletes?".
Likewise, one could arguably frame this as a problem about insufficient "agency,"
Indeed. In a way, the real question here is "how can we orchestrate a bunch of LLMs and other stuff to have enough executive function?".
And, perhaps, whether it is at all possible to reduce other functions to language processing with extra steps.
but it is mysterious to me where the needed "agency" is supposed to come from
Bruh, from the Agancé region of France of course, otherwise it's a sparkling while loop.
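For the non-sparkling variant, the joke cashes out as roughly this: agency as nothing more than a model call inside a loop. A minimal sketch, where `make_stub_llm` stands in for a real model (here it's a canned stub that acts twice and then declares itself done):

```python
# The simplest possible agent scaffold: a while loop around a model call.
# `make_stub_llm` is a stand-in for a real LLM; this stub just emits two
# planned actions and then "DONE".

def make_stub_llm():
    steps = iter(["search the web", "summarize results", "DONE"])
    return lambda prompt: next(steps)

def run_agent(llm, goal, max_steps=10):
    """Feed the model its own transcript until it says it is finished."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm("\n".join(transcript))
        if action == "DONE":
            break
        transcript.append(f"Action: {action}")
    return transcript

agent_log = run_agent(make_stub_llm(), "write a blog post")
print(agent_log)
```

All the interesting questions (where do good actions come from, when to stop, how to recover from errors) live inside the model call; the loop itself contributes no agency.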
I wondered about using 4o for the poll and took the post to o1-pro.
Here's what it filled in as "Potential Gaps or Additions":
The full history has two tables of credences (the main dish and the extras), with unclear provenance. To spice things up, I also asked what evidence it would expect to update up or down.
human-made innovative applications of the paradigm of automated continuous program search. Not AI models autonomously producing innovations.
Can we... you know, make an innovative application of the paradigm of automated continuous program search to find AI models that would autonomously produce innovations?
- RL will be good enough to turn LLMs into reliable tools for some fixed environments/tasks. They will reliably fall flat on their faces if moved outside those environments/tasks.
They don't have to "move outside those tasks" if they can be JIT-trained for cheap. It's the outer system that requests and produces them that is general (or, one might say, "specialized in adaptation").
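Sketch of what that outer system could look like (everything here is hypothetical: `train_specialist` stands in for a cheap RL fine-tuning run, and the cache-on-miss dispatch is my own framing of "specialized in adaptation"):

```python
# Outer system "specialized in adaptation": when a task falls outside
# every existing specialist's environment, JIT-train a new specialist
# and cache it. The specialists are narrow; the dispatcher is general.

specialists = {}

def train_specialist(env):
    # Stand-in for a cheap RL training run producing a narrow, reliable tool.
    return lambda task: f"[{env} specialist] solved {task}"

def solve(env, task):
    if env not in specialists:            # no specialist yet: JIT-train one
        specialists[env] = train_specialist(env)
    return specialists[env](task)         # inside its environment, it's reliable

print(solve("chess", "find the best move"))
print(solve("chess", "evaluate this endgame"))  # reuses the cached specialist
```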
For alphazero, I want to point out that it was announced 6 years ago (infinity by AI scale), and from my understanding we still don't have a 1000x faster version, despite much interest in one.
I don't know the details, but whatever NN (derived from Lc0, a clone of AlphaZero) is inside current Stockfish can play on a laptop GPU.
And even if AlphaZero derivatives didn't gain 3 OOMs by themselves, that doesn't update me much toward it being particularly hard. Google itself has no interest in improving it further and just moved on: to MuZero, to AlphaFold, etc.
I don't think they're blocked by an inability to run autonomously. They're blocked by lacking an eye for novelty/interestingness. You can make the slop factory run 24/7 for a year and still not get any closer to solving alignment.