I am a PhD student in computer science at the University of Waterloo, supervised by Professor Ming Li and advised by Professor Marcus Hutter.
My current research is related to applications of algorithmic probability to sequential decision theory (universal artificial intelligence). Recently I have been trying to start a dialogue between the computational cognitive science and UAI communities. Sometimes I build robots, professionally or otherwise. Another hobby (and a personal favorite of my posts here) is the Sherlockian abduction master list, which is a crowdsourced project seeking to make "Sherlock Holmes" style inference feasible by compiling observational cues. Give it a read and see if you can contribute!
See my personal website colewyeth.com for an overview of my interests and work.
I also take this approach to agent foundations, which is why I like to tie different agendas together. Studying AIXI is part of that because many other approaches can be described as "depart from AIXI in this way to solve this informally stated problem with AIXI."
The problem is deeper than that.
Playing a game of chess takes hours. LLMs are pretty bad at it, but we have had good chess engines for decades - why isn’t there a point way off in the top left for chess?
Answer: we’re only interested in highly general AI agents, which basically means LLMs. So we’re only looking at the performance of LLMs, right? But if you only look at LLM performance without scaffolding, it looks to me like it asymptotes around 15 minutes. Only by throwing in systems that use a massive amount of inference-time compute do we recover a line with a consistent upwards slope. So we’re allowed to use search, just not narrow search like chess engines. This feels a little forced to me - we’re putting two importantly different things on the same plot.
Here is an alternative explanation of that graph: LLMs have been working increasingly well on short tasks, but probably not doubling task length every seven months. Then after 2024, a massive amount of effort went into trying to make them do longer tasks by paying a high cost in inference-time compute and very carefully designed scaffolding, with very modest success. It’s not clear that anyone has another (good) idea.
With that said, if the claimed trend continues for another year (now that there are actually enough data points to usefully draw a line through), that would be enough for me to start finding this pretty convincing.
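To make “the claimed trend” concrete (my own arithmetic, not a figure from the paper): a seven-month doubling time means the task horizon grows as

$$T(t) = T_0 \cdot 2^{(t - t_0)/(7\ \text{months})},$$

so one further year on trend corresponds to a factor of $2^{12/7} \approx 3.3$ in task length.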
I haven’t read the paper (yet?) but from the plot I am not convinced. The points up to 2024 are too sparse; they don’t let us conclude much about that region of growth in abilities, and even if they did, the implied slope would be significantly lower. When the points become dense, the comparison is not fair - these are reasoning models which use far more inference-time compute.
Yes, but it's also very easy to convince yourself you have more evidence than you do, e.g. by inventing a theory that is actually crazy but seems insightful to you (which may or may not apply to this case).
I think intelligence is particularly hard to assess in this way because of recursivity.
Yeah, this is also just a pretty serious red flag for the OP’s epistemic humility… it amounts to saying “I have this brilliant idea but I am too brilliant to actually execute it, will one of you less smart people do it for me?” This is not something one should claim without a correspondingly stellar track record - otherwise, it strongly indicates that you simply haven’t tested your own ideas against reality.
Contact with reality may lower your confidence that you are one of the smartest younger supergeniuses, a hypothesis that should have around a 1 in a billion prior probability.
Which seems more likely: that capabilities happen to increase very quickly around human-genius levels of intelligence, or that relative capabilities compared to the rest of humanity by definition only increase when you’re on the frontier of human intelligence?
Einstein found a lot of then-undiscovered physics because he was somewhat smarter/more insightful than anyone else, and so he got ahead. This says almost nothing about the absolute capabilities of intelligence.
If orcas were actually that smart wouldn’t it be dangerous to talk to them for exactly the same reasons it would be dangerous to talk to a superintelligence?
No, it's possible for LLMs to solve a subset of those problems without being AGI (even conceivable, as the history of AI research shows we often assume tasks are AI-complete when they are not, e.g. Hofstadter with chess, Turing with the Turing test).
I agree that the tests which are still standing are pretty close to AGI; this is not a problem with Thane's list, though. He is correctly avoiding the failure mode I just pointed out.
Unfortunately, this does mean that we may not be able to predict AGI is imminent until the last moment. That is a consequence of the black-box nature of LLMs and our general confusion about intelligence.
So the thing that coalitional agents are robust at is acting approximately like belief/goal agents, and you’re only making a structural claim about agency?
If so, I find your model pretty plausible.
This is called a Hurwicz decision rule / criterion (your t is usually called alpha).
I think the content of this argument is not that maxmin is fundamental, but rather that simplicity priors "look like" or justify Hurwicz-like decision rules. Simple versions of this are easy to prove but (as far as I know) do not appear in the literature.
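For reference, the standard textbook form (my notation): the Hurwicz criterion with optimism parameter $\alpha \in [0,1]$ scores an act $a$ as

$$H_\alpha(a) = \alpha \max_{s} u(a,s) + (1-\alpha) \min_{s} u(a,s),$$

where $s$ ranges over states/hypotheses; $\alpha = 0$ recovers maximin and $\alpha = 1$ recovers maximax.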