All of Cole Wyeth's Comments + Replies

Yes, but it's also very easy to convince yourself you have more evidence than you do, e.g. invent a theory that is actually crazy but seems insightful to you (may or may not apply to this case).

I think intelligence is particularly hard to assess in this way because of recursivity. 

Yeah, this is also just a pretty serious red flag for the OP’s epistemic humility… it amounts to saying “I have this brilliant idea but I am too brilliant to actually execute it, will one of you less smart people do it for me?” This is not something one should claim without a correspondingly stellar track record - otherwise, it strongly indicates that you simply haven’t tested your own ideas against reality.


Contact with reality may lower your confidence that you are one of the smartest younger supergeniuses, a hypothesis that should have around a 1 in a billion prior probability.

AprilSR
Maybe it should have a 1-in-a-billion prior, but that isn't very relevant. The question isn't actually decided by precisely how many bits of evidence you'd need to conclude it; it's trivial to come by strong evidence supporting the idea.
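For concreteness, the arithmetic behind a 1-in-a-billion prior (a back-of-the-envelope sketch, not part of the original exchange): it corresponds to roughly 30 bits, since
\[
\log_2\!\left(10^{9}\right) \;=\; 9\,\log_2 10 \;\approx\; 29.9,
\]
so a cumulative likelihood ratio of about $2^{30}$ - for example, a sustained track record of doing things nobody else can - moves the hypothesis from roughly $10^{-9}$ prior odds to roughly even odds.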

Which seems more likely: that capabilities happen to increase very quickly around human genius levels of intelligence, or that relative capabilities as compared to the rest of humanity by definition increase only when you're on the frontier of human intelligence? 
Einstein found a lot of then-undiscovered physics because he was somewhat smarter/more insightful than anyone else and so he got ahead. This says almost nothing about the absolute capabilities of intelligence. 

If orcas were actually that smart wouldn’t it be dangerous to talk to them for exactly the same reasons it would be dangerous to talk to a superintelligence?

Towards_Keeperhood
By >=+6std I mean potential of how smart they could be if they were trained similarly to us, not actual current intelligence. Sorry I didn't write this in this post, though I did in others. I'd be extremely shocked if orcas were actually that smart already. They don't have science and they aren't trained in abstract reasoning. Like, when an orca is +7std, he'd be like a +7std hunter gatherer human, who is probably not all that good at abstract reasoning tasks (like learning a language through brute-force abstract pattern recognition). (EDIT: Ok actually it would be like a +7std hunter gatherer society, which might be significantly different. Idk what I'm supposed to expect there. Still wouldn't expect it to be dangerous to talk to them though. And actually when I think about +7std societies I must admit that this sounds not that likely. That they ought to have more information exchange outside their pods and related pods or so and coordinate better. I guess that updates me downwards a bit on orcas being actually that smart - aka I hadn't previously properly considered effects from +7std cultural evolution rather than just individual intelligence.)

No, it's possible for LLMs to solve a subset of those problems without being AGI (even conceivable, as the history of AI research shows we often assume tasks are AI-complete when they are not, e.g. Hofstadter with chess, Turing with the Turing test).

I agree that the tests which are still standing are pretty close to AGI; this is not a problem with Thane's list though. He is correctly avoiding the failure mode I just pointed out. 

Unfortunately, this does mean that we may not be able to predict AGI is imminent until the last moment. That is a consequence of the black-box nature of LLMs and our general confusion about intelligence.

So the thing that coalitional agents are robust at is acting approximately like belief/goal agents, and you’re only making a structural claim about agency?

If so, I find your model pretty plausible.

Richard_Ngo
Oh, I see what you mean now. In that case, no, I disagree. Right now this notion of robustness is pre-theoretic. I suspect that we can characterize robustness as "acting like a belief/goal agent" in the limit, but part of my point is that we don't even know what it means to act "approximately like belief/goal agents" in realistic regimes, because e.g. belief/goal agents as we currently characterize them can't learn new concepts. Relatedly, see the dialogue in this post.

This sounds like how Scott formulated it, but as far as I know none of the actual (semi)formalizations look like this.

Vladimir_Nesov
The first paragraph is my response to how you describe UDT in the post; I think the slightly different framing where only the abstract algorithm is the agent fits UDT better. It only makes the decision to choose the policy, but it doesn't make commitments for itself, because it only exists for that single decision, influencing all the concrete situations where that decision (policy) gets accessed/observed/computed (in part). The second paragraph is the way I think about how to improve on UDT, but I don't have a formulation of it that I like. Specifically, I don't like for those past abstract agents to be an explicit part of a multi-step history, like in Logical Induction (or a variant that includes utilities), with explicit stages. It seems too much of a kludge and doesn't seem to have a prospect of describing coordination between different agents (with different preferences and priors). Past stages should be able to take into account their influence on any computations at all that choose to listen to them, not just things that were explicitly included as later stages or receiving messages or causal observations, in a particular policy formulation game. Influence on outcomes mediated purely through choice of behavior that an abstract algorithm makes for itself also seems more in the spirit of UDT. The issue with UDT is that it tries to do too much in that single policy-choosing thing that it wants to be an algorithm but that mostly can't be an actual algorithm, rather than working through smaller actual algorithms that form parts of a larger setting, interacting through choice of their behavior and by observing each other's behavior.

What is this coalitional structure for if not to approximate an EU maximizing agent?

Richard_Ngo
This quote from my comment above addresses this:

A couple of years later, do you still believe that foom will happen any year now?

How would this model treat mathematicians working on hard open problems? P vs NP might be counterfactual just because no one else is smart enough or has the right advantage to solve it. Insofar as central problems of a field have been identified but not solved, I'm not sure your model gives good advice. 

I visited Mikhail Khovanov once in New York to give a seminar talk, and after it was all over and I was wandering around seeing the sights, he gave me a call and offered a long string of general advice on how to be the kind of person who does truly novel things (he's famous for this, you can read about Khovanov homology). One thing he said was "look for things that aren't there" haha. It's actually very practical advice, which I think about often and attempt to live up to!

This is a response directly to comments made by Richard Ngo at the CMU agent foundations conference. Though he requested I comment here, the claims I want to focus on go beyond this (and the previous post) and include the following: 

1: redefining agency as coalitional (agent = cooperating subagents) as opposed to the normal belief/goal model.

2: justifying this model by arguing that subagents are required for robustness in hard domains (specifically those that require concept invention).

3: that therefore AIXI is irrelevant for understanding agency. ...

Richard_Ngo
Thank you Cole for the comment! Some quick thoughts in response (though I've skipped commenting on the biology examples and the ML examples since I think our intuitions here are a bit too different to usefully resolve via text): Yepp, this is a good rephrasing. I'd clarify a bit by saying: after some level of decomposition, the recursion reaches agents which are limited to simple enough domains (like recognizing shapes in your visual field) that they aren't strongly bottlenecked on forming new concepts (like all higher-level agents are). In domains that simple, the difference between heuristics and planners is much less well-defined (e.g. a "pick up a cup" subagent has a scope of maybe 1 second, so there's just not much planning to do). So I'm open to describing such subagents as utility-maximizers with bounded scope (e.g. utility 1 if they pick up the cup in the next second, 0 if they don't, -10 if they knock it over). This is still different from "utility-maximizers" in the classic LessWrong sense (which are usually understood as not being bounded in terms of time or scope). This feels crucial to me. There's a level of optimality at which you no longer care about robustness, because you're so good at planning that you can account for every consideration. Stockfish, for example, is willing to play moves that go against any standard chess intuition, because it has calculated out so many lines that it's confident it works in that specific case. (Though even then, note that this leaves it vulnerable to neural chess engines!) But for anything short of that, you want to be able to integrate subagents with non-overlapping ontologies into the same decision procedure. E.g. if your internally planning subagent has come up with some clever and convoluted plan, you want some other subagent to be able to say "I can't critique this plan in its own ontology but my heuristics say it's going to fail". More generally, attempted unifications of ontologies have the same problem as

Okay, what I meant is “says little in favor of the intelligence of Claude”

Yeah, I agree - overall I agree pretty closely with Thane about LLMs but his final conclusions don't seem to follow from the model presented here.

They might solve it in a year, with one stunning conceptual insight. They might solve it in ten years or more. There's no deciding evidence either way; by default, I expect the trend of punctuated equilibria in AI research to continue for some time. 

I did begin and then abandon a sequence about this, cognitive algorithms as scaffolding. I’m like halfway to disendorsing it though. 

This won’t work, happy to bet on it if you want to make a manifold market. 

Maybe, but on reasonable interpretations I think this should cause us to expect AGI to be farther not nearer. 

Yes, but because this scaffolding would have to be invented separately for each task, it’s no longer really zero shot and says little about the intelligence of Claude. 

IC Rainbow
It says that it lacks intelligence to play zero shot and someone has to compensate the intelligence deficit with an exocortex. It's like we can track progress by measuring "performance per exocortex complexity" where the complexity drops from "here's a bunch of buttons to press in sequence to win" to "".
ozziegooen
Obvious point that we might soon be able to have LLMs code up this necessary scaffolding. This isn't clearly very far-off, from what I can tell.
Dana
Well, vision and mapping seem like they could be pretty generic (and I expect much better vision in future base models anyway). For the third limitation, I think it's quite possible that Claude could provide an appropriate segmentation strategy for whatever environment it is told it is being placed into. Whether this would be a display of its intelligence, or just its capabilities, is beside the point from my perspective.
Cole Wyeth

This is convincing evidence LLMs are far from AGI.

Eventually, one of the labs will solve it, a bunch of people will publicly update, and I’ll point out that actually the entire conversation about how an LLM should beat Pokémon was in the training data, the scaffolding was carefully set up to keep it on rails in this specific game, the available action set etc is essentially feature selection, etc. 

evalu

I disagree because to me this just looks like LLMs are one algorithmic improvement away from having executive function, similar to how they couldn't do system 2 style reasoning until this year when RL on math problems started working.

For example, being unable to change its goals on the fly: if a kid kept trying to go forward when his pokemon were too weak, he would keep losing, get upset, and hopefully in a moment of mental clarity, learn the general principle that he should step back and reconsider his goals every so often. I think most children learn som...

Seems like an easy way to create a less-fakeable benchmark would be to evaluate the LLM+scaffolding on multiple different games?  Optimizing for beating Pokemon Red alone would of course be a cheap PR win, so people will try to do it.  But optimizing for beating a wide variety of games would be a much bigger win, since it would probably require the AI to develop some more actually-valuable agentic capabilities.

It will probably be correct to chide people who update on the cheap PR win.  But perhaps the bigger win, which would actually justify such updates, might come soon afterwards!

Action item: comment on a recent LessWrong or Alignment Forum post on AI safety or write a blog post on AI safety.

This is not generically a good idea. We don't need a bunch of comments on lesswrong posts from new users. It's fine, if there's a specific reason for it (obviously I endorse commenting on lesswrong posts in some cases). 

Stephen McAleese
Thank you for bringing up this issue. While we don't want low-quality comments, comments can provide helpful feedback to the author and clarify the reader's thinking. Because of these benefits, I believe commenting should be encouraged. The upvoting and downvoting mechanisms help filter out low-quality comments, so I don't think there's a significant risk of them overwhelming the discussion.

There are a lot of years between 2030 and 2075.

How do you know it's sufficient? Is it not salient to you primarily because it is the current bottleneck?

If "task execution" includes execution of a wide enough class of tasks, obviously the claim becomes trivially true. If it is interpreted more reasonably, I think it is probably false.

I appreciate you wrote this, particularly the final section. 

Cole Wyeth

I give it ~70%, with these caveats:

"Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach."

It won't be neurosymbolic.

Also I don't see where the 2030 number is coming from. At this point my uncertainty is almost in the exponent again. Seems like decades is plausible (maybe <50% though).

It's not clear that only one breakthrough is necessary. 

Auspicious
I also found that take very unusual, especially when combined with this: The last sentence seems extremely overconfident, especially combined with the otherwise bearish conclusions in this post. I'm surprised no one else has mentioned it.

Without an intelligence explosion, it's around 2030 that scaling through increasing funding runs out of steam and slows down to the speed of chip improvement. This slowdown happens around the same time (maybe 2028-2034) even with a lot more commercial success (if that success precedes the slowdown), because scaling faster takes exponentially more money. So there's more probability density of transformative advances before ~2030 than after, to the extent that scaling contributes to this probability.

That's my reason to see 2030 as a meaningful threshold, Tha...
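A minimal numerical sketch of that argument (the growth rates and the 2030 cutoff below are illustrative assumptions, not exact figures from the comment above):

```python
# Illustrative sketch: training compute grows fast while funding can keep scaling,
# then slows to roughly the pace of hardware/efficiency improvement alone.
FUNDING_GROWTH = 4.0   # assumed yearly multiplier on compute while spend keeps growing
CHIP_GROWTH = 1.35     # assumed yearly multiplier from hardware/efficiency alone
CUTOFF_YEAR = 2030     # assumed year at which spending stops growing

compute = 1.0  # in units of "2024 frontier training compute"
for year in range(2025, 2036):
    compute *= FUNDING_GROWTH * CHIP_GROWTH if year <= CUTOFF_YEAR else CHIP_GROWTH
    print(year, f"{compute:.1e}x")
```

The point of the toy model is just the kink: most of the compute-driven probability mass lands before the cutoff, because growth afterwards is comparatively slow.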

Answer by Cole Wyeth

As you probably know, I have been endorsing this "conspiracy theory" for some time, e.g. roughly here: https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms

Hard disagree, this is evidence of slowdown.

As the model updates grow more dense I also check out; a large jump in capabilities between the original gpt-4 and gpt-4.5 would remain salient to me. This is not salient. 

Paragox
My other comment was bearish, but in the bullish direction, I'm surprised Zvi didn't include any of Gwern's threads, like this or this, which apropos of Karpathy's blind test I think have been the best clear examples of superior "taste" or quality from 4.5, and actually swapped my preferences on 4.5 vs 4o when I looked closer. As text prediction becomes ever-more superhuman, I would actually expect improvements in many domains to become increasingly non-salient, as it takes ever-increasing thoughtfulness / language nuance to appreciate the gains. But back to bearishness, it is unclear to me how much this mode-collapse improvement could just be dominated by post-training improvements instead of the pretraining scaleup. And of course, to wonder how superhuman text prediction improvement will ever pragmatically alleviate the regime's weaknesses in the many known economical and benchmarked domains, especially if Q-Star fails to generalize much at scale, just like multimodality failed to generalize much at scale before it. We are currently scaling superhuman predictors of textual, visual, and audio datasets. The datasets themselves, primarily composed of the internet plus increasingly synthetically varied copies, are so generalized and varied that this prediction ability, by default, cannot escape including human-like problem solving and other agentic behaviors, as Janus helped model with Simulacrums some time ago. But as they engorge themselves with increasingly opaque and superhuman heuristics towards that sole goal of predicting the next token, to expect that the intrinsically discovered methods will continue trending towards classically desired agentic and AGI-like behaviors seems naïve. The current convenient lack of a substantial gap between being good at predicting the internet and being good at figuring out a generalized problem will probably dissipate, and Goodhart will rear its nasty head as the ever-optimized-for objective diverges ever-further from the a

Why call it "converse Lawvere" instead of the more standard "UTM property" of general recursion theory, e.g. as in Odifreddi? Only because the maps are to [0,1]? That seems like insufficient reason to adopt an unrelated name.

SamEisenstat
Yeah, it's a bit weird to call this particular problem the converse Lawvere problem. The name suggests that there is a converse Lawvere problem for any concrete category and any base object $Y$--namely, to find an object $X$ and a map $f: X \to Y^X$ such that every morphism $X \to Y$ has the form $f(x)$ for some $x \in X$. But, among a small group of researchers, we ended up using the name for the case where $Y$ is the unit interval and the category is that of topological spaces, or alternatively a reasonable topological-like category. In this post, I play with that problem a bit, since I pass to considering functions with computability properties, rather than a topological definition. (I also don't exactly define a category here...) I agree that what's going on here is a UTM property. I think this context of considering different categories is interesting. There's an interplay between diagonal-lemma-type fixed-point theorems and topological fixed-point theorems going on in work like probabilistic truth predicates, reflective oracles, and logical induction. For more on analogies between fixed-point theorems, you can see e.g. A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points and Scott Garrabrant's fixed point sequence.
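For reference, a rough side-by-side of the two properties being compared (my paraphrase, not verbatim from either source):
\[
\textbf{Converse Lawvere:}\quad \exists\, X,\ f\colon X \to Y^X \ \text{ such that }\ \forall\, g\colon X \to Y\ \ \exists\, x \in X:\ f(x) = g,
\]
\[
\textbf{UTM property:}\quad \exists\, U \ \text{ such that }\ \forall\, \text{partial computable } h\ \ \exists\, e \in \mathbb{N}:\ h = U(e, \cdot),
\]
where the first quantifies over morphisms of the chosen category (e.g. continuous maps to $[0,1]$) and the second over partial computable functions.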

Yes, this is the type of idea big labs will definitely already have (also what I think ~100% of the time someone says "I don't want to give big labs any ideas").

williawa
That's what I also thought haha, else I wouldn't post it.

Oh, I know. It's normally 5-20 years from lab to home. My 2027 prediction is for a research robot being able to do anything a human can do in an ordinary environment, not necessarily a mass-producable, inexpensive product for consumers or even most businesses. But obviously the advent of superintelligence, under my model, is going to accelerate those usual 5-20 year timelines quite a bit, so it can't be much after 2027 that you'll be able to buy your own android. Assuming "buying things" is still a thing, assuming the world remains recognizable for at leas

...

I'm not actually relying on a heuristic, I'm compressing https://www.lesswrong.com/posts/vvgND6aLjuDR6QzDF/my-model-of-what-is-going-on-with-llms

If you extrapolate capability graphs in the most straightforward way, you get the result that AGI should arrive around 2027-2028. Scenario analyses (like the ones produced by Kokotajlo and Aschenbrenner) tend to converge on the same result.

If you extrapolate log GDP growth or the value of the S&P 500, superintelligence would not be anticipated any time soon. If you extrapolate the number of open mathematical ...

Pekka Puupaa
Very interesting, thanks! On a quick skim, I don't think I agree with the claim that LLMs have never done anything important. I know for a fact that they have written a lot of production code for a lot of companies, for example. And I personally have read AI texts funny or entertaining enough to reflect back on, and AI art beautiful enough to admire even a year later. (All of this is highly subjective, of course. I don't think you'd find the same examples impressive.) If you don't think any of that qualifies as important, then I think your definition of important may be overly strict. But I'll have to look at this more deeply later. I think this reasoning would also lead one to reject Moore's law as a valid way to forecast future compute prices. It is in some sense "obvious" what straight lines one should be looking at: smooth lines of technological progress. I claim that just about any capability with a sufficiently "smooth", "continuous" definition (i.e. your example of the number of open mathematical theorems solved would have to be amended to allow for partial progress and partial solutions) will tend to converge around 2027-28. Some converge earlier, some later, but that seems to be around the consensus for when we can expect human-level capability for nearly all tasks anybody's bothered to model. The Mobile Aloha website: https://mobile-aloha.github.io/ The front page has a video of the system autonomously cooking a shrimp and other examples. It is still quite slow and clumsy, but being able to complete tasks like this at all is already light years ahead of where we were just a few years ago. Oh, I know. It's normally 5-20 years from lab to home. My 2027 prediction is for a research robot being able to do anything a human can do in an ordinary environment, not necessarily a mass-producible, inexpensive product for consumers or even most businesses. But obviously the advent of superintelligence, under my model, is going to accelerate those usu

What would your mindset have had to say about automated science in 2023, human level robots in 2024, AlphaFold curing cancer in 2025?

Pekka Puupaa
My point is that that heuristic is not good. This obviously doesn't mean that reversing the heuristic would give you good results (reverse stupidity is not intelligence and so on). What one needs is a different set of heuristics. If you extrapolate capability graphs in the most straightforward way, you get the result that AGI should arrive around 2027-2028. Scenario analyses (like the ones produced by Kokotajlo and Aschenbrenner) tend to converge on the same result. An effective cancer cure will likely require superintelligence, so I would be expecting one around 2029 assuming alignment gets solved. We mostly solved egg frying and laundry folding last year with Aloha and Optimus, which were some of the most long-standing issues in robotics. So human level robots in 2024 would actually have been an okay prediction. Actual human level probably requires human level intelligence, so 2027.
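To make "extrapolate capability graphs in the most straightforward way" concrete, here is a minimal sketch; the benchmark numbers are invented for illustration, and only the method (fit a trend, read off the crossing date) is the point:

```python
# Fit a straight line to (hypothetical) benchmark scores and extrapolate the
# year the trend crosses a "human-level" threshold. Data are made up for the sketch.
import numpy as np

years = np.array([2020, 2021, 2022, 2023, 2024])
scores = np.array([5.0, 18.0, 30.0, 44.0, 56.0])  # hypothetical % on some benchmark

slope, intercept = np.polyfit(years, scores, 1)   # least-squares linear trend
threshold = 100.0                                  # hypothetical "human-level" score
crossing = (threshold - intercept) / slope
print(f"Trend crosses {threshold:.0f}% around {crossing:.1f}")  # ~2027.4 with these numbers
```

Whether a straight line (rather than, say, a sigmoid) is the right curve to fit is of course exactly what the disagreement above is about.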

I think I agree with Thane’s point 1: because it seems like building intelligence requires a series of conceptual insights, there may be limits to how far in advance I can know it’s about to happen (without like, already knowing how to build it out of math myself). But I don’t view this as a position of total epistemic helplessness - it’s clear that there has been a lot of progress over the last 40 years to the extent that we should be more than halfway there.

And yeah, I don’t view AGI as equivalent to other technologies - its not even clear yet what all t... (read more)

Thane Ruthenis
Those are not incompatible. Suppose that you vaguely feel that a whole set of independent conceptual insights is missing, and that some of them will only be reachable after some previous ones have been discovered; e.g. you need to go A→B→C. Then the expected time until the problem is solved is the sum of the expected wait-times $T_A + T_B + T_C$, and if you observe A and B being solved, it shortens to $T_C$. I think that checks out intuitively. We can very roughly gauge how "mature" a field is, and therefore, how much ground there's likely left to cover.
Cole Wyeth

It’s wild to me that you’ve concentrated a full 50% of your measure in the next <3 years. What if there are some aspects of intelligence which we don’t know we don’t know about yet? It’s been over ~40 years of progress since the perceptron, how do you know we’re in the last ~10% today?

Pekka Puupaa
What would this heuristic have said about the probability of AlphaFold 2 solving protein folding in 2020? What about all the other tasks that had been untractable for decades that became solvable in the past five years? To me, 50% over the next 3 years is what sanity looks like.

Progress over the last 40 years has been not at all linear. I don't think this "last 10%" thing is the right way to think about it.

The argument you make is tempting, I must admit I feel the pull of it. But I think it proves too much. I think that you will still be able to make that argument when AGI is, in fact, 3 years away. In fact you'll still be able to make that argument when AGI is 3 months away. I think that if I consistently applied that argument, I'd end up thinking AGI was probably 5+ years away right up until the day AGI was announced.

Here's ano...

If this is an example of an LLM proving something, it's a very non-central example. It was finetuned specifically for mathematics and then used essentially as a program synthesis engine in a larger system that proved the result.

DeepMind can't just keep running this system and get more theorems out - once the engineers moved on to other projects I haven't heard anything building on the results.  

We don't want to talk about partial rationality; we want notions of rationality which bounded agents can fully satisfy.

Why expect this kind of thing to exist? It seems to me that the ideas of computational boundedness and optimality are naturally in tension. 

Answer by Cole Wyeth

Bayes rule

Berkson's paradox (see the quick simulation sketch after this list)

Goodhart's law

Efficient market hypothesis
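Since Berkson's paradox is arguably the least familiar of these, a minimal simulation sketch (my own illustration, with hypothetical variables):

```python
# Berkson's paradox: two independent traits look negatively correlated once you
# condition on their sum clearing a selection bar.
import numpy as np

rng = np.random.default_rng(0)
talent = rng.normal(size=100_000)
likability = rng.normal(size=100_000)

selected = (talent + likability) > 2.0  # e.g. only the "successful" are observed
print(np.corrcoef(talent, likability)[0, 1])                      # ~0.00 in the full population
print(np.corrcoef(talent[selected], likability[selected])[0, 1])  # clearly negative among the selected
```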

I wore a suit for all of high school. At some point everyone expected me to wear a suit and if I didn't I'd never hear the end of it. The first time I wore a t-shirt there was practically a riot. Unfortunately, I got so tired of conversations regarding the fact that I was or wasn't wearing a suit that escaping to college was a massive relief - now I practically never do it. 

I don't agree with the underlying assumption then - I don't think LLMs are capable of solving difficult novel problems, unless you include a nearly-complete solution as part of the groundwork.

Can AI X-risk be effectively communicated by analogy to climate change? That is, the threat isn’t manifesting itself clearly yet, but experts tell us it will if we continue along the current path.

Though there are various disanalogies, this specific comparison seems both honest and likely to be persuasive to the left?

Ariel Cheng
I think it would be persuasive to the left, but I'm worried that comparing AI x-risk to climate change would make it a left-wing issue to care about, which would make right-wingers automatically oppose it (upon hearing "it's like climate change"). Generally it seems difficult to make comparisons/analogies to issues that (1) people are familiar with and think are very important and (2) not already politicized.
cdt
I agree that there is a consistent message here, and I think it is one of the most practical analogies, but I get the strong impression that tech experts do not want to be associated with environmentalists.
CstineSublime
I'm looking at this not from a CompSci point of view but from a rhetoric point of view: Isn't it much easier to make tenuous or even flat-out wrong links between Climate Change and highly publicized Natural Disaster events that have lots of dramatic, visceral footage than it is to ascribe danger to a machine that hasn't been invented yet, that we don't know the nature or inclinations of? I don't know about nowadays, but for me the two main pop-culture touchstones for "evil AI" are Skynet in Terminator, or HAL 9000 in 2001: A Space Odyssey (and by inversion - the Butlerian Jihad in Dune). Wouldn't it be more expedient to leverage those? (Expedient - I didn't say accurate)
MondSemmel
I don't like it. Among various issues, people already muddy the waters by erroneously calling climate change an existential risk (rather than what it was, a merely catastrophic one, before AI timelines made any worries about climate change in the year 2100 entirely irrelevant), and it's extremely partisan-coded. And you're likely to hear that any mention of AI x-risk is a distraction from the real issues, which are whatever the people cared about previously. I prefer an analogy to gain-of-function research. As in, scientists grow viruses/AIs in the lab, with promises of societal benefits, but without any commensurate acknowledgment of the risks. And you can't trust the bio/AI labs to manage these risks, e.g. even high biosafety levels can't entirely prevent outbreaks.

It seems that lesswrong was (before it became mostly AI content) essentially just a massive compilation of such intellectual life hacks. If you filter by the right tags (Rationality?) it should still be usable for this purpose. Have you determined that there is not currently a good centralized table of this kind of content?

Celarix
Why, the Practical tag (https://www.lesswrong.com/w/practical) has a lot of cool stuff like this.
Richard_Kennaway
The Sequences? Not quite what you're looking for, but that's what I have always thought of as the essentials of LW (before the AI explosion).
Antoine de Scorraille
On a quick search, I couldn't find such a repo. I agree that this is/was the heart of Lesswrong, but what I'm aiming for is more specific than ‘any cool intellectual thing’. A lot of LW's posts are about contrarian viewpoints, or very particular topics. I'm looking for well-known but surprising stuff, to develop common knowledge. Should feel like: ‘OMG, everyone should know [INSERT YOUR TRICK]’

If this kind of approach to mathematics research becomes mainstream, out-competing humans working alone, that would be pretty convincing. So there is nothing that disqualifies this example - it does update me slightly.

However, this example on its own seems unconvincing for a couple of reasons:

  • it seems that the results were in fact proven by humans first, calling into question the claim that the proof insight belonged to the LLM (even though the authors try to frame it that way).
  • from the reply on X it seems that the results of the paper may not have been no
...
Cole Wyeth

There is a specific type of thinking, which I tried to gesture at in my original post, which I think LLMs seem to be literally incapable of. It’s possible to unpack the phrase “scientific insight” in more than one way, and some interpretations fall on either side of the line. 

abramdemski
Yeah, that makes sense.
Cole Wyeth

I think the argument you’re making is that since LLMs can make eps > 0 progress, they can repeat it N times to make unbounded progress. But this is not the structure of conceptual insight as a general rule. Concretely, it fails for the architectural reasons I explained in the original post. 

james oofou
Here's an attempt at a clearer explanation of my argument: I think the ability to autonomously find novel problems to solve will emerge as reasoning models scale up. It will emerge because it is instrumental to solving difficult problems.  Imagine an RL environment in which the LLM being trained is tasked with solving somewhat difficult open math problems (solutions verified using autonomous proof verification). It fails and fails at most of them until it learns to focus on making marginal progress: tackling simpler cases, working on tangentially-related problems, etc. These instrumental solutions are themselves often novel, meaning that the LLM will have become able to pose novel, interesting, somewhat important problems autonomously. And this will scale to something like a fully autonomous, very much superhuman researcher.  This is how it often works in humans. We work on a difficult problem, find novel results on the way there. The LLM would likely be uncertain of whether these results are truly novel but this is how it works with humans too. The system can do some DeepResearch / check with relevant experts if it's important. Of course, I'm working from my parent-comment's position that LLMs are in fact already capable of solving novel problems, just not posing them and doing the requisite groundwork.
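A minimal sketch of the training loop being described (all names here - `model`, `sample_open_problem`, `verify_proof`, `optimizer` - are hypothetical stand-ins, not a real stack):

```python
# Sketch of RL on open math problems with autonomous proof verification and
# partial credit for instrumental progress (lemmas, special cases).
def training_step(model, sample_open_problem, verify_proof, optimizer):
    problem = sample_open_problem()          # an open problem, or a simpler special case of one
    attempt = model.generate_proof(problem)  # the model proposes a (possibly partial) proof
    solved, fraction_verified = verify_proof(problem, attempt)  # machine-checked
    reward = 1.0 if solved else 0.1 * fraction_verified         # partial credit for verified lemmas
    optimizer.reinforce(model, attempt, reward)  # policy-gradient-style update on this attempt
    return reward
```

On this picture, "learning to pose good subproblems" is just whatever behavior ends up collecting the partial-credit reward.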

Obviously it’s not a hard line, but your example doesn’t count, and proving any open conjecture in mathematics which was not constructed for the purpose does count. I think the quote from my post gives some other central examples. The standard is conceptual knowledge production. 

nostalgebraist
There's a math paper by Ghrist, Gould and Lopez which was produced with a nontrivial amount of LLM assistance, as described in its Appendix A and by Ghrist in this thread (but see also this response). The LLM contributions to the paper don't seem especially impressive. The presentation is less "we used this technology in a real project because it saved us time by doing our work for us," and more "we're enthusiastic and curious about this technology and its future potential, and we used it in a real project because we're enthusiasts who use it in whatever we do and/or because we wanted to learn more about its current strengths and weaknesses." And I imagine it doesn't "count" for your purposes. But – assuming that this work doesn't count – I'd be interested to hear more about why it doesn't count, and how far away it is from the line, and what exactly the disqualifying features are. Reading the appendix and Ghrist's thread, it doesn't sound like the main limitation of the LLMs here was an inability to think up new ideas (while being comparatively good at routine calculations using standard methods). If anything, the opposite is true: the main contributions credited to the LLMs were... 1. Coming up with an interesting conjecture 2. Finding a "clearer and more elegant" proof of the conjecture than the one the human authors had devised themselves (and doing so from scratch, without having seen the human-written proof) ...while, on the other hand, the LLMs often wrote proofs that were just plain wrong, and the proof in (2) was manually selected from amongst a lot of dross by the human authors. ---------------------------------------- To be more explicit, I think that the (human) process of "generating novel insights" in math often involves a lot of work that resembles brute-force or evolutionary search. E.g. you ask yourself something like "how could I generalize this?", think up 5 half-baked ideas that feel like generalizations, think about each one more ca

Great post!

We in universal AI already have a name for the generalization of AIXI that may use a different history distribution: AI$\mu$. Since you intend $\mu$ to be a distribution over hypotheses, NOT the true environment as Marcus Hutter usually uses $\mu$, I would also prefer to replace $\mu$ with $\nu$. This may seem pedantic, but it's kind of triggering to use AIXI to refer to AI$\nu$, because the "XI" part of "AIXI" is (sometimes) intended to translate as $\xi$, Hutter's chronological version of the universal distribution. 
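For readers without the Hutter background, the naming convention being invoked is roughly the following expectimax expression (a sketch of the standard form, with $x_k$ the percept containing reward $r_k$):
\[
a_k \;=\; \arg\max_{a_k} \sum_{x_k} \,\cdots\, \max_{a_m} \sum_{x_m} \big[\, r_k + \cdots + r_m \,\big]\; \nu\big(x_{1:m} \mid a_{1:m}\big),
\]
where $\nu = \mu$ (the true environment) gives AI$\mu$, $\nu = \xi$ (the universal mixture) gives AI$\xi$, i.e. AIXI, and a general $\nu$ gives AI$\nu$.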

Cole Wyeth

I think it’s the latter.

Cole Wyeth

It seems suspicious to me that this hype is coming from fields where it seems hard to verify (is the LLM actually coming up with original ideas or is it just fusing standard procedures? Are the ideas the bottleneck or is the experimental time the bottleneck? Are the ideas actually working or do they just sound impressive?). And of course this is Twitter. 

Why not progress on hard (or even easy but open) math problems? Are LLMs afraid of proof verifiers? On the contrary, it seems like this is the area where we should be able to best apply RL, since there is a clear reward signal. 

ShardPhoenix
That's a reasonable suspicion but as a counterpoint there might be more low-hanging fruit in biomedicine than math, precisely because it's harder to test ideas in the former. Without the need for expensive experiments, math has already been driven much deeper than other fields, and therefore requires a deeper understanding to have any hope of making novel progress. edit: Also, if I recall correctly, the average IQ of mathematicians is higher than biologists, which is consistent with it being harder to make progress in math.
Thane Ruthenis

On the contrary, it seems like this is the area where we should be able to best apply RL, since there is a clear reward signal. 

Is there? It's one thing to verify whether a proof is correct; whether an expression (posed by a human!) is tautologous to a different expression (also posed by a human!). But what's the ground-truth signal for "the framework of Bayesian probability/category theory is genuinely practically useful"?

This is the reason I'm bearish on the reasoning models even for math. The realistic benefits of them seem to be:

  1. Much faster feedba
...
Cole Wyeth

Yeah, I agree with this. If you feed an LLM enough hints about the solution you believe is right, and it generates ten solutions, one of them will sound to you like the right solution.

abramdemski

For me, this is significantly different from the position I understood you to be taking. My push-back was essentially the same as 

"has there been, across the world and throughout the years, a nonzero number of scientific insights generated by LLMs?" (obviously yes),

& I created the question to see if we could substantiate the "yes" here with evidence. 

It makes somewhat more sense to me for your timeline crux to be "can we do this reliably" as opposed to "has this literally ever happened" -- but the claim in your post was quite explicit about t...
