All of Dennis Zoeller's Comments + Replies

That’s a fantastic memory aid for this concept, much appreciated! Crafting games in general offer ample examples for internalizing this kind of bootstrap mentality, and for quickly scaling to the next anvil-equivalent. As you touched upon, real life has a deep crafting tree, with anvil problems upon anvil problems. Something that took me far too long to learn: if you've got your anvil but still don't find yourself where you want to be, it pays to find the next anvil problem quickly. If you still have a lot of distance to cover, don't get bogged down by things th…

Hey, thanks for taking the time to answer!

First, I want to make clear that I don’t believe LLMs to be just stochastic parrots, nor do I doubt that they are capable of world modeling. And you are right to request some more specifically stated beliefs and predictions. In this comment, I have attempted to improve on that, with limited success.

There are two main pillars in my world model that, even in light of the massive capability gains we have seen in the last seven years, keep me skeptical of the transformer architecture scaling straight to AGI.

  1. Compute ov…
eggsyntax:
[EDIT: I originally gave an excessively long and detailed response to your predictions. That version is preserved (& commentable) here in case it's of interest.]

I applaud your willingness to give predictions! Some of them seem useful, but others don't differ from what the opposing view would predict. Specifically:

1. I think most people would agree that there are blind spots; LLMs have and will continue to have a different balance of strengths and weaknesses from humans. You seem to say that those blind spots will block capability gains in general; that seems unlikely to me (and it would shift me toward your view if it clearly happened), although I agree they could get in the way of certain specific capability gains.
2. The need for escalating compute seems like it'll happen either way, so I don't think this prediction provides evidence on your view vs the other.
3. Transformers not being the main cognitive component of scaffolded systems seems like a good prediction. I expect that to happen for some systems regardless, but I expect LLMs to be the cognitive core for most until a substantially better architecture is found, and it will shift me a bit toward your view if that isn't the case. I do think we'll eventually see such an architectural breakthrough regardless of whether your view is correct, so I think that seeing a breakthrough won't provide useful evidence.
4. 'LLM-centric systems can't do novel ML research' seems like a valuable prediction; if it proves true, that would shift me toward your view.
eggsyntax:
First of all, serious points for making predictions! And thanks for the thoughtful response.

Before I address specific points: I've been working on a research project that's intended to help resolve the debate about LLMs and general reasoning. If you have a chance to take a look, I'd be very interested to hear whether you would find the results of the proposed experiment compelling; if not, then why not, and are there changes that could be made that would make it provide evidence you'd find more compelling?

Absolutely! And then on top of that, it's very easy to mistake using knowledge from the truly vast training data for actual reasoning.

This does seem like one possible outcome. That said, it seems more likely to me that continued algorithmic improvements will result in better sample efficiency (certainly humans need a far tinier amount of language examples to learn language), and multimodal data / synthetic data / self-play / simulated environments continue to improve. I suspect capabilities researchers would have made more progress on all those fronts, had it not been the case that up to now it was easy to throw more data at the models. In the past couple of weeks lots of people have been saying the scaling labs have hit the data wall, because of rumors of slowdowns in capabilities improvements. But before that, I was hearing at least some people in those labs saying that they expected to wring another 0.5 - 1 order of magnitude of human-generated training data out of what they had access to, and that still seems very plausible to me (although that would basically be the generation of GPT-5 and peer models; it seems likely to me that the generation past that will require progress on one or more of the fronts I named above).

I think that's a reasonable concern in the general case. But in cases like the ones mentioned, the authors are retrieving information (eg lat/long) using only linear probes. I don't know how familiar you are with the math there, but if so…
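For readers unfamiliar with the linear-probe technique mentioned above, here is a minimal sketch of the general idea only, not the setup from the cited work: assuming we already had a matrix of model activations for place-name tokens together with their true latitude/longitude (the arrays below are random placeholders), the probe is nothing more than a linear regression from activations to coordinates.

```python
# Minimal sketch of a linear probe (illustrative only).
# `activations` and `coords` are random placeholders standing in for
# real model activations and real lat/long labels.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 512))      # placeholder: n_tokens x d_model
coords = rng.uniform(-90, 90, size=(1000, 2))   # placeholder: lat/long targets

X_train, X_test, y_train, y_test = train_test_split(
    activations, coords, test_size=0.2, random_state=0
)

probe = Ridge(alpha=1.0)   # a purely linear map: no hidden layers
probe.fit(X_train, y_train)
print("R^2 on held-out tokens:", probe.score(X_test, y_test))
```

The restriction to a purely linear map is the point: the probe has no capacity to compute the geography itself, so a high held-out score would indicate the information is already encoded (near-)linearly in the activations.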

The performance of o1 in the first linked paper is indeed impressive, especially on what they call mystery blocksworld. I would not have expected this level of improvement. Do you know of any material that goes into more detail on the RL pre-training of o1?

I do take issue with the conclusion that reasoning within the confines of toy problems is sufficient to scale directly to AGI, though. The disagreement might stem from differing definitions of AGI. LLMs (or LRMs) exist in an environment provided by humans, including the means to translate LLM output into "acti…

eggsyntax:
As far as I know OpenAI has been pretty cagey about how o1 was trained, but there seems to be a general belief that they took the approach they had described in 2023 in 'Improving mathematical reasoning with process supervision' (although I wouldn't think of that as pre-training).

I can at least gesture at some of what's shaping my model here:

* Roughly paraphrasing Ilya Sutskever (and Yudkowsky): in order to fully predict text, you have to understand the causal processes that created it; this includes human minds and the physical world that they live in.
* The same strategy of self-supervised token-prediction seems to work quite well to extend language models to multimodal abilities, up to and including generating video that shows an understanding of physics. I'm told that it's doing pretty well for robots too, although I haven't followed that literature.
* We know that models which only see text nonetheless build internal world models like globes and game boards.
* Proponents of the view that LLMs are just applying shallow statistical patterns to the regularities of language have made predictions based on that view that have failed repeatedly, such as the claim that no pure LLM would ever be able to correctly complete 'Three plus five equals'. Over and over we've seen predictions about what LLMs would never be able to do turn out to be false, usually not long thereafter (including the ones I mention in my post here). At a certain point that view just stops seeming very plausible.

I think your intuition here is one that's widely shared (and certainly seemed plausible to me for a while). But when we cash that out into concrete claims, those don't seem to hold up very well. If you have some ideas about specific limitations that LLMs couldn't overcome based on that intuition (ideally ones that we can get an answer to in the relatively near future), I'd be interested to hear them.

I'd put a reasonably high probability (5%) on orcas and several other species having all the necessary raw mental capacity to be "uplifted" in just a few (<20) generations with technology (in the wider sense) that has been available for a long time. "Uplifted" here means the ability to intellectually engage with us on a near-equal or even equal footing, to create culture, and to actively shape their destiny. Humans have been training, selecting, and shaping other animals since before the dawn of history. Whenever we did so, it was with the goal of improving…

Then I misunderstood your original comment, sorry. As a different commenter wrote, the obvious solution would be to only engage with interesting people, but that is of course unworkable in practice. And "social grooming" nearly always involves some level of talking. A curse of our language abilities, I guess. Other social animals don't have that particular problem.

The next best solution would be higher efficiency: more socializing bang for your word-count buck, so to speak. Shorter conversations for the same social effect. That's not usually a focus of anything billed as a conversation guide, for obvious reasons, but there are some methods aimed at different goals that, in my experience, also help with this as a side effect.

I understand that, for someone with a strong drive to solve hard problems, there's an urge for conversations to serve a function: to exchange information with your interlocutor so things can get done. There's much to do, and communication is already painfully inefficient at its best.

The thing is, I don't think the free-association game is inefficient, if one is skilled at it. It's also not all that free. The reason it is something humans "developed" is that it is the most efficient way to exchange rough but extensive models of our minds with others via natural …

Part of the problem is that the very large majority of people I run into have minds which fall into a relatively low-dimensional set and can be "ray traced" with fairly little effort. It's especially bad in EA circles.