Thane Ruthenis

Yeah, but this case isn't even an interesting kind of memory failure. It's just running up against a hard barrier. I'm not even convinced they wouldn't be able to do this paper's tasks if you just gave them a basic memory scaffold, like Claude playing Pokemon has.

This "failure" would be demonstrated just as well by asking them to multiply two 10^4-digit numbe–

Wait, that literally was the previous "LLMs can't follow algorithms" fad! And like with that one, I expect that part of the reason LLMs fail at the new paper's tasks is because their training environments never incentivized them to do tons of boring menial operations, and to instead look for clever workarounds.

As someone who is famously bearish on LLMs, I find this paper completely unconvincing as a reason to be bearish on LLMs.

Every few months, we get both a "this paper shows fundamental LLM limitations!!!" and a "this paper shows LLMs producing innovations!!!", and both always turn out to be slop. Tiresome.

It is not clear to me that Anthropic "unilaterally stopping" will result in meaningfully better outcomes than the status quo

I think that just Anthropic, OpenAI, and DeepMind stopping would plausibly result in meaningfully better outcomes than the status quo. I still see no strong evidence that anyone outside these labs is actually pursuing AGI with anything like their level of effectiveness. I think it's very plausible that everyone else is either LARPing (random LLM startups), or largely following their lead (DeepSeek/China), or pursuing dead ends (Meta's LeCun), or some combination.

The o1 release is a good example. Yes, everyone and their grandmother was absent-mindedly thinking about RL-on-CoTs and tinkering with relevant experiments. But it took OpenAI deploying a flashy proof-of-concept for everyone to pour vast resources into this paradigm. In the counterfactual where the three major labs weren't there, how long would it have taken the rest to get there?

I think it's plausible that if only those three actors stopped, we'd get +5-10 years to the timelines just from that. Which I expect does meaningfully improve the outcomes, particularly in AI-2027-style short-timeline worlds.

So I think getting any one of them to individually stop would be pretty significant, actually (inasmuch as it's a step towards "make all three stop").

Cool. I've had the same idea, that we want something like "synergistic information present in each random subset of the system's constituents", and yeah, it doesn't work out-of-the-box.

Some other issues there:

  • If we're actually sampling random individual atoms from all around the dog's body, it seems to me that we'd need an incredibly large number of them to decode anything useful. Far more than we'd need if we were instead sampling random small connected chunks of atoms.
    • More intuitive example: Suppose we want to infer a book's topic. What's the smallest $n$ such that we can likely infer the topic from a random contiguous string of length $n$? Comparatively, what's the smallest $m$ such that we can infer it from $m$ letters randomly and independently sampled from the book's text? It seems to me that $n \ll m$. (See the toy sketch after this list.)
  • But introducing "chunks of nearby variables" requires figuring out what "nearby" is, i. e., defining some topology for the low-level representation. How does that work?
  • Further, the size of the chunk needed depends a lot on which part of the system we sample, so just going "a flat % of all constituents" doesn't work. Consider happening to land on a DNA string vs. some random part of the interior of the dog's stomach.
    • Actually, dogs are kind of a bad example, animals do have DNA signatures spread all around them. A complex robot, then. If we have a diverse variety of robots, inferring the specific type is easy if we sample e. g. part of the hardware implementing its long-term memory, but not if we sample a random part of an appendage.
    • Or a random passage from the book vs. the titles of the book's chapters. Or even just "a sample of a particularly info-dense paragraph" vs. "a sample from an unrelated anecdote from the author's life". % of the total letter count just doesn't seem like the right notion of "smallness".
  • On the flip side, sometimes it's reversed: sometimes we do want to sample random unconnected atoms. E. g., the nanomachine example: if we happen to sample the "chunk" corresponding to appendage#12, we risk learning nothing about the high-level state, whereas if we sample three random atoms from different parts of it, that might determine the high-level state uniquely. So now the desired topology of the samples is different: we want non-connected chunks.
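
On the book example: here's a deliberately extreme toy of it (my own construction, so take the specifics as illustrative only). The two "books" below have identical letter frequencies, so letters sampled independently carry literally zero information about the topic no matter how many you draw, while a two-letter connected chunk already determines it. Real books obviously aren't this extreme, but the mechanism is the same: most of the topic information lives in local structure that independent sampling destroys.

```python
from collections import Counter
from math import log2

def entropy(counter):
    """Shannon entropy (bits) of an empirical distribution given as a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * log2(c / total) for c in counter.values())

def window_dist(text, n):
    """Distribution of a uniformly random contiguous window of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def mi_topic_vs_window(books, n):
    """I(topic; window) for a uniformly random topic and a random length-n
    contiguous window drawn from that topic's book: I = H(W) - H(W | T)."""
    dists = [window_dist(text, n) for text in books.values()]
    mixture = Counter()
    for d in dists:
        total = sum(d.values())
        for w, c in d.items():
            mixture[w] += c / total  # equal-weight mixture over topics
    return entropy(mixture) - sum(entropy(d) for d in dists) / len(dists)

# Two "books" with identical letter frequencies but different local structure.
books = {"dogs": "dog " * 500, "gods": "god " * 500}

print(mi_topic_vs_window(books, 1))  # ~0 bits: a single letter says nothing about the topic
print(mi_topic_vs_window(books, 2))  # ~1 bit: a 2-letter connected window determines it
# Since the letter frequencies are identical across the two books, n letters
# sampled independently give 0 bits about the topic for *any* n, while a
# 2-letter contiguous chunk already gives the full bit.
```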

I'm currently thinking this is solved by abstraction hierarchies. Like, maybe the basic definition of an abstraction is of the "redundant synergistic variable" type, and the lowest-level abstractions are defined over the lowest-level elements (molecules over atoms). But then higher-level abstractions are redundant-synergistic over lower-level abstractions (rather than actual lowest-level elements), and up it goes. The definitions of the lower-level abstractions provide the topology + sizing + symmetries, which higher-level abstractions then hook up to. (Note that this forces us to actually step through the levels, either bottom-up or top-down.)

As examples:

  • The states of the nanomachine's modules are inferable from any subset of the modules' constituent atoms, and the state of the nanomachine itself is inferable from the states of any subset of the modules. But there are no such neat relationships between atoms and the high-level state. (See the toy sketch after this list.)
  • "A carbon atom" is synergistic information about a chunk of voxels (baking-in how that chunk could vary, e. g. rotations, spatial translations); "a DNA molecule" is synergistic information about a bunch of atoms (likewise defining custom symmetries under which atom-compositions still count as a DNA molecule); "skin tissue" is synergistic over molecules; and somewhere up there we have "a dog" synergistic over custom-defined animal-parts.

Or something vaguely like that; this doesn't exactly work either. I'll have more to say about this once I finish distilling my notes for external consumption instead of expanding them, which is going to happen any... day... now...

I very tentatively agree with that.

I'd guess it's somewhat unlikely that large AI companies or governments would want to continue releasing models with open weights once they are this capable, though underestimating capabilities is possible

I think that's a real concern, though. I think the central route by which going open-source at the current capability level leads to extinction is a powerful AI model successfully sandbagging during internal evals (which seems pretty easy for an actually dangerous model to do, given evals' current state), getting open-sourced, and things then going the "rogue replication" route.

For steps 2-4, I kinda expect current neural nets to be kludgy messes, and so not really have the nice subagent structure

If, as a system comes to ever-better approximate a powerful agent, there's actual convergence towards this type of hierarchical structure, I expect you'd see something clearly distinct from noise even in the current LLMs. Intuition pump/proof-of-concept: the diagram comparing a theoretical prediction with empirical observations here.

Indeed, I think it's one of the main promises of "top-down" agent foundations research. The applicability and power of the correct theory of powerful agents' internal structures, whatever that theory may be, would scale with the capabilities of the AI system under study. It'll apply to LLMs inasmuch as LLMs are actually on trajectory to be a threat, and if we jump paradigms to something more powerful, the theory would start working better (as opposed to bottom-up MechInterp techniques, which would start working worse).

I'm possibly missing something basic here, but: how is the redund/latent-focused natural-abstraction theory supposed to deal with synergistic information (and "emergent" dynamics)?

Consider a dog at the level of atoms. It's not, actually, the case that "this is a dog" is redundantly encoded in each atom. Even if each atom were clearly labeled, and we had an explicit approximately deterministic $f: (\text{atom}_1, \dots, \text{atom}_n) \mapsto \{\text{dog}, \text{cat}, \dots\}$ function, the state of any individual atom would constrain the output not at all. Atom#2354 being in state #7532 is consistent with it being part of either a dog, or a cat, or an elephant...

This only stops applying if we consider macroscopically sized chunks of atoms, or the specific set of microscopically sized chunks corresponding to DNA.

And even that doesn't always work. Consider a precision-engineered nanomachine, with each atom accounted for. Intuitively, "the nanomachine's state" should be an abstraction over those atoms. However, there's not necessarily any comparatively minuscule "chunk" of the nanomachine that actually redundantly encodes its state! E. g., a given exact position of appendage#12 may be consistent either with resource-extraction or with rapid travel.

So: Suppose we have some set of random variables $X = \{X_1, \dots, X_n\}$ representing some cube of voxels, where each voxel $X_i$ reports what atoms are in it. Imagine a dataset of various animals (or nanomachines) in this format, of various breeds and in various positions.

"This is a dog" tells us some information about . Indeed, it tells us a fairly rich amount of information: the general "shape" of what we should expect to see there. However, for any individual .[1] Which is to say: "this is a dog" is synergistic information about ! Not redundant information. And symmetrically, sampling a given small chunk of  won't necessarily tell us whether it's the snapshot of a dog or a cat (unless we happen to sample a DNA fragment). , but .

One way around this is to suggest that cats/dogs/nanomachines aren't abstractions over their constituent parts, but abstractions over the resampling of all their constituent parts under state transitions. I. e., suppose we now have 3D video recordings $X^{(1)}, \dots, X^{(T)}$: then "this is a dog" is redundantly encoded in each transition $X^{(t)} \to X^{(t+1)}$ for $t \in \{1, \dots, T-1\}$.

But that seems counterintuitive/underambitious. Intuitively, tons of abstractions are about robust synergistic information/emergent dynamics.

Is there some obvious way around all that, or is this currently an open question?

  1. ^

    Though it's not literally zero. E. g., if we have a fixed-size voxel cube, then depending on whether it's a dog or an elephant, we should expect the voxels at the edges to be more or less likely to contain air vs. flesh.

There is no consensus definition of transformational but I think this is simply wrong, in the sense that LLMs being stuck without continual learning at essentially current levels would not stop them from having a transformational impact

IMO, if LLMs get stuck at the current level of incapabilities, they'd be 7-7.5 on the Technological Richter Scale. (Maybe an 8, but I think that's paying too much attention to how impressive-in-themselves they are, and failing to correctly evaluate the counterfactual real-world value they actually add.) That doesn't cross my threshold for "transformative" in the context of AI.

Yes, it's all kinds of Big Deal if we're operating on mundane-world logic. But when the reference point is the Singularity, it's just not that much of a big deal.

I'm not quite sure why it feels like unstructured play makes people better/stronger

(Written before reading the second part of the OP.)

I don't really share that feeling[1]. But if I conditioned on that being true and then produced an answer:

Obviously because it trains research taste.

Or, well, the skills in that cluster. If you're free to invent/modify the rules of the game at any point, then if you're to have fun, you need to be good at figuring out what rules would improve the experience for you/everyone, and what ideas would detract from it. You're simultaneously acting as a designer and as a player. And there's also the element of training your common-sense/world-modeling skills: what games would turn out fun and safe in the real world, and which ones seem fun in your imagination, but would end up boring due to messy realities or result in bodily harm.

By contrast, structured play enforces a paradigm upon you and only asks you to problem-solve within it. It trains domain-specific skills, whereas unstructured play is "interdisciplinary", in that you can integrate anything in your reach into it.

More broadly: when choosing between different unstructured plays, you're navigating a very-high-dimensional space of possible games, and (1) that means there's simply a richer diversity of possible games you can engage in, which means a richer diversity of skills you can learn, (2) getting good at navigating that space is a useful skill in itself. Structured plays, on the other hand, present you with a discrete set of options pre-computed by others.

Unstructured play would also be more taxing on real-time fluid-intelligence problem-solving. Inferring the rules (if they've been introduced/changed by someone else), figuring out how to navigate them on the spot, etc.

Which factor is most important for growing better/stronger?

What's the sense of "growing better/stronger" you're using here? Fleshing that out might make the answer obvious.

  1. ^

    Not in the sense that I think this statement is wrong, but in that I don't have the intuition that it's true.

I mean, if you interpret "land" in a Georgist sense, as the sum of all natural resources of the reachable universe, then yes, it's finite. And the fights for carving up that pie can start long before our grabby-alien hands have seized all of it. (The property rights to the Andromeda Galaxy can be up for sale long before our Von Neumann probes reach it.)
