All of 1stuserhere's Comments + Replies

on the one hand, mechanistic understanding has historically underperformed as a research strategy,

Are you talking about ML or in general? What are you deriving this from?

4leogao
For ML, yes. I'm deriving this from the bitter lesson.

I think it’s even more actively confusing because “smooth/continuous” takeoff not only could be faster in calendar time

We're talking about two different things here: take-off velocity, and timelines. All 4 possibilities are on the table - slow takeoff/long timelines, fast takeoff/long timelines, slow takeoff/short timelines, fast takeoff/short timelines.

A smooth takeoff might actually take longer in calendar time if incremental progress doesn’t lead to exponential gains until later stages.

Honestly I'm surprised people are conflating timelines and takeoff speeds.

7Richard Korzekwa
I agree. I look at the red/blue/purple curves and I think "obviously the red curve is slower than the blue curve", because it is not as steep and neither is its derivative. The purple curve is later than the red curve, but it is not slower. If we were talking about driving from LA to NY starting on Monday vs flying there on Friday, I think it would be weird to say that flying is slower because you get there later. I guess maybe it's more like when people say "the pizza will get here faster if we order it now"? So "get here faster" means "get here sooner"? Of course, if people are routinely confused by fast/slow, I am on board with using different terminology, but I'm a little worried that there's an underlying problem where people are confused about the referents, and using different words won't help much.

That's interesting. On the recent episode of Dwarkesh Podcast with David Reich, at 1:18:00, there's a discussion I'll quote here:

There was a super interesting series of papers. They made many things clear but one of them was that actually the proportion of non-Africans ancestors who are Neanderthals is not 2%.

That’s the proportion of their DNA in our genomes today if you're a non-African person. It's more like 10-20% of your ancestors are Neanderthals. What actually happened was that when Neanderthals and modern humans met and mixed, the Neanderthal D

... (read more)

Good post, thanks for sharing! Found it somewhat relatable to my prior life experiences too.

Great essay!

I found it well written, and it articulates many of the arguments I make in casual conversations. I'll write up a longer comment sometime later with some points I found interesting and concrete questions accompanying them.

For each of the resulting 1313 arguments, crowdworkers were first asked to rate their support of the corresponding claim on a Likert scale from 1 (“Strongly Oppose”) to 7 (“Strongly Oppose”).

You probably mean "Strongly Oppose" to "Strongly Support".

1Lennart Finke
Thanks, fixed!

You'll enjoy reading What Causes Polysemanticity? An Alternative Origin Story of Mixed Selectivity from Incidental Causes (link to the paper)

Using a combination of theory and experiments, we show that incidental polysemanticity can arise due to multiple reasons including regularization and neural noise; this incidental polysemanticity occurs because random initialization can, by chance alone, initially assign multiple features to the same neuron, and the training dynamics then strengthen such overlap.

If we train several SAEs from scratch on the same set of model activations, are they “equivalent”?

For SAEs of different sizes, at most layers, most features in the smaller SAE do have a highly similar counterpart among the larger SAE's features, but that's not always true. I'm working on an upcoming post on this.

2Bart Bussmann
Interesting, we find that all features in a smaller SAE have a feature in a larger SAE with cosine similarity > 0.7, but not all features in a larger SAE have a close relative in a smaller SAE (though about ~65% do have a close equivalent at 2x scale-up).
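For concreteness, a minimal sketch of this kind of feature-matching check: for each decoder direction in the smaller SAE, find the highest cosine similarity to any decoder direction in the larger SAE, then count how many exceed the 0.7 threshold mentioned above. The random matrices here are hypothetical stand-ins for real SAE decoder weights (the shapes and names `W_dec_small`/`W_dec_large` are made up for illustration; with random directions almost nothing will match).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64
W_dec_small = rng.normal(size=(512, d_model))   # hypothetical small SAE: 512 features
W_dec_large = rng.normal(size=(1024, d_model))  # hypothetical large SAE: 1024 features

def max_cosine_match(A, B):
    """For each row (feature direction) of A, return the highest
    cosine similarity to any row of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (A @ B.T).max(axis=1)

sims = max_cosine_match(W_dec_small, W_dec_large)
frac_matched = (sims > 0.7).mean()  # fraction of small-SAE features with a close relative
print(f"{frac_matched:.0%} of small-SAE features have a match above 0.7")
```

Running the same comparison in the other direction (large against small) would give the asymmetry described above.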

This is purely anecdotal, but supplementing sleep debt with cardio-intensive exercise works for me. For example, I usually need 7 hours of sleep. If I sleep for only 5, I'm likely to feel a drop in mental sharpness around midday the next day. However, if I go for an hour-long run, I miss that drop almost completely and feel just as good as I normally would after a full night's sleep.

It's also worth noting that LLMs are not learning directly from the raw input stream but from a compressed form of that data: the LLMs are fed tokenized text, and the tokenizer acts as a compressor. This benefits the models by giving them a more information-rich context.
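To illustrate the compression point with a toy example: a tokenizer maps the raw character stream to a much shorter sequence of integer ids, so each position the model attends to carries more information. This hypothetical word-level "tokenizer" is just a stand-in for a real subword scheme like BPE:

```python
# Toy stand-in for a real tokenizer (e.g. BPE): map each whitespace-delimited
# word to an integer id, assigning new ids on first sight.
text = "the quick brown fox jumps over the lazy dog"

vocab = {}
def tokenize(s):
    """Return a list of integer ids, one per word."""
    return [vocab.setdefault(w, len(vocab)) for w in s.split()]

ids = tokenize(text)
print(len(text), "characters ->", len(ids), "tokens")  # 43 characters -> 9 tokens
```

A real subword tokenizer sits between these extremes, but the effect is the same: the sequence the model sees is several times shorter than the character stream.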

1Fergus Fettes
Would you say that tokenization is part of the architecture? And, in your wildest moments, would you say that language is also part of the architecture :)? I mean the latent space is probably mapping either a) brain states or b) world states right? Is everything between latent spaces architecture?

I think that the answer is no.

 

In this “VRAM-constrained regime,” MoE models (trained from scratch) are nowhere near competitive with dense LLMs.

Curious whether your high-level thoughts on these topics still hold or have changed.

On a more narrow distribution this head could easily exhibit just one behaviour and eg seem like a monosemantic inductin head

induction* head

The 2023 predictions seem to hold up really well so far, especially the SDM-in-interactive-environments one, image synthesis, passing the bar exam, legal NLP systems, enthusiasm of programmers, and Elon Musk re-entering the space of building AI systems.

Interesting perspective, especially your comments on citations. Agreed that the diagrams/figures/tables are some of the most interesting parts of a paper, but I also try to find the problem that motivated the authors (which imo is frequently conveyed better in the introduction than in the abstract).

2Mary Chernyshenko
Yes, in my experience abstracts are results-oriented, not problem-oriented. I do like introductions, too) though they are often written so generally that I fail to identify the problem. But what a nice feeling of understanding) The break between the intro and the specific problem they attacked can be really jarring. Overall, we read a paper for what it is, not for what it promised to be.

In this analogy, the trouble is that we do not know whether we're building tunnels in parallel (same direction), in opposite directions, or in a zigzag. The reason is a lack of clarity about which approaches will turn out to be fundamentally important for building a safe AGI. So it seems to me that, for now, exploring different approaches might be a good thing, so that the next generation of researchers does less digging and is able to stack more on the existing work.

I agree. It seems like a matter of striking a balance between exploration and exploitation. We're barely entering the 2nd generation of alignment researchers. It's important to generate new directions for approaching the problem, especially at this stage, so that we have a better chance of covering more of the space of possible solutions before deciding to go in deeper. The barrier to entry also remains slightly lower in this case for new researchers. When some research directions outcompete others, we'll naturally see more interest in those promising directions and subsequently more exploitation, and researchers will be stacking.