All of danwil's Comments + Replies

danwil110

In entrepreneurship, there is the phrase "ideas are worthless". This is because everyone already has lots of ideas they believe are promising. Hence, a pre-business idea is unlikely to be stolen.

Similarly, every LLM researcher already has a backlog of intriguing hypotheses paired with evidence. So an outside idea would have to seem more promising than that backlog, which likely requires the proposer to demonstrate something beyond evidence alone.

For example, Krizhevsky/Sutskever/Hinton had the idea of applying then-antiquated neural nets to recognize images. Only wh... (read more)

danwil43

If an outsider's objective is to be taken seriously, they should write papers and submit them to peer review (e.g. conferences and journals).

Yann LeCun has gone so far as to say that independent work only counts as "science" if it is submitted to peer review:

"Without peer review and reproducibility, chances are your methodology was flawed and you fooled yourself into thinking you did something great." - https://x.com/ylecun/status/1795589846771147018?s=19.

In my experience, professors are very open to discussing ideas and their work with anyone who seems serious, int... (read more)

4Viliam
I agree, but there are two different perspectives:

  • whether the outsider wants to be taken seriously by academia
  • whether the people in academia want to collect knowledge efficiently

From the first perspective, of course, if you want to be taken seriously, you need to play by their rules. And if you don't, then... those are your revealed preferences, I guess.

It is the second perspective I was concerned about. I agree that outsiders are often wrong. But consider the tweet you linked: it seems to me that, from the perspective of a researcher, taking the ideas of outsiders who have already developed successful products based on them, and examining them scientifically (and maybe rejecting them afterwards), should be low-hanging fruit.

I am not suggesting treating the ideas of outsiders as scientific. I am suggesting treating them as "hypotheses worth examining". Refusing to even look at a hypothesis because it is not scientifically proven yet is putting the cart before the horse. Hypotheses are considered first and scientifically proved later, not the other way round. All scientific theories were non-scientific hypotheses at the moment they were conceived.

Choosing the right hypothesis to examine is an art, not a science yet; it becomes a science only after we examine it. In theory, any (falsifiable) hypothesis could be examined scientifically and afterwards confirmed or rejected. In practice, testing completely random hypotheses would be a waste of time; they are 99.9999% likely to be wrong, and if you don't find at least one that is right, your scientific career is over. (You won't become famous by e.g. testing a million random objects and scientifically confirming that none of them defies gravity. Well, you probably would become famous, actually, but in the bad way.)

From the Bayesian perspective, what you need to do is test hypotheses that have a non-negligible prior probability of being correct. From the perspective of the truth-seeker,
danwil32

Double thanks for the extended discussion and ideas! Also interested to see what happens.

We earlier created some SAEs that completely remove the unigram directions from the encoder (e.g. old/gpt2_resid_pre_8_t8.pt). 

However, a " Golden Gate Bridge" feature individually activates on " Golden" (plus prior context), " Gate" (plus prior), and " Bridge" (plus prior). Without the last-token/unigram directions these tended not to activate directly, complicating interpretability.

danwil*30

Yes to both! We varied expansion size for tokenized (8x-32x) and baseline (4x-64x), available in the Google Drive folder expansion-sweep. Just to be clear, our focus was on learning so-called "complex" features that do not solely activate based on the last token. So, we did not use the lookup biases as additional features (only for decoder reconstruction). 
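
As a sketch of that design choice (illustrative names, assuming a top-k encoder; not the exact repo code): the per-token lookup bias enters only the decoder reconstruction, so the learned features are not themselves forced to re-encode the last token.

import torch

def tokenized_sae_forward(resid, last_token_ids, W_enc, b_enc, W_dec, b_dec, lookup, k=30):
    # Encoder: standard top-k SAE features; the lookup table is not involved here,
    # so these features are free to learn "complex", context-dependent structure.
    pre_acts = resid @ W_enc + b_enc
    top = torch.topk(pre_acts, k, dim=-1)
    acts = torch.zeros_like(pre_acts).scatter_(-1, top.indices, top.values.relu())
    # Decoder: the per-token lookup bias is added only to the reconstruction,
    # handling the "simple" last-token structure.
    recon = acts @ W_dec + b_dec + lookup[last_token_ids]
    return acts, recon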

That said, ~25% of the suggested 64x baseline features are similar to the token-biases (cosine similarity 0.4-0.9). In fact, evolving the token-biases via training substantially increases their simi... (read more)

3Logan Riggs
Do you have a dictionary-size to CE-added plot (fixing L0)? I agree it's not like the other features in that the encoder isn't used, but it is used for reconstruction, which affects CE. It'd be good to show that the Pareto improvement in CE/L0 is not caused simply by having an additional vocab_size number of features (although that might mean having to use auxk to get a similar number of alive features).
danwil*10

Additionally:

  • We recommend using the gpt2-layers directory, which includes resid_pre layers 5-11, topk=30, 12288 features (the tokenized "t" ones have learned lookup tables, pre-initialized with unigram residuals).
  • The folders pareto-sweep, init-sweep, and expansion-sweep contain parameter sweeps, with lookup tables fixed to 2x unigram residuals.

In addition to the code repo linked above, here is (for now) some quick code that loads the SAE, exposes the lookup table, and computes activations only:

import torch

SAE_BASE_PATH = 'gpt2-layers/gpt2_resid_pre'       
... (read more)
danwil*30

Indeed, similar tokens (e.g. "The" vs. " The") have similar token-biases (median max-cos-sim 0.89). This is likely because we initialize the biases with the unigram residuals, which mostly retain the same "nearby tokens" as the original embeddings.

Due to this, most token-biases have high cosine similarity to their respective unigram residuals (median 0.91). This indicates that if we use the token-biases as additional features, we can interpret the activations relative to the unigram residuals (as a somewhat un-normalized similarity, due to the dot product)... (read more)
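
As a small illustration of those measurements (hypothetical variable names; token_biases and unigram_residuals are both [vocab, d_model]):

import torch
import torch.nn.functional as F

def bias_vs_unigram_similarity(token_biases, unigram_residuals):
    # Per-token cosine similarity between the learned bias and the unigram residual;
    # the median of these values is the 0.91 figure quoted above.
    return F.cosine_similarity(token_biases, unigram_residuals, dim=-1)

def bias_activation(resid, token_biases, token_ids):
    # Treating a token-bias as an extra feature: its "activation" is the dot product
    # with the residual, i.e. an un-normalized similarity to (approximately) the
    # unigram residual direction.
    return (resid * token_biases[token_ids]).sum(dim=-1)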