All of Rudi's Comments + Replies

Rudi

Thanks, that makes sense! I hadn't fully realized that the phrase in the terms is really just "improve any other large language model", which is indeed so vague/general that it could be interpreted to cover almost any activity that uses Llama 2 in conjunction with other models.

Rudi

Thanks a lot for the context!

Out of curiosity, why does the model training restriction make it much less useful for safety research?

Zac Hatfield-Dodds
Example projects you're not allowed to do, if they involve other model families:

* using Llama 2 as part of an RLAIF setup, which you might want to do when investigating Constitutional AI or decomposition or faithfulness of chain-of-thought or many many other projects;
* using Llama 2 in auto-interpretability schemes, e.g. to label detected features in smaller models, if this will lead to improvements in non-Llama-2 models;
* fine-tuning other or smaller models on synthetic data produced by Llama 2, which has some downsides but is a great way to check for signs of life of a proposed technique (see the sketch after this list).

In many cases I expect that individuals will go ahead and do this anyway, much like the license of Llama 1 was flagrantly violated all over the place, but remember that it's differentially risky for any organisation which Meta might like to legally harass.
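To make the last bullet concrete, here is a minimal sketch of fine-tuning a smaller model on synthetic data generated by Llama 2, using HuggingFace transformers. The model names, prompts, and training settings are placeholders assumed for illustration, and a real run needs access to the gated Llama 2 weights; treat this as a sketch of the workflow, not a working recipe.

```python
# Minimal sketch: generate synthetic text with Llama 2 (the "teacher"),
# then fine-tune a smaller, non-Llama model (the "student") on it --
# exactly the step the license arguably forbids.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# 1. Generate synthetic completions with Llama 2.
teacher_name = "meta-llama/Llama-2-7b-hf"  # assumes gated-weights access
teacher_tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(
    teacher_name, torch_dtype=torch.float16, device_map="auto"
)

prompts = [  # placeholder prompts; a real project would use thousands
    "Explain why the sky is blue.",
    "Summarize the plot of Hamlet in three sentences.",
]
synthetic_texts = []
for p in prompts:
    inputs = teacher_tok(p, return_tensors="pt").to(teacher.device)
    out = teacher.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
    synthetic_texts.append(teacher_tok.decode(out[0], skip_special_tokens=True))

# 2. Fine-tune a small non-Llama student on the synthetic corpus.
student_name = "gpt2"  # any other model family works the same way
student_tok = AutoTokenizer.from_pretrained(student_name)
student_tok.pad_token = student_tok.eos_token
student = AutoModelForCausalLM.from_pretrained(student_name)

enc = student_tok(synthetic_texts, truncation=True, padding=True, return_tensors="pt")

class SyntheticDataset(torch.utils.data.Dataset):
    """Wraps the tokenized synthetic corpus for causal-LM fine-tuning."""
    def __init__(self, enc):
        self.enc = enc
    def __len__(self):
        return self.enc["input_ids"].shape[0]
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        labels = item["input_ids"].clone()
        labels[item["attention_mask"] == 0] = -100  # no loss on padding
        item["labels"] = labels
        return item

trainer = Trainer(
    model=student,
    args=TrainingArguments(output_dir="student-distilled", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=SyntheticDataset(enc),
)
trainer.train()
```

The RLAIF and auto-interpretability bullets have the same shape: in each case Llama 2's outputs end up as training signal for a non-Llama-2 model, which is what trips the "improve any other large language model" clause.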
Rudi

Thanks for putting this so succinctly! To add another subjective data point, I had very similar thoughts immediately after I first saw this work (and the more conceptual follow-up by Chughtai et al.) a few months ago.

About "one-hotting being a significant transformation": I have a somewhat opposite intuition here and would say that this is also quite natural.

Maybe at first glance one would find it more intuitive to represent the inputs as a subset of the real numbers (or floats, I guess) and think of modular addition as some garbled version of the...
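To make the contrast concrete, here is a minimal sketch of the two input representations under discussion for the a + b mod p task; the modulus matches the grokking setup and the specific inputs are arbitrary placeholders.

```python
# Minimal sketch of the two input representations for modular addition,
# a + b mod p: raw real numbers vs. one-hot vectors.
import torch
import torch.nn.functional as F

p = 113  # modulus from the grokking work; any prime serves for the sketch
a, b = 7, 42

# "Subset of the real numbers" view: the inputs are just two floats,
# so the built-in ordering and magnitude of the integers is exposed.
x_real = torch.tensor([float(a), float(b)])

# One-hot view: each input is a p-dimensional basis vector, so the network
# sees the group elements as interchangeable abstract symbols.
x_onehot = torch.stack([
    F.one_hot(torch.tensor(a), num_classes=p),
    F.one_hot(torch.tensor(b), num_classes=p),
]).float()

target = (a + b) % p
print(x_real.shape, x_onehot.shape, target)  # torch.Size([2]) torch.Size([2, 113]) 49
```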