Thanks a lot for the context!
Out of curiosity, why does the model training restriction make it much less useful for safety research?
Thanks for putting this so succinctly! To add another subjective data point, I had very similar thoughts immediately after I first saw this work (and the more conceptual follow-up by Chughtai et al.) a few months ago.
About "one-hotting being a significant transformation": I have a somewhat opposite intuition here and would say that this is also quite natural.
Maybe at first glance one would find it more intuitive to represent the inputs as a subset of the real numbers (or floats, I guess) and think of modular addition as some garbled version of the...
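To make that intuition concrete, here's a minimal sketch (Python/NumPy; the modulus p = 113 and the input format are assumptions borrowed from the usual modular-addition setup, not something stated above) of what one-hotting the inputs amounts to:

```python
import numpy as np

p = 113  # assumed modulus; the standard modular-addition setup uses p = 113

def one_hot(x: int, p: int) -> np.ndarray:
    """Map a residue x in {0, ..., p-1} to a length-p indicator vector."""
    v = np.zeros(p)
    v[x % p] = 1.0
    return v

# The task "a + b (mod p)" then consumes a pair of indicator vectors instead
# of two raw floats -- no ordering or magnitude structure is imposed on the
# residues, which is what makes the encoding natural for a cyclic group.
a, b = 5, 111
inputs = np.stack([one_hot(a, p), one_hot(b, p)])  # shape (2, p)
target = one_hot((a + b) % p, p)                   # label, also one-hot
```

The point being that the encoding treats the p residues as unordered symbols, rather than inheriting the real line's ordering.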
Thanks, that makes sense! I did not fully realize that the phrase in the terms is really just "improve any other large language model", which is indeed so vague/general that it could be interpreted to include almost any activity that would entail using Llama-2 in conjunction with other models.