
TL;DR: I'm trying to either come up with a new promising AIS direction or decide (based on my inside view, not on trust) that I strongly believe in one of the existing proposals. Is there some ML background I'd better get? (And if possible: why do you think so?)

I am not asking how to be employable 

I know there are other resources for that, and I'm not currently looking to be employed.

Examples of seemingly useful things I learned so far (I want more of these)

  1. GPT's training loss comes only from predicting the next token at each position (relevant for myopia; see the code sketch after this list)
  2. Neural networks have a TON of parameters (relevant for why interpretability is hard)
  3. GPT has a limited number of tokens in the prompt, plus other problems that I can imagine[1] trying to solve (relevant for solutions like "chain of thought"; also sketched below)
  4. The vague sense that nobody has a good idea of why things work, and the experience of trying different hyperparameters and only learning in retrospect which of them did well.

(Please correct me if something here was wrong)
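
To make points (1) and (3) concrete, here is a minimal PyTorch sketch. It is not GPT-2's actual training code; the tensor shapes and variable names are made up for illustration. It shows that the training loss is just cross-entropy on targets shifted by one position, and that a prompt longer than the context window has to be cropped before the model can see it.

```python
import torch
import torch.nn.functional as F

# Toy shapes, chosen for illustration only (not GPT-2's real batch/sequence sizes).
vocab_size = 50257           # GPT-2's vocabulary size
block_size = 1024            # GPT-2's maximum context length
batch, seq_len = 2, 8

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # a batch of token ids
logits = torch.randn(batch, seq_len, vocab_size)         # stand-in for model(tokens)

# (1) The only training signal: predict token t+1 from positions up to t.
#     Drop the last position's logits and the first target, then take cross-entropy.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_targets = tokens[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_targets)

# (3) A prompt longer than the context window has to be cropped (or otherwise
#     compressed) to the last block_size tokens before being fed to the model.
long_prompt = torch.randint(0, vocab_size, (1, 3000))
cropped_prompt = long_prompt[:, -block_size:]             # shape (1, 1024)

print(loss.item(), cropped_prompt.shape)
```

The shift is the whole trick behind (1): each position's logits are scored only against the token that immediately follows it, which is why the objective gets described as myopic.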

I'm asking because I'm trying to decide what to learn next in ML

if anything.

My background

  • Almost no ML/math background: I spent about 13 days catching up on both. What I did so far is something like "build and train GPT-2" (plus some PyTorch), and learn about some other architectures.
  • I have a lot of non-ML software engineering experience.

Thanks!

 

  1. ^

    I don't intend to advance ML capabilities

1 comment:

I think that rather than ML engineering (recreating GPT, learning PyTorch, etc.), it's more effective for an AI safety researcher to learn one or several general theories of ML, deep learning, or specifically transformers, such as:

I've personally learned (well, at least, read the corresponding paper in full, making sure that I understand or "almost" understand every part of it) from the list above: the circuit theory (Olah et al. 2020) and the mathematical framework for transformers (Elhage et al. 2021). However, this is a very "low variance" choice: if AI safety researchers know any of these theories, it's exactly these two because these papers are referenced in the AGI Safety Fundamentals Alignment curriculum. I think it would be more useful for the community for more people to get acquainted with more different theories of ML or DL so that the community as a whole has a more diversified understanding and perspective. Of course, it would be ideal if some people learned all these theories and were able to synthesise them, but in practice, we can hardly expect that such super-scholars will appear because everyone has so little time and attention.

The list above is copied from the post "A multi-disciplinary view on AI safety research". See also the section "Weaving together theories of cognition and cognitive development, ML, deep learning, and interpretability through the abstraction-grounding stack" in this post, which is relevant to this question.