In June, I received a grant from the LTFF for a 6-month period of self-study aimed at mastering the necessary background for AI alignment research. The following is advice I would give to people who are attempting something similar. I have tried to keep it short.
Basic advice
You’ll naturally want to maximize “insights per minute” when choosing what to read. But, don’t expect it to be obvious what the most impactful reading material is! It often takes actual focused thought to figure this out.
One shortcut is to just ask yourself what you are really curious about. The idea is that your curiosity should track “value of information” to some degree, so following it can’t be that wrong; also, working through a textbook takes quite a bit of mental energy, so having natural curiosity to power your study is very helpful.
If you don’t already have something you’re curious about, you can use the following technique to figure out what to read:
1. List all the things you could potentially read. This includes looking at recommendation lists from other people. (See below for two possible lists.)
2. For each item on the list, write down how you feel about maybe reading it. Be honest with yourself, and try to name the concrete reasons shaping your judgment.
3. Look back over the list. Hopefully, it will now be easier to decide what to read.
This was helpful to me, which doesn’t necessarily mean it will be helpful for you, but it’s something you could try.
Advice specifically for AI alignment
The above should hold for any topic; the following advice is specific to studying for AI alignment research.
I think you basically can’t go wrong with reading all of the AI alignment articles on Arbital (or maybe the 80% of them that look most interesting to you). I found this to be the most effective way to rapidly acquire a basic understanding of the difficulty of the alignment problem.
It’s probably also a good idea to read some concrete results from alignment research, if only to get a sense of what kind of math is required. I think Risks from Learned Optimization in Advanced Machine Learning Systems is one good option; I don’t know of a good list of other results.
Concrete reading recommendations
The following are recommendations for concrete books/topics that I haven’t seen mentioned anywhere else, but that I liked. I won’t repeat anything that’s already in the MIRI research guide or John Wentworth’s study guide.
The MIRI research guide recommends “Lambda-Calculus and Combinators” for type theory, but that book is mostly focused on lambda calculus (and is, in my opinion, a bit difficult to read). For type theory itself, the HoTT book (Homotopy Type Theory: Univalent Foundations of Mathematics) is the better choice.
If you want to learn pure lambda calculus, though (Church numerals, the Y combinator, and so on), HoTT is not the right book; I don’t really know of a good book for that.
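If it helps to see what that material looks like in practice, here is a minimal sketch (in Python, not taken from any of the books above): Church numerals encode natural numbers as functions, and a fixed-point combinator lets you define recursion without the function referring to itself by name. Because Python evaluates arguments eagerly, the sketch uses the applicative-order Z combinator rather than the classic Y combinator.

```python
# Church numerals: the number n is "apply f to x, n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

def church_to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)

# Z combinator: a fixed-point combinator that works under eager evaluation.
Z = lambda g: (lambda x: g(lambda v: x(x)(v)))(lambda x: g(lambda v: x(x)(v)))

# Factorial written without any explicit self-reference; Z ties the knot.
factorial = Z(lambda rec: lambda k: 1 if k == 0 else k * rec(k - 1))

print(church_to_int(succ(succ(zero))))  # 2
print(factorial(5))                     # 120
```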
Learn category theory and topology at the same time! I had trouble motivating myself to learn topology, so combining it with category theory seemed promising to me.
I would recommend reading the AI alignment articles on Arbital first. Or maybe read the two in parallel: any time you want to know more about something that came up in one of the conversations, look it up on Arbital.
See the link for an explanation of why this is useful.
Kolmogorov axioms
It is said that “Bayesians prefer Cox’s theorem for the formalization of probability”, but I think knowing Kolmogorov’s classical probability axioms is also important.
This in no way replaces reading Probability Theory: The Logic of Science by E. T. Jaynes.
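For reference, the axioms themselves are short: a probability measure P on a sample space Ω (more precisely, on a σ-algebra of events over Ω) is any function satisfying

$$P(A) \ge 0 \ \text{ for every event } A, \qquad P(\Omega) = 1,$$
$$P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i) \quad \text{for pairwise disjoint events } A_1, A_2, \ldots$$

Cox’s theorem arrives at essentially the same rules from a different starting point (consistency requirements on degrees of plausibility rather than measures on sets of outcomes), which is part of why it’s worth knowing both.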
Finally, I’ll remind you of the planning fallacy, but also of the fact that it’s good to make plans even if they won’t survive contact with reality, because plans help you keep track of the big picture.
Good luck!
Obviously, you can’t read them all within 6 months. I had read a lot of them already before I started my LTFF grant.