Introduction
I'm currently flirting with the idea of trying for a math PhD 2 - 3 years down the line.
I'm currently on a Theoretical Computer Science Masters program at the University of [Redacted] in the United Kingdom.
(My program is two semesters of teaching (7-8 months) followed by a 9-12 month industrial placement starting in June 2023. I might forgo the one-year industrial placement if I can't get a suitable one [i.e. theoretical research that feels like it would be valuable experience for the kind of career I want to pursue] and instead graduate after one year by completing a master's project over the summer.)
After graduation, I'm considering taking a gap year to fill in the gaps in my maths knowledge and prepare for the PhD, and maybe to see if I can contribute to the research agendas I think are interesting/promising.
I might also pursue a PhD in Theoretical Computer Science instead of mathematics (maybe applications for a TCS PhD would be looked on more favourably with a TCS Masters/recommendations from my current lecturers).
Why A PhD?
I currently plan to learn a lot of (especially abstract) maths to (upper) graduate level for alignment theory (I want to do theoretical alignment work that is basically just applied maths), and I think I would benefit from the opportunity to study the relevant mathematics under a "guru"/the dedicated mentorship a PhD provides. I expect the first few years of my career in alignment research would be mostly spent on deconfusion/distillation, and I expect high levels of mathematical sophistication to be very valuable for that.
I find abstract maths and mathematical modelling "fun", and really enjoy being a student.
My Alignment Theory of Change
I am operating under/optimising for long timelines (transformative AI is decades away [20+ years]), and this influences what kind of research I believe to be most promising.
I expect theoretical research (especially foundational research, particularly at our current pre-paradigmatic stage) to be the most promising, and I am persuaded by agent foundations agendas. The extant research agendas I'm most excited about and could see myself working on someday:
- John Wentworth's Natural Abstractions Hypothesis and Selection Theorems
- Vanessa Kosoy's Learning Theoretic Alignment Agenda
- Other agendas that take a desiderata-first approach to alignment
- Garrabant and Demski's Embedded Agency
My basic plan for alignment is something like:
1. Study subjects that seem relevant to alignment theory
   - Mathematics: a fuckton
   - Theoretical Computer Science: likewise, a fuckton
   - Statistics (and its theory)
   - Learning Theory (Algorithmic and Statistical)
   - Information Theory (Algorithmic and Statistical)
   - Physics: Thermodynamics
   - Optimisation
   - Evolutionary Theory
   - Analytic Philosophy: ontology, epistemics, ethics, etc.
   - Develop executable/computable philosophy for the above
   - Decision Theory
   - Game Theory
2. Grapple with concepts that bear on agent foundations until I understand them better ("Deconfusion")
   - Information and entropy
   - Computation (especially as an information dynamics phenomenon)
   - Abstractions, ontology, modelling/map making
   - Optimisation
   - Causality/dependencies and counterfactuals (including logical)
   - Epistemics (including for ideal agents)
   - Decision Making (including for ideal agents; especially in multi-agent environments)
   - Emergent behaviour in multi-agent environments (e.g. competition, coordination vs conflict, evolution)
   - Systems (especially complex) and their emergent behaviour
   - Embedded Agency more generally
   - Anthropics?
3. Distill my learnings and understandings to make them more widely accessible ("Distillation")
4. Iterate #1 - #3
5. ...
6. Formulate an adequate theory of robust agency
7. ...
8. Solve alignment
Of course, I don't expect to make it all the way to step 8. Mostly, I expect deconfusion and distillation to be where most of the value of my "career" comes from.
Don't overthink this! Go with the feelz!