This is an attempt at building a rough guess of the impact you might expect to have with your career in AI safety.
Your impact will depend on the choices you make, and the estimation of their consequences also depends on your beliefs about AI safety, so I built an online tool so that you can input your own parameters into the model.
The main steps of the estimation
Main simplifying assumptions
These assumptions are very strong, and wrong. I’m unsure in which direction the results would move if the assumptions were more realistic. I am open to suggestions for replacing parts of the model so that these assumptions can be swapped for less wrong ones. Please tell me if I introduced a strong assumption not mentioned above.
Mathematical model of the problem
Model of when a world-ending AGI might happen, if there has been no AI safety progress
Without your intervention, and if there is no AI safety progress, AGI will happen at time $X$, where $X$ follows a distribution of density $\hat{q}$ over $[t_0,+\infty[$, where $\int_{t_0}^{+\infty} \hat{q} = P(\text{AGI happens})$[1]. AGI kills humanity with probability $p_{kill}$. This is assumed to be independent of when AGI happens. Therefore, without you, AGI will kill humanity at time $Y$, where $Y$ follows a distribution of density $q(t) = p_{kill}\,\hat{q}(t)$ (for all $t \in [t_0,+\infty[$).
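To make the discretization concrete, here is a minimal Python sketch (not the website’s actual code) of how $\hat{q}$ and $q$ could be represented on a yearly grid; the start year, horizon, and the geometric-like shape of $\hat{q}$ are made-up placeholders.

```python
import numpy as np

t0, horizon = 2024, 2124            # placeholder start year and cutoff
years = np.arange(t0, horizon)      # yearly grid approximating [t0, +inf[

# Placeholder belief about AGI timelines: a geometric-like density whose
# total mass over the grid is P(AGI happens) < 1 (the rest is "AGI never happens").
p_agi_happens = 0.8
raw = 0.05 * (1 - 0.05) ** (years - t0)
q_hat = p_agi_happens * raw / raw.sum()   # discretized density of X

# AGI kills humanity with probability p_kill, independent of when it happens.
p_kill = 0.3
q = p_kill * q_hat                        # discretized density of Y
```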
Model of when the AI alignment problem might be solved
Without you, AI Alignment will be solved at time $Z$, where $Z$ follows a distribution of density $p$, where $\int_{t_0}^{+\infty} p = P(\text{AI alignment can be solved})$. Its cumulative distribution function is $F$.
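Continuing the sketch above (again with made-up numbers), the alignment-timeline density $p$ and its CDF $F$ can be discretized the same way:

```python
# Placeholder belief about alignment timelines, with total mass
# P(AI alignment can be solved) < 1 over the grid.
p_solvable = 0.7
raw_p = 0.03 * (1 - 0.03) ** (years - t0)
p_align = p_solvable * raw_p / raw_p.sum()   # discretized density of Z
F = np.cumsum(p_align)                       # cumulative distribution function of Z
```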
Modeling your impacts on timelines
With your intervention, AI alignment will be solved at time $\bar{Z}$, where $\bar{Z}$ follows a distribution of density $\bar{p}$. Its cumulative distribution function is $\bar{F}$.
Between times $t_1$ and $t_2$, you increase the speed at which AI Alignment research is done by $s$. That is modeled by saying that $\bar{F}(t) = F(u(t))$, where $u(t)$ is a continuous piecewise-linear function with slope $1$ on $[t_0,t_1]$, slope $1+s$ on $[t_1,t_2]$, and slope $1$ on $[t_2,+\infty[$: between $t_1$ and $t_2$, you make “time pass at the rate of $1+s$”.
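Here is what that reparameterization could look like in the sketch, with a made-up career window $[t_1, t_2]$ and speedup $s$:

```python
t1, t2 = 2026, 2056     # placeholder years during which you contribute
s = 0.005               # placeholder overall speedup you cause

def u(t):
    """Continuous piecewise-linear time: slope 1 outside [t1, t2], slope 1 + s inside."""
    if t < t1:
        return t
    if t < t2:
        return t1 + (1 + s) * (t - t1)
    return t2 + s * (t2 - t1) + (t - t2)

# With your intervention, alignment tends to be solved earlier: F_bar(t) = F(u(t)).
F_bar = np.interp([u(t) for t in years], years, F)
p_bar = np.diff(F_bar, prepend=0.0)   # density of Z_bar, recovered from its CDF
```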
$s$ can be broken down in several ways, one of which is $s = f\hat{s}$, where $f$ is the fraction of the progress in AI Alignment that the organization you work for is responsible for, and $\hat{s}$ is how much you speed up your organization’s progress[2]. This is only approximately true, and only relevant if you don’t speed up your organization’s progress too much. Otherwise, effects such as your organization depending on the work of others come into play, and $f$ becomes a complicated function of $\hat{s}$.
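As a made-up numerical example: if your organization were responsible for $f = 5\%$ of the progress in AI Alignment and you sped it up by $\hat{s} = 10\%$, the overall speedup would be $s = f\hat{s} = 0.05 \times 0.1 = 0.005$.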
Similar work can be done to compute $\bar{q}$: with you, AGI will happen at time $\bar{Y}$, where $\bar{Y}$ follows a distribution of density $\bar{q}$, obtained from $q$ in the same way as we obtained $\bar{p}$ from $p$.
Computing how your intervention affects the odds that humanity survives
The probability of doom without your intervention is $d = \int_{t_0}^{+\infty}\!\int_{t_0}^{+\infty} \mathbf{1}_{t<u}\, q(t)\, p(u)\, dt\, du + p_{sad}$, where $p_{sad} = P(\text{AGI that kills happens and AI Alignment cannot be solved}) = \left(\int_{t_0}^{+\infty} q(t)\, dt\right)\left(1 - \int_{t_0}^{+\infty} p(t)\, dt\right)$ (which does not depend on your intervention).
The probability of doom with your intervention is $\bar{d} = \int_{t_0}^{+\infty}\!\int_{t_0}^{+\infty} \mathbf{1}_{t<u}\, \bar{q}(t)\, \bar{p}(u)\, dt\, du + p_{sad}$.
Hence, you save the world with probability $\Delta = d - \bar{d}$. From there, you can also compute the expected number of lives saved.
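Putting the pieces of the sketch together, the discretized doom probabilities could be computed like this (assuming, for simplicity only, that your work leaves AGI timelines unchanged, i.e. $\bar{q} = q$; the factor of $8 \times 10^9$ lives is just one crude, illustrative way to turn $\Delta$ into lives saved):

```python
# Killer AGI happens & alignment can never be solved (independent of your intervention).
p_sad = q.sum() * (1 - p_align.sum())

q_bar = q   # simplifying assumption for this sketch: your work does not move AGI timelines

# ind[i, j] = 1 if killer AGI at years[i] arrives strictly before alignment is solved at years[j].
ind = (years[:, None] < years[None, :]).astype(float)

d     = q     @ ind @ p_align + p_sad   # probability of doom without you
d_bar = q_bar @ ind @ p_bar   + p_sad   # probability of doom with you

delta = d - d_bar                       # probability that you save the world
lives_saved = delta * 8e9               # crude: multiply by the current world population
```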
Results of this model
This Fermi estimation takes as input $\hat{q}$ (your belief about AGI timelines), $p_{kill}$ (your belief about how AGI would go by default), $p$ (your belief about AI alignment timelines), $f$ (your belief about your organization's role in AI Alignment), and $\hat{s}$ (your belief about how much you help your organization). The result is a quite concrete measure of impact.
You can see a crude guess of what the results might look like if you work as a researcher in a medium-sized AI safety organization for your entire life here. With my current beliefs about AGI and AI alignment, humanity is doomed with probability $0.13$, and you save the world[3] with probability $3 \times 10^{-5}$.
I don’t have a very informed opinion about the inputs I used in the estimation, and I would be curious to know what result you would get with better-informed estimates of them. The website also contains other ways of computing the speedup $s$, and it is easy for me to add more, so feel free to ask for modifications!
Note: the website is still a work in progress, and I’m not sure that what I implemented is a correct way of discretizing the model above. The code is available on GitHub (link on the website), and I would appreciate it if you double-checked what I did and added more tests. If you want to use this tool to make important decisions, please contact me so that I can increase its reliability.
[1] $P(X = +\infty) = 1 - \int_{t_0}^{+\infty} \hat{q} = P(\text{AGI is never achieved})$.
[2] $\hat{s}$ should take into account that your absence would probably not remove your job from your organization: in most cases, someone else would do it, but slightly worse.
[3] You “save humanity” in the same sense that you make your favorite candidate win if the election is perfectly balanced: the reasoning I used is in this 80,000 Hours article. In particular, the reasoning is done with causal decision theory and does not take into account the implications of your actions on your beliefs about other actors' actions.