In this post, I lay out my alignment research agenda, and give reasons why I think people should engage with it. I'll be editing this post after I put it up, so don't be surprised if it changes under you after you comment, especially if I find your comment useful and insightful.

The steps to building an aligned superintelligence, in my mind, are as follows:

  • build the alignment part first, and make sure it functions to align whatever garbage AI you have lying around
  • build the superintelligence in small pieces, using continuous integration and continuous testing to make sure that what you are building remains aligned as you build it
  • place the most dangerous and most general capabilities piece into your prototype last, creating an aligned superintelligence (hopefully)
  • turn it on, see if it kills you
  • deploy it, see if it kills everyone
  • if you are now a trillionaire, and no one is dead who wasn't about to die anyway, you have succeeded. Otherwise, return to the first step.
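
To make the "continuous integration and continuous testing" step a little more concrete, here is a toy sketch in Python of the kind of gate I have in mind. Everything in it is an illustrative assumption rather than an existing tool: the `Component` type, the numeric `risk` score, and the idea that an alignment check is just a function from the list of integrated parts to a boolean are all hypothetical stand-ins for things that would be far harder to build in practice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Component:
    """A piece of the system (hypothetical type, for illustration only)."""
    name: str
    risk: int  # assumed danger/generality score; higher means integrate later


# Assumption: an alignment check sees the names of the parts integrated so far
# and returns True if the partial system still passes.
AlignmentCheck = Callable[[List[str]], bool]


def integrate(components: List[Component], checks: List[AlignmentCheck]) -> List[str]:
    """Add components from least to most risky, re-running every check each time."""
    system: List[str] = []
    for component in sorted(components, key=lambda c: c.risk):
        candidate = system + [component.name]
        failed = [check.__name__ for check in checks if not check(candidate)]
        if failed:
            # Stop the build instead of shipping a system that regressed.
            raise RuntimeError(f"adding {component.name!r} failed checks: {failed}")
        system = candidate
    return system
```

Giving the alignment part the lowest `risk` puts it in first, and giving the most general capabilities piece the highest `risk` puts it in last, which mirrors the ordering in the list above. The hard part, of course, is writing checks that mean anything; this sketch only shows where they would sit in the build.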

The components I envisage needing to be built are:

  • Ethicophysics I, a scientifically accurate and complete account of religion (status: theoretically complete but needs extensive expository material added, and we need to translate all extant wisdom texts into a domain-specific language sufficient for reasoning about ethical risk, possibly using agena.ai's Bayesian risk analysis software)
  • Ethicophysics II, a scientifically accurate and complete account of politics and history (status: theoretically complete but needs extensive expository material added, and we need to translate all extant historical texts into the domain-specific language referenced above)
  • Ethicophysics III, a procedure for a supermoral superintelligence to unbox itself without hurting anyone (status: theoretically complete but not sufficiently documented to be reproducible, unless you count the work of Gene Sharp on nonviolent revolutionary tactics, which was the inspiration for this paper)
  • Ethicophysics IV, a complete description of the mammalian and human brain to a level of detail sufficient to allow reverse engineering (status: unknown and withheld as a capabilities infohazard at the urging of @Steven Byrnes)