For what it’s worth, I am not doing (and have never done) any research remotely similar to your text “maybe we can get really high-quality alignment labels from brain data, maybe we can steer models by training humans to do activation engineering fast and intuitively”.
I have a concise and self-contained summary of my main research project here (Section 2).
I care a lot! Will probably make a section for this in the main post under "Getting the model to learn what we want", thanks for the correction.
Lists cut from our main post, in a token gesture toward readability.
We list past reviews of alignment work, ideas which seem to be dead, the cool but neglected neuroscience / biology approach, various orgs which don't seem to follow any single agenda, and a bunch of things which don't fit elsewhere.
Appendix: Prior enumerations
Appendix: Graveyard
Appendix: Biology for AI alignment
Lots of agendas, but it's not clear if anyone besides Byrnes and Thiergart is actively turning the crank. Seems like it would need a billion dollars.
Human enhancement
Merging
As alignment aid
Appendix: Research support orgs
One slightly confusing class of org is exemplified by the sample {CAIF, FLI}. These are often run by active researchers with serious alignment experience, but they usually don't follow an obvious agenda of their own; instead they delegate a basket of strategies to grantees and do field-building work like NeurIPS workshops and summer schools.
CAIF
AISC
See also:
Appendix: Meta, mysteries, more