Interesting argument, though I don't quite agree with the conclusion to stay away from brain-like AGI safety.
I think you could argue that if the assumption holds that AGI will likely be brain-like, it would be very important for safety researchers to explore this perspective before mainstream AI research realizes it.
There is also a point to be made that you could probably tell the safety community about your discovery without speeding up mainstream AI research, but this depends on what exactly your discovery is (e.g., this might work for theoretical work, less so for practical work).
Even if you were very convinced that brain-like AGI is the only way we can get there, it should still be possible to do research that speeds up safety differentially. For example, if you discovered some kind of architecture that would be very useful for capabilities, you could refrain from laying out how it would be useful and instead work on the assumption that future AI will look that way and base your safety work on that.
Recently, I wrote an article together with Jan Kirchner on "brain enthusiasts" in AI Safety (if you find work on neuroscience/cognitive science x AI (Safety) interesting, let me know 🙂). While crafting and researching our arguments, we repeatedly came across one argument that left us confused and uncertain about the whole topic.
Epistemic status: not rigorously researched at all; I am just trying to get other people's opinions on this. I am deeply unsure about this whole argument, so there may be some lack of clarity in my writing as a result.
For practical reasons, I will continue to use "brain enthusiasts" as a blanket term for "people with a neuroscience/cognitive science background".
The argument roughly goes like this: we should make provisions for the case that AGI will be brain-inspired, and in that case, we should study neuroscience and cognitive science to get a better idea of what such a future AGI might look like.
We struggled with this argument since it is based on many speculative assumptions (this is also the reason why we didn’t include it in the list of research topics in the original article) and seems weak overall.
Still, I am curious, and I am motivated by Evie Cottrell’s amazing blog post calling for more openness about confusion and uncertainty. Thus, I want to lay this argument out here and discuss it a bit. If you have relevant intuitions, please let me know!
First, I want to lay out my uncertainties.
Differential intellectual progress
Differential intellectual progress (DIP) describes “prioritizing risk-reducing intellectual progress over risk-increasing intellectual progress”. I am particularly wary of approaches to AI Alignment that also serve a purpose in AI capabilities research. Insights gained from studying the brain and understanding intelligence through neuroscience/cognitive science are potentially very useful (risk-reducing) for AI Safety, but also potentially very harmful (risk-increasing).
It is imaginable that we get to AGI through a hybrid or a fully brain-inspired approach. Understanding mechanisms of the brain such as social instincts, motivation, or values might be especially interesting in that case. But these insights walk a fine line between AI capabilities research and AI Safety. I am worried that knowledge about the brain that is specially tailored to the field of AI benefits AI capabilities research more than AI safety research, especially since more people might be able to put these insights to use in capabilities work than in safety work. So, according to DIP, we should rather focus on things that are useful for reducing risk from AI.
Convergent evolution
Another question is: are the properties of the human brain desired features of AGI (we want to build them in) or convergent ones (they arise as necessary components of intelligent agents)?
If they are convergent, they will likely be present in AGI. If they are implemented in AGI in a fashion similar to how they are implemented in the brain, then we could gain valuable insights by studying them in the brain, inferring what they might look like in AGI, and working out how we could align them.
One example might be to study social instincts (inspired by Steven Byrnes's brain-like AGI Safety series): if we can reverse-engineer social instincts and moral intuitions in a substrate-independent way, adjust them, and implement them in AGI, we might end up with an aligned AGI.
Similarly, if social instincts are convergent properties of aligned AGI, I think we should study them as well, and brain enthusiasts might be especially useful here.
Open questions:
Does this imply that we shouldn’t spend time on epistemic translation of insights from neuroscience/cognitive science and adjacent fields to AI?
What are projects in neuroscience/cognitive science that solely benefit risk-reducing intellectual progress on AI?
How useful is it to build better developmental models of what AGI could look like, given the risk of speeding up AGI progress?