There is another linkpost for the same blog post: https://www.lesswrong.com/posts/EP92JhDm8kqtfATk8/yoshua-bengio-argues-for-tool-ai-and-to-ban-executive-ai
I would make a clear distinction between the risk of AGI going rogue and the risk of AGI being used by people with poor ethics. In general, the problem of preventing a machine from accidentally doing harm due to malfunction is very different from the problem of preventing malicious use by people. If the idea of an AI scientist can solve the first problem, then it is worth promoting.
Preventing bad actors from using AI is difficult in general because they could use an open-source version or develop one on their own; state actors, especially, could do that. Thus, IMHO, the best way to prevent, say, North Korea from using AI against the US is for the US to have a superior AI of its own.
The AI safety proposal, “AI Scientists: Safe and Useful AI?”, was published by Yoshua Bengio on May 7, 2023.
Below I quote the salient sections of the proposal and comment on them in turn.
Main thesis: safe AI scientists
Bengio’s “AI scientists” are similar in spirit to tool AI, Oracle AI, and the “simulators” agenda. All these ideas share a weakness: tool AI may still cause a lot of harm in the hands of “misaligned” people, or people with poor ethics. I’m not sure which AI design would be better on balance[1]: tool AI or agent AI. But my intuition, stemming perhaps from the “skin in the game” principle, tells me that agent AI would be better on balance (provided some other aspects of the overall system design are also done right, which I touch upon in the last section below).
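Footnote [1] frames “better” as occupying a better Pareto front on the usefulness/efficiency vs. safety/robustness chart. As a toy illustration of that framing (the design names and scores below are made up for the example, not measurements of anything), here is a minimal sketch of a Pareto-dominance comparison:

```python
from dataclasses import dataclass

@dataclass
class Design:
    """A hypothetical AI design scored on two axes (higher is better)."""
    name: str
    usefulness: float   # usefulness/efficiency score
    safety: float       # safety/robustness score

def dominates(a: Design, b: Design) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    return (a.usefulness >= b.usefulness and a.safety >= b.safety
            and (a.usefulness > b.usefulness or a.safety > b.safety))

def pareto_front(designs: list[Design]) -> list[Design]:
    """Keep only the designs not dominated by any other design."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

# Illustrative, made-up scores: neither archetype dominates the other,
# so both sit on the front and the choice depends on how risk is weighed.
candidates = [
    Design("tool AI", usefulness=0.6, safety=0.8),
    Design("agent AI", usefulness=0.9, safety=0.5),
    Design("unaligned agent AI", usefulness=0.9, safety=0.1),
]
print([d.name for d in pareto_front(candidates)])  # ['tool AI', 'agent AI']
```

The point of the framing is that when neither design dominates the other, the choice falls back on how one weighs risk, which is where the “proper logic of risk taking” mentioned in the footnote comes in.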
Note that in the Gaia network architecture (Kaufmann & the Digital Gaia Team, 2023), the intelligence nodes (called “Ents”) are agents as well as Bayesian learners (which is a synonym for “scientists”), as in Bengio’s proposal. But Ents are supposed to be situated within a system of incentives and action affordances that prevents these agents (if they are misaligned) from amassing a lot of power, unless they are able to create a completely isolated supply chain, as in Yudkowsky’s self-replicating nanobots argument (which many people don’t consider realistic).
Training an AI Scientist with Large Neural Nets for Bayesian Inference
I quote this section in full as I think it would be interesting for readers. I don’t have any comments or disagreements to add to it.
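To make the idea of “large neural nets for Bayesian inference” concrete: the networks in this line of work (e.g., the GFlowNet papers cited in the footnotes) learn to amortize posterior computations that, in a toy discrete setting, can be written out exactly. A deliberately simplistic sketch, with a made-up coin-bias hypothesis space standing in for scientific theories:

```python
import math

# Toy hypothesis space: three candidate "theories" about a coin's bias.
hypotheses = {"fair": 0.5, "biased-heads": 0.8, "biased-tails": 0.2}
prior = {h: 1 / 3 for h in hypotheses}

def log_likelihood(p_heads: float, data: list[int]) -> float:
    """Log-probability of a heads(1)/tails(0) sequence under bias p_heads."""
    return sum(math.log(p_heads if x else 1 - p_heads) for x in data)

def posterior(data: list[int]) -> dict[str, float]:
    """Exact Bayesian posterior P(theory | data) over the toy hypothesis space."""
    unnorm = {h: prior[h] * math.exp(log_likelihood(p, data))
              for h, p in hypotheses.items()}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

print(posterior([1, 1, 1, 0, 1, 1]))  # mass shifts toward "biased-heads"
```

An “AI scientist” of the kind Bengio describes would, very roughly, be a neural network trained to output such posteriors (and samples from them) over vastly larger spaces of causal theories, where exact enumeration is impossible.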
AI scientists and humans working together
I think this vision of human-AI collaboration, which I can summarise as “AI does science, humans do ethical deliberation”, has a big practical problem, which I already touched upon above: (some) humans, including scientists, have very bad ethics. Both “The Wisdom Gap” and Edward O. Wilson’s famous quote (“The real problem of humanity is the following: we have Paleolithic emotions, medieval institutions and godlike technology.”) point to the fact that humans’ individual and collective ethics (as well as our innate biological capacities to execute upon those ethics, cf. Bertrand Russell’s “two kinds of morality”) are increasingly inadequate to the power of our technology and the complexity of our civilisation. However, Bengio’s proposal seemingly helps to advance technology much further while not radically addressing the ethics question (apart from “helping to improve education”, but this is a very slow route).
I think that all paths to safe superhuman AI (whether a “scientist” or an agent) must include turning ethics into a science and then delegating it to AI as well. The only remaining role for humans would be to provide “phenomenological” evidence (i.e., saying whether they perceive something as “good” or “bad”, or feel that way on the neurological level, which AI may figure out bypassing unreliable verbal reports), from which AIs will infer concrete ethical models[6].
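As a deliberately simplistic sketch of what “inferring an ethical model from phenomenological evidence” could mean at the smallest possible scale (the features, labels, and model below are entirely hypothetical illustrations, not a proposal), one could fit a weight per outcome feature from binary good/bad judgments, here via plain logistic regression:

```python
import math

# Entirely hypothetical setup: each outcome is a feature vector, and humans
# supply binary "phenomenological" labels (1 = perceived as good, 0 = bad).
# The "ethical model" inferred from them is just a weight per feature.

def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))

def fit_ethical_model(xs: list[list[float]], ys: list[int],
                      lr: float = 0.3, epochs: int = 500) -> list[float]:
    """Logistic regression by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

# Made-up outcome features: [reduces_suffering, increases_autonomy, involves_deception]
xs = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
ys = [1, 1, 0, 0, 0, 1]  # hypothetical human good/bad judgments
weights = fit_ethical_model(xs, ys)
print([round(w, 2) for w in weights])  # deception gets a clearly negative weight
```

Anything realistic would involve far richer outcome representations, neurological rather than verbal evidence, and models that can represent disagreement and uncertainty between people; the sketch only shows the direction of the inference: from human judgments to an explicit, inspectable ethical model.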
But then, if AIs are simultaneously superhuman at science and have a superhuman conscience (ethics), it seems they could basically be made agents, too, and the results would probably be better than if agency remained in the hands of people while both epistemology and moral reasoning were delegated to AI.
In short: I think that Bengio’s proposal is insufficient for ensuring with reasonable confidence that civilisation’s AI transition goes well: even though the “human-AI alignment problem” is defined out of existence, the human-to-human alignment problems (ethics and wisdom) remain unsolved. On the other hand, if we do solve all these problems (which is hard), then formally taking agency away from AI seems unwarranted.
Bengio’s “AI scientists” are also similar to OpenAI’s “alignment MVP” and Conjecture’s CoEms, with the difference that neither the “alignment MVP” nor CoEms are supposed to be a “relatively permanent” solution; rather, they are meant mostly (or exclusively) to be used to develop cognitive science, epistemology, ethics, and alignment science, after which the “AI scientist” would be scrapped and something that it itself helped to design would be built instead.
The (not only) political challenge
I would add to this that apart from the legal and political aspects, there are also economic[7] and infrastructural aspects of building up a civilisational immune system against misaligned AI developments and rogue actors.
Some economic and infrastructural restructuring ideas are presented in the Gaia network architecture paper which I referred to earlier. I pointed out some more infrastructural and economic inadequacies here:
See also “Information security considerations for AI and the long term future” (Ladish & Heim, 2022).

Footnotes

1. In technical terms, “better” here could mean something like occupying a better Pareto front on the usefulness/efficiency vs. safety/robustness tradeoff chart; however, using the proper logic of risk-taking may lead to a different formulation.
2. Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio, “Bayesian Structure Learning with Generative Flow Networks”, UAI 2022, arXiv:2202.13903, February 2022.
3. Nan Rosemary Ke, Silvia Chiappa, Jane Wang, Anirudh Goyal, Jorg Bornschein, Melanie Rey, Theophane Weber, Matthew Botvinick, Michael Mozer, Danilo Jimenez Rezende, “Learning to Induce Causal Structure”, ICLR 2023, arXiv:2204.04875, April 2022.
4. Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter, “TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second”, ICLR 2023, arXiv:2207.01848, July 2022.
5. Edward Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio, “GFlowNet-EM for learning compositional latent variable models”, arXiv:2302.06576, February 2023.
6. Later, AI might acquire such powerful morphological intelligence that it could model entire human brains itself, and thus could in principle take away this “phenomenological” role from humans, too. See “Morphological intelligence, superhuman empathy, and ethical arbitration” on this topic.
7. The economy may coalesce with politics, since political economy is often viewed as an indivisible system and area of study.