[Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
(Last revised: January 2026. See changelog at the bottom.)

1.1 Post summary / Table of contents

This is the first of a series of blog posts on the technical safety problem for hypothetical future brain-like Artificial General Intelligence (AGI) systems. That previous sentence might raise a few questions, such as:

* What is “AGI”?
* What is “brain-like AGI”?
* What is “the technical safety problem for brain-like AGI”?
* If these are “hypothetical future systems”, then why on Earth am I wasting my time reading about them right now?

…So my immediate goal in this post is to answer all those questions! After we have that big-picture motivation under our belt, the other 14 posts of this 15-post series will dive into neuroscience and AGI safety in glorious technical detail. See the series cover page for the overall roadmap.

Summary of this first post:

* In §1.2, I define the “AGI technical safety problem”, put it in the context of other types of safety research (e.g. inventing passively-safe nuclear power plant designs), and relate it to the bigger picture of what it will take for AGI to realize its potential benefits to humanity.
* In §1.3, I define “brain-like AGI” as algorithms with big-picture similarity to key ingredients of human intelligence. Future researchers might make such algorithms by reverse-engineering aspects of the brain, or by independently reinventing the same tricks. Doesn’t matter. I argue that “brain-like AGI” is a yet-to-be-invented AI paradigm, quite different from large language models (LLMs). I will also bring up the counterintuitive idea that brain-like AGI can (and probably will) have radically nonhuman motivations. I won’t explain that here, but I’ll finish that story by the end of Post #3.
* In §1.4, I define the term “AGI”, as I’m using it in this series.
* In §1.5, I discuss whether it’s likely that people will eventually make brain-like AGIs, as opposed to some other kind of AGI (or just not invent AGI at all). The section includes seven