[Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
(Last revised: January 2026. See changelog at the bottom.)

1.1 Post summary / Table of contents

This is the first of a series of blog posts on the technical safety problem for hypothetical future brain-like Artificial General Intelligence (AGI) systems. That previous sentence might raise a few questions, such as: What is “AGI”? What is “brain-like AGI”? What is “the technical safety problem for brain-like AGI”? If these are “hypothetical future systems”, then why on Earth am I wasting my time reading about them right now?

…So my immediate goal in this post is to answer all those questions! After we have that big-picture motivation under our belt, the other 14 posts of this 15-post series will dive into neuroscience and AGI safety in glorious technical detail. See the series cover page for the overall roadmap.

Summary of this first post:

* In §1.2, I define the “AGI technical safety problem”, put it in the context of other types of safety research (e.g. inventing passively-safe nuclear power plant designs), and relate it to the bigger picture of what it will take for AGI to realize its potential benefits to humanity.
* In §1.3, I define “brain-like AGI” as algorithms with big-picture similarity to key ingredients of human intelligence. Future researchers might make such algorithms by reverse-engineering aspects of the brain, or by independently reinventing the same tricks. Doesn’t matter. I argue that “brain-like AGI” is a yet-to-be-invented AI paradigm, quite different from large language models (LLMs). I will also bring up the counterintuitive idea that “brain-like AGI” can (and probably will) have radically nonhuman motivations. I won’t explain that here, but I’ll finish that story by the end of Post #3.
* In §1.4, I define the term “AGI”, as I’m using it in this series.
* In §1.5, I discuss whether it’s likely that people will eventually make brain-like AGIs, as opposed to some other kind of AGI (or just not invent AGI at all). The section includes seven
If you compare a human in 30000 BC to a human today, our brains are full of new information that wasn’t in the training data of 30000 BC. I want to talk about: what would it look like to be in a world where we could put millions of LLMs in a sealed box containing a VR environment for (the equivalent of) thousands of years, and then open up the box and find that the LLMs have made an analogous kind of scientific and technological progress? (See §1 of “Sharp Left Turn” discourse: An opinionated review.)
Spoiler: I think this is fundamentally impossible with LLMs as we know them today. Anyway, let’s explore...