Totally reasonable, the big man would agree:
https://twitter.com/ESYudkowsky/status/1773064617239150796
Humans augmented far enough to stop being idiots, smart enough that they never put faith in a step that will not hold, as do not exist in this world today, could handle it. But that's the army you need; and if instead you go to war with the army you have, you just get slaughtered by a higher grade of problem than you can handle.
Sounds like intelligence amplification. A hundred brains in a vat is a good form of augmentation, since we know there are fundamental limits to augmenting a single human brain (heat dissipation, calorie and oxygen supply, fixed space for more grey matter, immune rejection, insanity, damage from installing deep-brain electrode grids, the need for far more advanced technology to install on the order of billions of electrodes...), so why not a hundred?
Just a 'few minor problems' with needing to master all of human biology to keep 100 brains in a tank alive long enough to get results. Biology is just so damn complicated; some 1.5 million papers are published every year. Plus, to keep brains alive you need to be an expert on the immune system, the circulatory system, the liver, digestion, and all those weird organs nobody knows the function of. It's just so hard to find anyone qualified to make decisions multiple times a second to keep the brains alive.
If only there were a solution, some way to automate this process. You'd need greater-than-human intelligence though...
Fun idea, but idk how this helps as a serious solution to the alignment problem.
suggestion: can you be specific about exactly what “work” the brain-like initialisation is doing in the story?
thoughts:
That said, I think something in the neighbourhood of this idea could be helpful.
I think Neuralink already did this actually - a bit late to the point, but a good try anyway. Also, have you considered having Michael Bay direct the research effort? I think he did a pretty good job with the first Transformers.
Train them to predict each other. Human brains being the most general-purpose objects in existence, this should be a richly general training channel, and it incentivizes brain-to-brain (B2B) interaction.
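As a very loose sketch of what "train them to predict each other" could mean computationally, here is a toy in which each "brain" is stood in for by a fixed scalar function, and each side fits a linear model of the other by gradient descent on squared prediction error. Everything here is invented for illustration; real brains are obviously not fixed linear scalar functions.

```python
# Toy mutual-prediction setup: two fixed "brains" (hypothetical stand-ins),
# each side learning a linear predictor of the other's response.
import random

random.seed(0)

def brain_a(x):  # fixed stand-in "brain" A: the function B must learn
    return 2.0 * x + 1.0

def brain_b(x):  # fixed stand-in "brain" B: the function A must learn
    return -0.5 * x + 3.0

# Each side holds a linear predictor [w, b] of the other's output.
pred_of_b = [0.0, 0.0]  # A's model of B
pred_of_a = [0.0, 0.0]  # B's model of A
lr = 0.05

for step in range(2000):
    x = random.uniform(-1.0, 1.0)  # shared stimulus on the B2B channel
    # Both sides get a squared-error gradient on their prediction.
    for params, target in ((pred_of_b, brain_b(x)), (pred_of_a, brain_a(x))):
        w, b = params
        err = (w * x + b) - target
        params[0] -= lr * 2 * err * x
        params[1] -= lr * 2 * err

print(pred_of_b)  # ≈ [-0.5, 3.0]
print(pred_of_a)  # ≈ [2.0, 1.0]
```

The point of the sketch is only that mutual prediction gives both sides a dense, well-defined training signal; nothing about the linear form carries over.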
You may wish to consider a review of the political science machine learning literature here; prior work in that area demonstrates that only a GAN approach allows brains to predict each other effectively (although I believe that there's some disagreement from Limerence Studies scholars).
TL;DR: Many alignment research proposals share a common motif: figure out how to enter a basin of alignment / corrigibility for human-level models, then amplify to more powerful regimes while generalizing gracefully. In this post we lay out a research agenda that comes at the problem from a different direction: if we already have ~human-level systems with extremely robust generalization properties, we should just amplify those directly. We call this strategy “Gradient Descent on the Human Brain”.
Introduction
Put one way, the hard part of the alignment problem is figuring out how to solve ontology identification: mapping between an AI’s model of the world and a human’s model, in order to translate and specify human goals in an alien ontology.
In full generality, in the worst case, this is a pretty difficult problem. But is solving it necessary to create safe superintelligences? You only need to solve for arbitrary ontologies if you assume that the path to superintelligence necessarily routes through systems with ontologies different from our own. We don’t need to solve ontology translation for high-bandwidth communication with other humans[1].
Thus far, we haven’t said anything really novel. The central problem with this approach, as any alignment researcher would know, is that we don’t really have a good way to bootstrap the human brain to superintelligent levels. There have been a few recent attempts to approach this, though they focus on very prosaic methods that, at best, buy points on the margin. Scaling to superintelligence requires much stronger and more robust methods of optimization.
The Setup
The basic setup is pretty simple, though there are a few nuances and extensions that are hopefully self-explanatory.
The simple version: Take a hundred human brains, put them in a large vat, and run gradient descent on the entire thing.
The human brain is a remarkably powerful artifact for its size, so finding a way to combine the capabilities of a hundred human brains with gradient descent should result in something significantly more powerful. As an intuition pump, think of how powerful human organizations are despite their far lower communication bandwidth. At the very least we should be able to surpass this; more impressive versions could look like a single integrated mind that combines the capabilities of all hundred brains.
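To make "run gradient descent on the entire thing" slightly more concrete: tissue doesn't expose a backward pass, so the most naive version would be a zeroth-order method that perturbs the vat's stimulation parameters and estimates a gradient from performance differences. The sketch below does this with an SPSA-style estimator on a dummy objective; every name and number is hypothetical, and nothing about real neural stimulation is implied.

```python
# Deliberately naive sketch: treat the vat's per-brain stimulation parameters
# as a flat vector and do zeroth-order gradient ascent on a performance score,
# since biological tissue does not provide backprop. All names are made up.
import random

random.seed(0)
N_BRAINS = 100

def vat_performance(params):
    # Stand-in score for the vat on some task: a dummy quadratic with its
    # optimum at 1.0 per brain. A real signal would come from task feedback.
    return -sum((p - 1.0) ** 2 for p in params)

params = [0.0] * N_BRAINS  # one stimulation parameter per brain
lr, eps = 0.005, 1e-3

for step in range(2000):
    # SPSA-style estimate: perturb every coordinate at once with random signs,
    # then attribute the performance difference back to each coordinate.
    signs = [random.choice((-1.0, 1.0)) for _ in range(N_BRAINS)]
    plus = vat_performance([p + eps * s for p, s in zip(params, signs)])
    minus = vat_performance([p - eps * s for p, s in zip(params, signs)])
    g = (plus - minus) / (2 * eps)
    params = [p + lr * g * s for p, s in zip(params, signs)]  # gradient ascent

print(vat_performance(params))  # approaches the optimum of 0.0
```

SPSA needs only two score evaluations per step regardless of dimension, which is the sort of property you'd want when each "evaluation" is a whole vat run.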
The specifics of what the training signal should be are, I think, a rather straightforward engineering problem. Some pretty off-the-cuff ideas, in increasing order of endorsement:
A slightly more sophisticated setup:
Aside: Whose brains should we use for this?
The comparative advantage of this agenda is the strong generalization properties inherent to the human brain[2]. However, to further push the frontier of safety and allow for a broad basin of graceful failure, we think that the brains used should have a strong understanding of alignment literature. We’re planning on running a prototype with a few volunteer researchers - if you want to help, please reach out!
Potential Directions
More sophisticated methods
In light of recent excitement over sparse autoencoders (SAEs), one direction we find interesting is training SAEs on the human brain and seeing whether we can get more targeted amplification.
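For concreteness, here is what the SAE objective looks like at toy scale: a one-layer sparse autoencoder (ReLU encoder, L1 penalty on the latents) trained on synthetic three-dimensional "recordings" generated from two sparse ground-truth features. All dimensions, data, and hyperparameters are invented; this illustrates only the objective, not anything brain-scale.

```python
# Toy sparse autoencoder on synthetic "recordings": 3-dim signals built from
# two sparse ground-truth features. Everything here is for illustration only.
import random

random.seed(0)
D, M = 3, 6           # observed dim, SAE hidden dim (overcomplete)
LAM, LR = 0.01, 0.02  # L1 sparsity penalty, learning rate

F1, F2 = [1.0, 0.5, 0.0], [0.0, 0.5, 1.0]  # hidden "true" features

def sample():
    # Each feature is present with probability 0.5 (sparse coefficients).
    a = random.uniform(0, 1) if random.random() < 0.5 else 0.0
    b = random.uniform(0, 1) if random.random() < 0.5 else 0.0
    return [a * u + b * v for u, v in zip(F1, F2)]

W_e = [[random.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(M)]
b_e = [0.0] * M
W_d = [[random.uniform(-0.1, 0.1) for _ in range(M)] for _ in range(D)]
b_d = [0.0] * D

losses = []
for step in range(10000):
    x = sample()
    pre = [sum(W_e[j][k] * x[k] for k in range(D)) + b_e[j] for j in range(M)]
    h = [max(0.0, p) for p in pre]  # ReLU latents
    xh = [sum(W_d[i][j] * h[j] for j in range(M)) + b_d[i] for i in range(D)]
    losses.append(sum((xh[i] - x[i]) ** 2 for i in range(D)))
    # Manual backprop: squared reconstruction error plus L1 penalty on h.
    dxh = [2 * (xh[i] - x[i]) for i in range(D)]
    dh = [sum(dxh[i] * W_d[i][j] for i in range(D)) + (LAM if h[j] > 0 else 0.0)
          for j in range(M)]
    dpre = [dh[j] if pre[j] > 0 else 0.0 for j in range(M)]
    for i in range(D):
        for j in range(M):
            W_d[i][j] -= LR * dxh[i] * h[j]
        b_d[i] -= LR * dxh[i]
    for j in range(M):
        for k in range(D):
            W_e[j][k] -= LR * dpre[j] * x[k]
        b_e[j] -= LR * dpre[j]

print(sum(losses[:100]) / 100, sum(losses[-100:]) / 100)  # error should drop
```

The overcomplete latent plus the L1 term is what would, in the hoped-for case, pull individual latents toward individually interpretable (and hence individually amplifiable) features.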
Outreach
We believe that this agenda also aids in community outreach. For instance, e/accs seem unlikely to gain any real political clout due to lacking the mandate of heaven, but we can certainly get them on board with this idea as accelerating the literal next stage of human evolution.
Alternate optimization methods
For reasons beyond the scope of this post, we’re also excited about the potential of training human brains using the forward-forward algorithm instead of standard backpropagation.
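For readers unfamiliar with it: the forward-forward algorithm trains each layer locally to assign high "goodness" (e.g. the sum of squared activations) to positive data and low goodness to negative data, with no backward pass; that locality is exactly what makes it attractive for tissue. Below is a minimal single-layer sketch with invented data and sizes, purely to show the local update rule.

```python
# Minimal forward-forward sketch: one ReLU layer trained locally so that
# "goodness" (sum of squared activations) is high on positive samples and
# low on negative ones. No backprop through a network. Data is invented.
import math, random

random.seed(0)
D, H, THR, LR = 4, 8, 2.0, 0.05

def positive():  # "real" data: strongly correlated coordinates
    v = random.uniform(0.5, 1.0)
    return [v + random.uniform(-0.1, 0.1) for _ in range(D)]

def negative():  # "fake" data: independent noise
    return [random.uniform(-1.0, 1.0) for _ in range(D)]

W = [[random.uniform(-0.3, 0.3) for _ in range(D)] for _ in range(H)]

def sigmoid(z):  # numerically stable logistic
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def goodness(x):
    h = [max(0.0, sum(W[j][k] * x[k] for k in range(D))) for j in range(H)]
    return sum(a * a for a in h)

for step in range(3000):
    for x, sign in ((positive(), 1.0), (negative(), -1.0)):
        h = [max(0.0, sum(W[j][k] * x[k] for k in range(D))) for j in range(H)]
        g = sum(a * a for a in h)
        # Local objective: maximize log sigmoid(sign * (goodness - THR)).
        coeff = sign * (1.0 - sigmoid(sign * (g - THR)))
        for j in range(H):
            if h[j] > 0:
                for k in range(D):
                    W[j][k] += LR * coeff * 2 * h[j] * x[k]

print(goodness(positive()), goodness(negative()))
```

After training, positive samples should score higher goodness on average than negative ones; the layer never needed gradients from anywhere but its own activations.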
Appendix
This contains some rough notes and more detailed sketches. Some of them are pretty hand-wavy, but it seems better to put them up than not, for legibility.
Toy examples
Initial Sketch (Simple Case, Maximum Technology Assumed)
Initial Sketch (Harder, Advanced Technology Assumed)
How to run gradient descent on the human brain (longer version)
Neural gradient descent: Organoid edition
A more advanced sketch
This will be a bit more advanced than the toy experiment on our organoid, as it’s intended to be a prototype for running on a human brain.
Though I’m not claiming it’s a trivial problem even between humans - there’s certainly some variance in ontology - the central point is that it’s much more manageable.
To clarify: these generalization properties are literally as good as they can get, because they tautologically determine what we would want things to generalize to.
Eye tracking tech may also help here.