Thanks for the contribution. I think it's really important to keep brainstorming new alignment/safety approaches. We haven't had enough different people working on alignment long enough to know that we've thought of every idea. This one, for instance, is completely new to me.
I put this in the class of ideas "start over on AGI and take a safer approach". I tend to think there's little chance that these ideas will be useful in our world, where progress toward AGI is going fast enough that it would be quite difficult to stop. But who knows.
It strikes me that it would be easier to slow down running AI on standard computers. I guess part of the idea is that you'd make it impossible for the AI to proliferate to other hardware, or bump up its run-rate. It could ultimately write or create a version of itself that could run on tradional hardware, but that would be a large roadblock to takeover.
I do concede that in theory it should be possible to do this on standard computers, and that it might actually be a good way to test this hypothesis out and gather empirical data today.
Where I'm not so sure is whether even "slow" standard computers think slower than humans do. In other words, imagine some future AI architecture that is OOM more power and compute efficient. It may not be entirely unreasonable to suppose that we could get an algorithmic/architectural innovation that would enable GPT-4 level performance on old legacy hardware (eg, from the 80s). Indeed, at the unit level, we have calculators from the 60s that can out-multiply the fastest mental mathematicians today.
I thought of using virtualization/simulation to emulate a slow sandbox environment for the AI to run in, but as you mention, it does lower the barrier for an "escape", creating risk that might be unacceptable when dealing with ASI.
With that said, being able to simulate this on current hardware does suggest a possible way to test this whole hypothesis out today. For example, we could have two different models: Model A, with higher parameter count and ELO score running on a slower environment, and Model B, with less parameters/lower ELO score running on a faster environment, and see if Model B is able to sufficiently supervise Model A on some task that would be difficult (or impossible) if both were running at the same speed.
Lately, much has been discussed about PauseAI, or even stopping research completely, until further progress has been made in theory or technical approaches to alignment. After thinking about this for some time, I wondered if there was a way to formalize this reasoning in mathematical terms when I stumbled upon what might be an interesting, possibly novel approach to alignment:
What if we leveraged the nature of slower computing substrates to run AI at a slower pace than current digital computers?
By "slower substrate", I don't mean just diminishing CPU/GPU clock speeds, number of cores, or RAM/VRAM. I mean choosing fundamentally slower forms of computing that would be impossible to speed up past some performance ceiling. In this way, we might be able to run and validate stronger-than-human AI in human time. Here's an intuition pump for why it could work.
Total Intelligence and the Three Forms Of Superintelligence
In his book Superintelligence, Nick Bostrom categorizes superintelligence into three forms:
Note that a given AI might attain superhuman performance through one, or a combination of, any of these three forms. This suggests a factored cognition in the form of a kind of Total Intelligence Equation, where Quality, Speed, and Number Of Agents in coordination are all directly proportional to Total Intelligence:
TotalIntelligence=Quality∗Speed∗NumberOfAgents
The Slower Superintelligence Mental Model
Given this relationship, it stands to reason that we might evaluate a high-quality superintelligence safely by diminishing its speed or collective capacity. In other words, if we consider Total Intelligence as a product of Quality, Speed, and Number of Agents, then a feasible approach might involve increasing the quality factor while decreasing speed and the number of agents proportionally or more. This results in a high-quality but controllable form of superintelligence.
Varying Total Intelligence by varying Number Of Agents is not a novel idea. This was the the core concept behind Paul Christiano's work on iterated amplification (and distillation), which explored using a number of aligned agents to align a single smarter agent. However, whether quality and alignment can be cleanly factored out has been questioned by Eliezer Yudkowsky. This leaves Speed as the other variable to explore.
Intuition pump
Imagine a scenario where a plant, through incredibly slow biological processes, outputs a scientific theory comparable to Einstein's Theory of Relativity, but over a span of 30-40 years. This is presumably 3x-4x slower than Einstein, who took about 10 years to develop his theory. Despite the slow pace, the quality of the output would be unquestionably high.
By maintaining high-quality outputs while significantly reducing speed, we enable ourselves to evaluate these outputs of a superintelligent AI at a human-comprehensible pace. This approach removes the risk of a rapid, uncontrollable intelligence explosion (or "foom").
Where this might fit in the alignment strategy landscape
Many creative approaches to AI alignment have been proposed over the years. Some of them, like mechanistic interpretability ("mech interp"), can be considered building blocks instead of total solutions. Others, like scalable oversight, address the whole problem, but arguably side-step the crux of it, which notably, according to MIRI, is the development of a fully rigorous Agent Foundations theory.
This is probably most similar to a scalable oversight strategy. One idea is to utilize this as a sort of defense-in-depth, where the slower substrate is layered on top of other alignment strategies to decrease the probability of a sharp left turn.
Another potentially big idea would be to substitute this approach (varying intelligence speed) for the approach in Christiano's iterated distillation and amplification proposal (varying number of agents), which would turn this into a possible full solution and would sidestep Eliezer Yudkowsky's main gripe with IDA.
Isn't this just a boxed AI?
No, a boxed ASI remains superintelligent and capable of outwitting its constraints (human or otherwise) to escape. A slowed AI, on the other hand, while logically advancing, operates at a pace much slower than human thought, allowing us to monitor and intervene as necessary.
Possible computing substrates
Current silicon-based computers, even old ones, often think faster than humans in specific domains (e.g., calculators). We need substrates inherently slower than human cognition. Potential candidates include:
But how do we build it?
Training an AI on a slow substrate might present unique challenges. Initially, we might need to bootstrap the training process on modern, fast hardware before transferring the developed AI to a slower substrate to finalize its development. This phased approach could ensure efficient and effective training while mitigating the risks associated with training and evaluating powerful models.
tl;dr
The main idea is that by adopting slower computing substrates for AI, we could create a controllable environment to develop and evaluate superintelligence.
I wrote this post in order to wonder-out-loud; my hopes are that somebody has thought of this and can point me to rebuttals or previous explorations on this topic. Perhaps I've missed something obvious. Let me hear it.