I believe that fast takeoff is impossible, because of computational complexity.
This post presents a pretty clear summary of my thoughts. Essentially, if the difficulty of “designing an AI with intelligence level n” scales at any rate greater than linear, this will counteract any benefit an AI receives from its increased intelligence, and so its intelligence will converge. I would like to see a more formal model of this.
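As a gesture toward that formal model, here's a toy simulation. All the assumptions are mine: design effort per generation is proportional to current intelligence, and the marginal cost of designing intelligence level n scales like n**p.

```python
def simulate(p, steps=100, i0=1.0):
    """Intelligence trajectory when the marginal cost of designing
    intelligence level n is n**p (a hypothetical cost model).
    Each generation the designer gets i units of design effort,
    so the per-generation gain is roughly i / i**p = i**(1 - p)."""
    i = i0
    for _ in range(steps):
        i += i ** (1 - p)
    return i

explosive = simulate(p=0.5)  # sublinear cost: gains compound
steady = simulate(p=1.0)     # linear cost: smartness and difficulty cancel
damped = simulate(p=2.0)     # superlinear cost: gains shrink every generation
```

In this sketch, p > 1 doesn't make growth literally stop, but it slows it from explosive to polynomial, which is the "convergence" flavor of the argument.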
I am aware that Gwern has responded to this argument, but I feel like he missed the main point. He gives many arguments showing that an AI could still do well without solving an NP-complete problem in polynomial time, or still do better than a human, or still gain a benefit from performing mildly better than a human.
But the concern here isn't really about AIs performing better than humans at certain tasks. It's about them rapidly, and recursively, ascending to godlike levels of intelligence. That's what is being argued is impossible. And there's a difference between "superhuman at all tasks" and "godlike intelligence enabling what seems like magic to lesser minds".
One of Eliezer’s favorite examples of how a powerful AI might take over the world is by solving the protein folding problem, designing some nano-machines, and using an online service to have them constructed. The problem with this scenario is the part where the AI “solves the protein folding problem”. That the problem is NP-hard means that it will be difficult, no matter how intelligent the AI is.
Something I’ve felt confused about, as I’ve been writing this up, is this problem of “what is the computational complexity of designing an AI with intelligence level n?” I have an intuition that there should be some “best architecture”, at least for any given environment, and that this architecture should be relatively “simple”. And once this architecture has been discovered, that’s pretty much it for self-improvement: you can still benefit from acquiring extra resources and improving your hardware, but these have diminishing returns. The alternative to this is that there’s an infinite hierarchy of increasingly good AI designs, which seems implausible to me. (Though there is an infinite hierarchy of increasingly large numbers, so maybe it isn’t.)
Now, it could be that even without fast takeoff, AGI is still a threat. But this is a different argument than "as soon as artificial intelligence surpasses human intelligence, recursive self-improvement will take place, creating an entity we can't hope to comprehend, let alone oppose."
Edit: Here is a discussion involving Yudkowsky regarding some of these issues.
Yudkowsky addresses some of these objections in more detail in "Intelligence Explosion Microeconomics".
Thanks. I had skimmed that paper before, but my impression was that it only briefly acknowledged my main objection regarding computational complexity, on page 4. Most of the paper involves analogies with evolution/civilization, which I don't think are very useful: my argument is that the difficulty of designing intelligence should grow exponentially at high levels, so the difficulty of relatively low-difficulty tasks like designing human intelligence doesn't seem that important.
On page 35, Eliezer writes:
I am not aware of anyone who has defended an “intelligence fizzle” seriously and at great length.
I will read it again more thoroughly, and see if there's anything I missed.
Firstly, if we are talking about actual computational complexity, then the mathematical background is already implicitly talking about the fastest possible algorithm to do X.
That the problem is NP-hard means that it will be difficult, no matter how intelligent the AI is.
Whether or not P=NP is an unsolved problem.
Predicting how a protein will fold is in BQP, which might be easier than NP. (Another unsolved problem.)
Computational complexity classes often don't matter in practice. If you are solving the travelling salesman problem, you rarely need the shortest path; a short path is good enough. Secondly, P vs NP is about the worst case. There are some special cases of the travelling salesman problem that are easy to solve. Taking an arbitrary protein and predicting how it will fold might be computationally intractable, but the work here is done by the word "arbitrary". There are some proteins that are really hard to predict, and some that are easier. Can molecular nanotech be made using only the easily predicted proteins?
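A quick sketch of the "a short path is good enough" point, using made-up city coordinates and the standard nearest-neighbour heuristic:

```python
import math
from itertools import permutations

def tour_length(order, pts):
    """Total length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[order[k]], pts[order[(k + 1) % len(order)]])
               for k in range(len(order)))

def nearest_neighbour(pts):
    """Greedy O(n^2) heuristic: no optimality guarantee, usually a short tour."""
    unvisited = set(range(1, len(pts)))
    order = [0]
    while unvisited:
        last = order[-1]
        nxt = min(unvisited, key=lambda j: math.dist(pts[last], pts[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

pts = [(0, 0), (2, 1), (5, 0), (3, 4), (1, 3)]  # hypothetical cities
greedy = tour_length(nearest_neighbour(pts), pts)
# The exact optimum needs brute force over (n-1)! orderings; the heuristic doesn't.
optimal = min(tour_length([0] + list(p), pts)
              for p in permutations(range(1, len(pts))))
```

On these five cities the greedy tour comes out within a few percent of optimal, despite doing quadratic work instead of factorial.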
(Also, an algorithm doesn't have an intelligence level, it has an intelligence-to-compute relation. Once you have invented minmax, increasing the recursion depth takes next to no insight into intelligence. Given a googolplex-FLOP computer, your argument obviously fails, because any fool could bootstrap intelligence on that.)
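To illustrate that intelligence-to-compute relation: the same minmax code, with zero extra insight, resolves a toy subtraction game once it is allowed enough recursion depth. (The game and the numbers are my own toy example, not anything from the thread.)

```python
def minmax(pile, depth):
    """Subtraction game: players alternate taking 1-3 stones; taking the
    last stone wins. Returns +1 if the player to move can force a win,
    -1 if they can't, and 0 when the depth cutoff leaves it unresolved."""
    if pile == 0:
        return -1            # previous player took the last stone and won
    if depth == 0:
        return 0             # out of compute: position unresolved
    return max(-minmax(pile - take, depth - 1)
               for take in (1, 2, 3) if take <= pile)

# Same algorithm, more compute: a pile of 12 is a forced loss for the
# player to move, but a shallow search can't see that.
shallow = minmax(12, depth=2)   # cutoff reached, no verdict
deep = minmax(12, depth=12)     # solved exactly: forced loss
```

The jump from "no verdict" to "solved" comes purely from raising the depth parameter, which is the sense in which compute substitutes for insight here.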
I have an intuition that there should be some “best architecture”, at least for any given environment, and that this architecture should be relatively “simple”.
I agree. I think that AIXI shows that there is a simple optimal design with unlimited compute. There being no simple optimal design with finite compute would be somewhat surprising. (I think logical induction is something like only exponentially worse than any possible mathematical reasoner in its use of compute.)
But this is a different argument than "as soon as artificial intelligence surpasses human intelligence, recursive self-improvement will take place, creating an entity we can't hope to comprehend, let alone oppose."
A model in which both are true. Suppose that there was a design of AI that was optimal for its compute. And suppose this design was reasonably findable, i.e. a bunch of smart humans could find this design with effort. And suppose this design was really, really smart.
(Humans often get the answer wrong even in cases where the exact maths takes a trivial amount of compute, like the doctors asked about a disease with prevalence 1 in 1000 and a 90% reliable test.) The gap between humans and optimal use of compute is likely huge.
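For reference, the exact arithmetic in that doctors example, reading "90% reliable" as 90% sensitivity with a 10% false-positive rate (one common version of the puzzle):

```python
prevalence = 0.001       # 1 in 1000 has the disease
sensitivity = 0.9        # P(positive | disease)
false_positive = 0.1     # P(positive | no disease)

# Bayes' rule: P(disease | positive) = P(pos | disease) P(disease) / P(pos)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive
# posterior is under 1%, while the intuitive answer people give is ~90%
```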
So either humans figure out the optimal design and implement it, or humans hack together something near human level. The near-human-level AI might fiddle with its own workings, trying to increase its intelligence, and then it figures out the optimal design.
In this world, the prediction is that vastly superhuman AI arrives not long after AI reaches human level. It's just that the self-improvement isn't that recursive.
Regarding your "intuition that there should be some “best architecture”, at least for any given environment, and that this architecture should be relatively “simple”.", I think:
1) I'd say "task" rather than "environment", unless I wanted to emphasize that I think selection pressure trumps the orthogonality thesis (I'm ambivalent, FWIW).
2) I don't see why it should be "simple" (and relative to what?) in every case, but I sort of share this intuition for most cases...
3) On the other hand, I think any system with other agents probably is much more complicated (IIUC, a lot of people think social complexity drove selection pressure for human-level intelligence in a feedback loop). At a "gears level" the reason this creates an insatiable drive for greater complexity is that social dynamics can be quite winner-takes-all... if you're one step ahead of everyone else (and they don't realize it), then you can fleece them.
I don't think asymptotic reasoning is really the right tool for the job here.
We *know* things level off eventually because of physical limits (https://en.wikipedia.org/wiki/Limits_of_computation).
But fast takeoff is about how fast we go from where we are now to (e.g.) a superintelligence with a decisive strategic advantage (DSA). DSA probably doesn't require something near the physical limits of computation.
The most "optimistic" model in that post is linear. That is a model where making something as smart as you is a task of fixed difficulty: the benefit you gain by being smarter counterbalances the extra difficulty of making something smarter. In all the other models, making something as smart as you gets harder as you get smarter. (I am not talking about biological reproduction here, or about an AI blindly copying itself; I am talking about writing code that is as smart as you are, from scratch.)
Suppose we gave a chicken and a human access to the same computer, and asked each to program something at least as smart as they were. I would expect the human to do better than the chicken. Likewise I would bet on a team of IQ 120 humans producing an AI smarter than they are over a team of IQ 80 humans producing something smarter than they are. (Or anything smarter than a chicken, really.)
A few extra IQ points will make you a slightly better chess player, but can be the difference between not being able to write a chess-playing program at all, and inventing minmax.
Making things smarter than yourself gets much easier as you get smarter, which is why only smart humans have a serious chance of managing it.
Instead of linear, try squareroot, or log.
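A crude way to compare those scalings, under my own assumed dynamics: design speed is proportional to current intelligence i divided by a difficulty(i) cost, integrated with Euler steps.

```python
import math

def time_to_reach(difficulty, target=1000.0, i=2.0, dt=0.01):
    """Euler-integrate di/dt = i / difficulty(i) and return the time
    at which intelligence first reaches the target level."""
    t = 0.0
    while i < target:
        i += dt * i / difficulty(i)
        t += dt
    return t

t_linear = time_to_reach(lambda i: i)  # linear difficulty: steady, slow growth
t_sqrt = time_to_reach(math.sqrt)      # square-root difficulty: much faster
t_log = time_to_reach(math.log)        # log difficulty: faster still
```

In the linear model the gain from being smarter exactly cancels the rising difficulty, so growth is steady; anything sublinear makes the climb accelerate.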
The problem with this scenario is the part where the AI “solves the protein folding problem”. That the problem is NP-hard means that it will be difficult, no matter how intelligent the AI is.
There's no good reason to assume protein folding to be NP-hard. DeepMind seems to make good progress on it.
I once wrote about levels of AI self-improvement and came to a similar conclusion: any more advanced version of such an AI will require more and more extensive testing to ensure its stability and alignment, and the complexity of the testing task will grow very quickly, thus slowing down any intelligence explosion. This, however, doesn't preclude the creation of a Dangerous AI (capable of solving the task of human extinction while being just slightly superhuman in some domains).
Are language models utility maximizers?
I think there are two different major "phases" of a language model: training and runtime. During training, the model is getting "steered" toward some objective function: first getting the probability of the next token "right", and then getting positive feedback from humans during RLHF (I think? I should read up on exactly how RLHF works). Is this utility maximization? It doesn't feel like it. I think I'll put my thoughts on this in another comment.
During runtime, at first glance, the model is kind of "deterministic" (wrong word), in that it's "just multiplying matrices", but maybe it "learned" some utility maximizers during training and they're embedded within it. I'm not sure if this is actually possible, whether it happens in practice, or whether any such utility maximizers would dominate the agent or could be "overruled" by other parts of it.
Decision theory likes to put its foot down on a particular preference, and then ask what follows. During inference, a pre-trained model seems to be encoding something that can loosely be thought of as (situation, objective) pairs. The embeddings it computes (the residual stream somewhere in the middle) are a good representation of the situation for the purpose of pursuing the objective, and solve part of the problem of general intelligence (being able to pursue ~arbitrary objectives allows pursuing ~arbitrary instrumental objectives). Fine-tuning can then essentially do interpretability on the embeddings to find the next action useful for pursuing the objective in the situation. System prompt fine-tuning makes specification of objectives more explicit.
This plurality of objectives is unlike having a specific preference, but perhaps there is some "universal utility" of being a simulator that seeks to solve arbitrary decision problems given by (situation, objective) pairs, and to take an intentional stance on situations that don't have an objective explicitly pointed out, eliciting an objective that fits the situation and then pursuing it. With an objective found in the environment, this is similar to one of the things corrigibility does: adopting a preference that's not originally part of the agent. And if elicitation of objectives for a situation can be made pseudokind, this line of thought might clarify the intuition that the concept of pseudokindness/respect-for-boundaries has some naturality to it, rather than being purely a psychological artifact of a desperate search for rationalizations that would argue for the possibility of humanity's survival.
Is a language model performing utility maximization during training?
Let's ignore RLHF for now and just focus on next token prediction. There's an argument that, of course the LM is maximizing a utility function - namely its log score on predicting the next token, over the distribution of all text on the internet (or whatever it was trained on). An immediate reaction I have to this is that this isn't really what we want, even ignoring that we want the text to be useful (as most internet text isn't).
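Concretely, the "utility function" in that argument is just average log score on next-token prediction. A toy character-level version (the model and text here are made up for illustration):

```python
import math

def avg_log_loss(model, text):
    """Average negative log-probability the model assigns to each next
    character: the quantity that training pushes down."""
    total = 0.0
    for k in range(1, len(text)):
        probs = model(text[:k])              # dict: char -> probability
        total += -math.log(probs.get(text[k], 1e-12))
    return total / (len(text) - 1)

def uniform_model(context):
    """A deliberately dumb model: uniform over a two-letter alphabet."""
    return {"a": 0.5, "b": 0.5}

loss = avg_log_loss(uniform_model, "abababab")  # log(2) nats per character
```

A model that noticed the strict alternation could drive this loss toward zero; "maximizing utility" here just means shrinking that average term by term.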
This is clearly related to all the problems around overfitting. My understanding is that in practice, this is solved through a combination of regularization, and stopping training once test loss stops decreasing. So even if a language model were a utility maximizer during training, we already have some guardrails on it. Are they enough?
What exactly is being done - what type of thing is being created - when we run a process like "use gradient descent to minimize a loss function on training data, as long as the loss function is also being minimized on test data"?
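A minimal sketch of that process, with a toy one-parameter model and entirely hypothetical details: gradient descent on the training loss, halted the moment the held-out loss stops improving.

```python
import random

random.seed(0)
# Noisy data from y = 3x, split into a training set and a held-out test set.
train = [(x, 3 * x + random.gauss(0, 1)) for x in range(20)]
test = [(x, 3 * x + random.gauss(0, 1)) for x in range(20, 30)]

def loss(w, data):
    """Mean squared error of the model y = w*x on the given data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

w, lr = 0.0, 1e-4
best_test = float("inf")
for step in range(10_000):
    grad = sum(2 * x * (w * x - y) for x, y in train) / len(train)
    w -= lr * grad
    if loss(w, test) > best_test:   # held-out loss stopped decreasing: stop
        break
    best_test = loss(w, test)
```

The object created is whatever point in parameter space the trajectory occupies when the held-out loss turns: not the exact minimizer of any single explicit objective.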
There's sort of a general "digestive issues are the root of all anxiety/evil" thread I've seen pop up in a bunch of rationalist-adjacent spaces:
I'm curious if there's any synthesis / study / general theory of this.
As my own datapoint, I've had pretty bad digestive issues, trouble eating, and anhedonia for a while. I recently got to run a natural experiment when I accidentally left my milk out. Since I cut out milk, I've felt much better (though not perfect) on all counts. So I'm probably lactose intolerant (though I never really noticed a correlation between my symptoms and milk consumption).
Probably worth checking if there are any easy fixes to your digestive issues if you have them.