Continuous learning is the capacity to keep adding to long-term memory as you go, and this would allow a language model to tackle much longer texts.
Cerebras says it can handle 50,000-token context windows. That's about 30K-40K words: roughly what one might type in a day of fast, uninterrupted typing, or half a short novel.
This sort of context window makes improvements to short-term memory largely unnecessary: running within a single context window instantiates day-long spurs (temporary instances of human imitations whose detailed short experiences are to be forgotten), or bureaucracies of such spurs. Also, speaking internal monologues into the context window to reason out complicated arguments lifts any bounds one-step token prediction might place on them. If a bureaucracy were to prepare a report, that report could be added to the next batch of sequence-prediction training, improving whichever capabilities or alignment properties it was intended to improve.
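For concreteness, a toy sketch of that cycle might look like the following Python. Every name in it (the model label, run_spur, finetune_on) is a made-up placeholder rather than a real API; it only illustrates the shape of the spur-report-retrain loop described above.

```python
# Toy sketch only: none of these names are a real API.

def run_spur(model: str, task: str) -> str:
    """A 'spur': one long-context run of the model on a single task.
    Its detailed short-term experience lives only in that context window
    and is discarded afterwards. Here we just fake a report string."""
    return f"[report by {model} on: {task}]"

def finetune_on(model: str, new_documents: list[str]) -> str:
    """Stand-in for adding documents to the next batch of sequence-prediction
    training; returns the (nominally) updated model."""
    return f"{model}+finetuned_on_{len(new_documents)}_reports"

model = "hypothetical-great-palm"   # invented name, not a real checkpoint
tasks = ["summarise paper A", "review design B", "plan experiment C"]

# A 'bureaucracy' of spurs each produces a report...
reports = [run_spur(model, task) for task in tasks]

# ...and the reports are folded into the next round of training,
# which is the only place anything persists: the continuous-learning step.
model = finetune_on(model, reports)
print(model)
```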
So all that remains is some fine tuning, hopefully with conditioning and not RLHF.
GPT-3: a text-generating language model.
PaLM-540B: a stunningly powerful question-answering language model.
Great Palm: a hypothetical language model that combines the powers of GPT-3 and PaLM-540B.
I would've thought that PaLM was better at text generation than GPT-3 by default. They're both pretrained on internet next-word prediction, and PaLM is bigger with more data. What makes you think GPT-3 is better at text generation?
I'm puzzled by this as well. For a moment I thought maybe PaLM used an encoder-decoder architecture, but no, it uses next-word prediction just like GPT-3. Not sure what GPT-3 has that PaLM lacks. A model with the parameter count of PaLM and the training dataset size of Chinchilla would be a better hypothetical for "Great Palm".
I have independently come to much the same conclusions, with some different details about what I think the missing pieces are. I think we are on the brink of a generality threshold sufficient for enabling recursive self improvement that accelerates (fooms) rather than decelerates towards a nearby asymptote (fizzles). I've been trying to convince people of this, but I feel like my voice alone has been insufficient to change minds much. I'm glad others are also noticing and speaking out about this.
Maybe we should think explicitly about what work is done by the concept of AGI, but I do not feel like calling GPT an AGI does anything interesting to my world model. Should I expect ChatGPT to beat me at chess? Its next version? If not, is it due to a shortage of data or compute? Will it take over the world? If not, may I conclude that the next AGI wouldn't?
I understand why the bar-shifting thing looks like motivated reasoning, and probably most of it actually is, but it deserves much more credit than you give it. We have an undefined concept of "something with virtually all the cognitive abilities of a human, that can therefore do whatever a human can", and some dubious assumptions like "if it can sensibly talk about everything, it can probably understand everything". Then we encounter ChatGPT, and it is amazing at speaking, except that it gives a strong impression of talking to an NPC: an NPC who knows lots of stuff and can even sort-of-reason in very constrained ways, do basic programming, and be "creative" as in writing poetry, but is sub-human at things like gathering useful information, inferring people's goals, etc. So we conclude that some cognitive ability is still missing, and try to think about how to correct for that.
Now, I do not particularly care to call GPT an AGI, but then you will have to invent a name for the super-AGI things that we will try to achieve next, and that we know to be possible because humans exist.
Thanks for sharing your thoughts @philosophybear. I found it helpful to interact with your thoughts. Here are a couple of comments.
I think the Great Palm lacks only one thing, the capacity for continuous learning- the capacity to remember the important bits of everything it reads, and not just in its training period. If Great Palm (GPT-3+PaLM540B) had that ability, it would be an AGI.
Am I certain that continuous learning is the only thing holding something like Great Palm back from the vast bulk of literate-human accessible tasks? No, I’m not certain. I’m very open to counterexamples if you have any, put them in the comments. Nonetheless, PaLM can do a lot of things, GPT-3 can do a lot of things, and when you put them together, the only things that stand out to me as obviously and qualitatively missing in the domain of text input, and text output involve continuous learning
But to me, these aren’t really definitions of AGI. They’re definitions of visual, auditory and kinaesthetic sensory modality utilizing AGI. Putting this as the bar for AGI effectively excludes some disabled people from being general intelligences, which is not desirable!
I generally agree with this, primarily because I believe Jacob Cannell's timelines, and I believe that AI is progressing continuously, without major discontinuities in either direction.
I think the biggest piece of an actual GI that is missing from text extenders is agency. Responding to prompts and answering questions is one thing, but deciding what to do/write about next isn't even a theoretical part of their functionality.
I'm puzzled by the apparent tension between upvoting the importance of continuous learning on one hand and downvoting agreement with agency on the other. When transformers produce something that doesn't sound human, it's usually because of consistency mistakes (like explaining at length that it can't speak Danish… in well-formed Danish sentences). Maybe it's true that continuous learning can solve the problem (if that includes learning from its own responses, maybe?). But wouldn't we perceive that as exhibiting agency?
That doesn't seem like it would be a problem if it was connected to something where people constantly interacted with it. Then the model's actions would be outputted constantly, and it seems like there would be no important difference between that and it acting unprompted (heh).
The physical world is also acting continuously based on inputs it receives from people, and we don't say "The Earth" is an intelligence.
That's true. Earth doesn't act like an intelligent agent, but a model could. A current model could simulate the verbal output of a human, and that output could be connected to some actuators (or biological humans) that would allow it to act in the world. Also, Earth can't comprehend new concepts, correctly apply them and solve problems.
I was thinking along similar lines. I note that someone with amnesia probably remains generally intelligent, so I am not sure continuous learning is really required.
I'm putting my existing work on AI on Less Wrong, and editing as I go, in preparation for publishing a collection of my works on AI in a free online volume. If this content interests you, you could always follow my Substack; it's free and is also under the name Philosophy Bear.
Anyway, enjoy. Comments are appreciated, as I will be rewriting parts of the essays before I put them out. A big thank you to user TAG, who identified a major error in my previous post regarding the Chinese Room thought experiment, which prompted its correction [in the edition that will go in the book] and a new corrections section for my Substack page.
Glossary:
GPT-3: a text-generating language model.
PaLM-540B: a stunningly powerful question-answering language model.
Great Palm: a hypothetical language model that combines the powers of GPT-3 and PaLM-540B. Probably buildable with current technology, a lot of money, and a little elbow grease.
Great Palm with continuous learning (GPWCL): a hypothetical language model that combines the capacities of GPT-3 and PaLM-540B, with an important additional capacity. Most language models work over a "window" of text, functioning as short-term memory. Their long-term memory is set by their training. Continuous learning is the capacity to keep adding to long-term memory as you go, and this would allow a language model to tackle much longer texts.
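To make that distinction concrete, here is a toy Python sketch. It is purely illustrative and not how any named model actually works; the class names and the word-overlap "retrieval" are invented for this example. A fixed window simply forgets older text, while a continually learning variant also appends what it reads to a growing long-term store it can consult later.

```python
# Purely illustrative contrast between a fixed context window (short-term
# memory) and a continuously growing store (long-term memory).

class FixedWindowModel:
    def __init__(self, window_size: int = 5):
        self.window: list[str] = []        # short-term memory only
        self.window_size = window_size

    def read(self, sentence: str) -> None:
        self.window.append(sentence)
        self.window = self.window[-self.window_size:]   # older text is simply lost

class ContinuallyLearningModel(FixedWindowModel):
    def __init__(self, window_size: int = 5):
        super().__init__(window_size)
        self.long_term_memory: list[str] = []            # keeps growing after "training"

    def read(self, sentence: str) -> None:
        super().read(sentence)
        self.long_term_memory.append(sentence)           # the continuous-learning step

    def recall(self, query: str) -> list[str]:
        # Toy retrieval: return remembered sentences sharing a word with the query.
        words = set(query.lower().split())
        return [s for s in self.long_term_memory if words & set(s.lower().split())]

m = ContinuallyLearningModel()
for s in ["The maze has three corridors.", "The second corridor is blocked.",
          "Lunch is at noon.", "Take the third corridor to exit."]:
    m.read(s)
print(m.recall("Which corridor should I take?"))
```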
The argument
What I’ll be doing in this short essay is a bit cheeky, but I think we’ll make a few important points, viz:
If I’m being a bit of a gadfly here, it’s not without a purpose.
Everything I say in this article arguably applies to GPT-3 alone, but for the avoidance of doubt, let me specify that I'm talking about a hypothetical language model that has the fluency of GPT-3 and the question-answering capabilities of PaLM-540B, which we will call The Great Palm to make it clear that we're not taking ourselves too seriously. In my view, The Great Palm is very close to being an AGI.
I think the Great Palm lacks only one thing: the capacity for continuous learning, that is, the capacity to remember the important bits of everything it reads, and not just in its training period. If Great Palm (GPT-3 + PaLM-540B) had that ability, it would be an AGI.
“But hang on”, you say “Great Palm can’t draw, it can’t play computer games, it can’t listen to music, it can’t so much as discriminate an apple from a banana, and adding on a capacity for continuous learning doesn’t change that”.
I have two responses.
Response 1: Sure, but neither could the noted author, activist, and communist intellectual Helen Keller, nor other completely deaf and blind people, all of whom are general intelligences.
Response 2: Actually, it may be able to do some of these things so long as you can convert them into the modality of text. It’s quite conceivable that Great Palm could analyze music, for example, if the notation were converted into text. We should focus more on content than modality.
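To illustrate (the model here is hypothetical and has no real API), a tune written in ABC notation, a genuine text-based music format, could simply be embedded in an ordinary text prompt:

```python
# Illustrative only: music rendered as text so a text-only model could,
# in principle, analyse it. There is no real "Great Palm" to send this to.

tune_in_abc = """X:1
T:Toy scale
M:4/4
K:C
C D E F | G A B c |"""

prompt = (
    "The following tune is written in ABC notation.\n"
    f"{tune_in_abc}\n"
    "What key is it in, and does the melody ascend or descend?"
)
print(prompt)   # this string is what one would feed to the hypothetical model
```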
Why do I say that Great Palm with a capacity for continuous learning would be an artificial general intelligence? Because it can attempt basically all the tasks that a human with access to a text input, text output console and nothing more could attempt, and make a reasonable go at them. In the case of Great Palm with continuous learning, looking at what PaLM-540B and GPT-3 can do, it's actually hard to find tasks on which the average human can beat it. Look at the MMLU dataset if you don't believe me (they're tough questions). That kind of broad scope is comparable to the scope of many humans.
To be clear, I am absolutely not saying that, for example, Helen Keller could only answer text input, text output problems. There are numerous other sensory modalities: touch, taste, etc. Helen Keller could navigate a maze, whereas Great-Palm-With-Continuous-Learning could only do that if the maze were described to it. I suppose this gives a possible line of counterargument: we could disqualify Great-Palm-With-Continuous-Learning by adding a disjunction like "AGIs must be proficient in at least one of touch, taste, smell, sight or hearing", but that seems arbitrary to me.
I’m not exactly going to proffer a definition of AGI here, but it seems to me that entities that can make a reasonable go at almost all text input text output tasks count as AGIs. At the very least, imposing the need to be able to use particular sensory modalities is not only wrongly human-centric, but it also doesn’t even account for all human experience (e.g. the deaf and blind).
Objections:
What about commonsense reasoning: Maybe you're worried about commonsense reasoning. Looking at PaLM's capabilities, its performance on commonsense reasoning tasks is human-level, or very close to it. For example, PaLM-540B scored ~96% on the Winograd Schema test (a typical item asks which thing "it" refers to in a sentence like "The trophy wouldn't fit in the suitcase because it was too big"). My recollection is that most humans don't score this highly, but the authors set the bar at 100 because they reasoned that a human properly paying attention would get full marks [at least I seem to recall that's why they changed it to 100 between GLUE and SuperGLUE]. Requiring 100% of human performance on commonsense reasoning tasks to be an AGI seems to me like special pleading. Near enough is good enough to count.
What about the Turing test: Would the Great Palm continuous learning edition be able to pass the Turing test reliably? I don’t know. I’m confident it could pass it sometimes and I’m confident it could pass it more reliably than some humans- humans who are undoubtedly general intelligences. Language models have gotten very good at Turing tests after all.
Surely there are some tasks it cannot do: Is it not possible that there might be some tasks that humans can do that Great Palm with continuous learning (GPWCL) can't do? I'd say it's probable! Nonetheless, the great bulk of tasks an average literate human could do, GPWCL can do, and it's quite difficult to find counterexamples. I think that insisting that AGI requires a computer to be able to perform literally every task a literate human can do is special pleading. If we encountered aliens, for example, it's quite likely that there would be some tasks the average human can do that the average alien couldn't do (and vice versa); this wouldn't exclude either of us from counting as general intelligences.
Haven't you just arbitrarily drawn a line around text input, text output problems and said "being able to do the majority of these is enough for AGI"? Sure, definitions of AGI that exclude the deaf and the blind may be wrong, but that doesn't prove text alone is sufficient. Maybe some third definition, one that includes Helen Keller but excludes Great-Palm-With-Continuous-Learning, is right: Ultimately, this will come down to a definitional debate. However, when we focus on the content of problems rather than the modality, it becomes clear that the range of text input, text output tasks is vast, one might even say general.
What if there are other huge categories of text input, text output tasks that Great Palm with continuous learning could not attempt, and that you are unaware of: Am I certain that continuous learning is the only thing holding something like Great Palm back from the vast bulk of literate-human-accessible tasks? No, I'm not certain. I'm very open to counterexamples; if you have any, put them in the comments. Nonetheless, PaLM can do a lot of things, GPT-3 can do a lot of things, and when you put them together, the only things that stand out to me as obviously and qualitatively missing in the domain of text input and text output involve continuous learning.
Am I saying that text input text output is the only way to prove intelligence?: Absolutely not! The vast majority of humans who ever lived were illiterate. However, it seems general enough to me to qualify. It is sufficient, not necessary.
Aren't you treating continuous learning as if it were a very easy problem, a negligible barrier, when in fact it's very hard?: That's not my intention. I recognize that it is very hard. That said, at a guess, it is probably possible to make Great Palm sans continuous learning now. Adding on the continuous-learning component will take time, but I would be very surprised if it took anywhere near as much time as it took us to reach GPT-3 and PaLM-540B.
Implications
Turing proposed the Turing test as a test for something like AGI, but since then it seems the concept of AGI has somewhat metastasized. For example, Metaculus gives this as the requirements to qualify as a “weakly general” AGI:
And this as the definition of a strong AGI on Metaculus:
But to me, these aren't really definitions of AGI. They're definitions of an AGI that uses the visual, auditory and kinaesthetic sensory modalities. Putting this as the bar for AGI effectively excludes some disabled people from being general intelligences, which is not desirable! That alone makes it worth correcting. But it also has another undesirable effect: adding this onto the concept of intelligence is a form of bar-shifting that prevents us from recognizing our progress. This sort of bar-shifting is part of a general pattern of thought that means we keep being taken by surprise by our own achievements in machine learning.
Also, the second set of problems particularly, but to a certain degree the first as well, are much too hard. Almost no human being would pass all of the second set of problems. A solid majority would not pass the first set. This also contributes to the bar-shifting problem. But that's a matter for a different essay.
There’s an old joke in the field that intelligence is whatever it is that we can’t get computers to do at the moment. Let’s try to avoid that!