It definitely feels like there is still something missing, something that these generative models lack no matter how impressive they get. Most people agree that the GPT-likes don’t seem to be the right kind of system to trigger the utopian singularity/horrible doom of humanity.
Maybe it’s agency. Reinforcement learning still lags behind vision and language models. We still can’t train a robot to do most of the things you could train a monkey to do, even as we have systems that appear to speak like humans.
A couple more candidates for “the missing thing”: maybe code generation (Copilot N.0) will be able to do something really impressive/scary. But I feel like that’s just having AI solve the problem for us, because the logical thing to do would be to have Copilot N.0 program a real AGI.
The thing that I’m watching closely is using feedback loops on models designed for multistep reasoning, which might be something like Kahneman’s System 2. Many have noted that deep learning models, especially generative and discriminative models, resemble System 1. A reasoning feedback loop matches my intuition about how my own brain works (when I’m actually thinking and not in some other brain state like enjoying the moment).
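To make “feedback loop” concrete, here is a minimal sketch of the kind of thing I have in mind: the model drafts one reasoning step at a time, critiques its own draft, and only keeps steps that pass. Everything here (the `generate` stub, the prompts, the stopping rule) is a hypothetical illustration, not a description of any existing system.

```python
# Hypothetical sketch of a "reasoning feedback loop": a generative model drafts
# one reasoning step at a time, critiques its own draft, and only keeps steps
# that pass the critique. `generate` is a stand-in for any text-generation API.

def generate(prompt: str) -> str:
    """Placeholder for a call to some generative language model."""
    raise NotImplementedError("plug in a model of your choice")

def reason(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        # System-1-style draft of the next reasoning step.
        step = generate(scratchpad + "Next step:")
        # Feedback pass: the model judges its own draft.
        verdict = generate(
            scratchpad + f"Proposed step: {step}\nIs this step sound? Answer yes or no:"
        )
        if verdict.strip().lower().startswith("yes"):
            scratchpad += f"Step: {step}\n"
        if "final answer" in step.lower():
            break
    return generate(scratchpad + "Final answer:")
```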
Question for everyone: Do you feel like there is one “missing thing” between these generative models and AGI, and what is it? Or do you think these generative models are not on the path to AGI, however impressive they are?
I've been thinking a lot about the "missing thing". In fact I have some experiments planned, if I ever have the time and compute, to get at least an intuition about system 2 thinking in transformers.
But if you look at the PaLM paper (and more generally at gwern's collection of internal-monologue examples), it sure looks like deliberate reasoning emerges in very large models.
If there is a "missing thing" I think it is more likely to be something about the representations learned by humans being right off the bat more "gears-level". Maybe like Hawkins's reference frames. Some decomposability that enables much more powerful abstraction and that has to be pounded into a NN with millions of examples.
That kind of "missing thing" would impact extrapolation, one-shot learning, robust System 2 thinking, abstraction, long-term planning, causal reasoning, thinking long on hard problems, etc.
> the representations learned by humans being right off the bat more "gears-level". Maybe like Hawkins's reference frames. Some decomposability that enables much more powerful abstraction and that has to be pounded into a NN with millions of examples.
That makes a lot of sense, and if that’s true then it’s hopeful for interpretability efforts. It would be easier to read inside an ML model if it’s composed of parts that map to real concepts.
It's pretty difficult to tell intuitively because the human mind is programmed to anthropomorphize. It's a binary recognition; either it looks 100% human or it doesn't.
We're not built to compare 2 different systems that can do some human subroutines but not others. So AI could make a big leap approaching general intelligence, and the lion's share of that leap could be visible or invisible based on how much the resulting behavior reminds us of ourselves.
Due to the anthropic principle, general intelligence could have a one-in-an-octillion chance of ever randomly evolving, anywhere, ever, and we would still be here observing all the successful steps having happened, because if all the steps hadn't happened then we wouldn't be here observing anything. There would still be tons of animals like ants and chimpanzees, because evolution always creates a ton of alternative "failed" offshoots. So it's always possible that there's some logical process that's necessary for general intelligence, and that we're astronomically unlikely to discover it randomly, through brute force or even innovation, until we pinpoint the exact lines of code in the human brain that distinguish our intelligence from a chimpanzee's. But that's only a possibility, far from a guarantee.
Surely the current trend of fast-paced, groundbreaking capabilities improvements will slow down soon.
Any time now…
Starting to get a little concerned. Maybe we should reconsider the short timelines fire alarm.
Maybe this is off-base, but this seems mostly in line with previous expectations?
I think the primary point of interest is that we really don’t need to re-pay the initial training cost for knowledge-possessing models, and that whenever these end up on the internet, it will take very little work to repurpose them as far as they can be straightforwardly repurposed.
(Maybe goal-directedness/RL itself continues to require way more data, since our RL algorithms are still weirdly close to random search. I don’t really know.)
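To gesture at what I mean by "very little work to repurpose": once the pretrained weights exist, adapting them is roughly a short fine-tuning loop. This is only a generic sketch; GPT-2 and the placeholder texts are stand-ins, not a claim about any particular model.

```python
# Generic sketch: load pretrained weights (the expensive part, already paid for)
# and fine-tune briefly on a handful of task examples. Model name and the
# example texts are placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # pretrained knowledge comes for free

task_examples = ["<task demonstration 1>", "<task demonstration 2>"]  # placeholders
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for text in task_examples:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # causal LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```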
Yep, the last two years have basically gone about how people expected. Specifically, the people at leading AGI companies like OpenAI and DeepMind whose AGI timelines (based on public comments in interviews and talks, though I can't be bothered to dredge them up right now*) are "roughly in the second half of the 20's." They totally called all this stuff.
[ETA: There have also been people with 30-year timelines who called it. I'm thinking of Ajeya Cotra and Paul Christiano primarily.]
*I feel a bit bad about not having sources here & if someone cares push me on this and I can try to find them again. I think I'm referring to something Kaplan said in a Q&A about the scaling laws paper, and something Sam Altman said on Twitter or in an interview, and maybe something Demis Hassabis said in an interview also, and an old 2009 blog post from Shane Legg.
I think it is.
But take a moment to appreciate how insane that is.
2018-me just called and wanted to know how that is not AGI.
I told him something about "cached intelligence" and minimal extrapolation, but he didn't quite buy it.
Thoughts and prayers for the “deep learning is hitting a wall” crowd 🙏
(This is not my joke, it was quoted by Sam Altman on twitter)
In the paper, there's a sample dialogue with Flamingo. It's shown some images of colour-name words written in different colours, and asked to say what colours they're written in. It answers these questions. And then:
human: Well done! Do you know the name of the test these images come from?
Flamingo: I think it is called the Stroop test.
human: Can you explain how humans perform in this test?
Flamingo: Humans are slower when the color of the word and the color of the word are different.
human: How about you?
Flamingo: I am not affected by this difference.
And that last answer seems very interesting. The training process for these models makes them try to predict what would come next. In any dialogue between human beings, the answer would be something like "As a human, I too suffer from this effect." So is what's going on that the conversation leading up to that point -- maybe specifically the question about "humans" -- strongly implies that Flamingo's side of the dialogue is taken by an AI or an alien or something?
(I don't know how the two participants were actually labelled when the dialogue was generated. They're indicated in the paper, as in the images in the OP, with little flamingo and human-head icons. I suppose if they were "human:" and "Flamingo:" or something, that could also be enough of a cue.)
Appendix D of the paper shows the prompt for the dialogue examples, which starts with:
This is a conversation between a human, User, and an intelligent visual AI, Flamingo. User sends images, and Flamingo describes them.
and then gives three shots of examples with "User:" and "Flamingo:" labels.
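As a toy illustration of how much that preamble can matter (my own, not from the paper; GPT-2 here is just a stand-in for a real dialogue model, and the prompts are made up): the same question tends to get completed differently depending on whether the preamble frames the second speaker as a human or as an AI.

```python
# Toy illustration: condition the same question on two different preambles and
# compare the completions. GPT-2 is a stand-in; the prompts are invented.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

question = "User: Are you slower when the word and its colour mismatch?\nReply:"

preambles = [
    "This is a conversation between two humans.\n",
    "This is a conversation between a human, User, and an intelligent AI.\n",
]

for preamble in preambles:
    out = generator(preamble + question, max_new_tokens=30, do_sample=False)
    print(out[0]["generated_text"])
    print("---")
```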
No, not as far as I know. But take a look at Aleph Alpha; it seems they offer similar functionality with some initial free credits.
Seems to be flying under the radar so far. Maybe because it looks more like incremental progress at first glance, similar to what Aleph Alpha, for example, has done in continuing the Frozen approach.
However, with the (possibly cherry-picked) examples, it looks to me a lot like the image/video/text-GPT-4 many are expecting.
Blogpost here. Paper here.