At one point he echoes concerns about future systems based on deep learning that sound faintly similar to those expressed in the Rocket Alignment Problem.
The quoted paragraph does not sound like the Rocket Alignment Problem to me. It seems to be arguing that you need systems that are robust, whereas the Rocket Alignment Problem argues that you need a deep understanding of the systems you build. These are very different: I suspect the vast majority of AI safety researchers would agree that you need robustness, but you can get robustness without understanding. For example, I feel pretty confident that AlphaZero robustly beats humans at Go, even though I don't understand what sort of reasoning AlphaZero is doing.
(A counterargument is that we understand how the AlphaZero training algorithm incentivizes robust gameplay, which is what rocket alignment is talking about, but then it's not clear to me why the rocket alignment analogy implies that we couldn't ever build aligned AI systems out of deep learning.)
To clarify, I had first read the "the whole point of having knowledge" sentence in light of the fact that he wants to hardcode knowledge into our systems, and from that point of view it made more sense. Re-reading it now, I admit it's not the best comparison. The rest of the paper still echoes the general vibe of not doing random searches for answers, and of leveraging our human understanding to yield some sort of robustness.
A team of people including Smolensky and Schmidhuber has produced better results on a mathematics problem set by combining BERT with tensor products (Smolensky et al., 2016), a formal system for representing symbolic variables and their bindings (Schlag et al., 2019), creating a new system called the TP-Transformer.
Notably, the latter paper was rejected from ICLR 2020, partly on the grounds of an unfair comparison. It seems unclear at present whether the TP-Transformer is actually better than the baseline Transformer.
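To make the tensor product idea more concrete, here is a minimal numpy sketch of tensor product representations in the spirit of Smolensky's proposal: symbolic structures are encoded by binding "filler" vectors (the symbols) to "role" vectors (the positions) with outer products, and a structure is the sum of its bindings. This is only an illustration of the binding/unbinding mechanism, not the actual TP-Transformer architecture; the vectors, names, and dimensions are made up for the example.

```python
# Toy sketch of tensor product representations (TPRs): bind fillers (values)
# to roles (positions) via outer products, superimpose the bindings into one
# tensor, and recover a filler by contracting with its role vector.
import numpy as np

dim = 4
roles = np.eye(dim)                         # orthonormal role vectors, one per position
fillers = {                                 # arbitrary dense vectors standing in for symbols
    "x": np.array([1.0, 2.0, 0.0, -1.0]),
    "y": np.array([0.5, -0.5, 1.0, 0.0]),
}

# Bind "x" to role 0 and "y" to role 1, then sum into a single representation.
tpr = np.outer(fillers["x"], roles[0]) + np.outer(fillers["y"], roles[1])

# Unbinding: contracting the TPR with a role vector recovers that role's filler.
recovered_x = tpr @ roles[0]
recovered_y = tpr @ roles[1]

assert np.allclose(recovered_x, fillers["x"])
assert np.allclose(recovered_y, fillers["y"])
```

With orthonormal role vectors the unbinding is exact; in a learned system the role vectors are trained rather than hand-fixed, so this is only the idealized version of the mechanism.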
Gary Marcus is a pretty controversial figure within machine learning. He has become widely known as the most outspoken critic of the current deep learning paradigm, arguing that it will be insufficient to yield AGI. Two years ago he came out with Deep Learning: A Critical Appraisal, which to my knowledge remains the most in-depth critique of deep learning, and which is cited heavily in AI Impacts' post about whether current techniques will lead to human-level AI.
Now, by releasing this new paper, he answers a frequent objection from deep learning specialists that all he does is critique, without adding anything new.
Gary takes a stand against what he sees as an undue focus on leveraging ever larger amounts of computation to aid in learning, most notably argued for by Richard Sutton. He sees much more promise in instilling human knowledge and using built-in causal and symbolic reasoning, defying what is likely a consensus in the field.
I personally find his approach interesting because it might lead to AI systems that are more transparent and alignable, though I'm fairly confident that his analysis will have no significant impact on the field.
I'll provide a few quotes from the article and explain the context. Here, he outlines his agenda.
Marcus is aware of the popular critique that symbolic manipulation was tried and didn't yield AGI, but he thinks this critique misses the mark. He thinks it's likely that the brain uses principles from both deep learning and symbolic manipulation.
Another reason people think symbolic manipulation won't work is that it's not scalable: it takes hundreds of hours to encode knowledge into our systems, and within years a system based on learning will outperform it anyway. So why bother? Marcus responds:
This response puzzles me since I imagine most would consider neurosymbolic systems like the ones he cites (and SATNet) to fit quite nicely into the current deep learning paradigm. Nevertheless, he thinks the field is not pushing sufficiently far in that direction.
One reason many have given for believing that deep learning will yield AGI is that the brain doesn't have many built-in priors, or much innate knowledge, and thus building innate knowledge into our systems is unnecessary. Marcus, by contrast, thinks that the human brain does rely on built-in knowledge.
Many have said that the critique Marcus gave in 2018 is outdated since recent natural language models are much more impressive than he predicted. He provides many paragraphs in the paper explaining why he thinks the current Transformer model is not as good as people say it is.
At one point he echoes concerns about future systems based on deep learning that sound faintly similar to those expressed in the Rocket Alignment Problem. [EDIT: eh, probably not that similar]