Parallel distributed processing (as well as "connectionism") is just an early name for the line of work that was eventually rebranded as "deep learning". They're the same research program.
But it's also an entire school of thought in cognitive science. I feel like DL is the method, but stripped of the understanding that these methods are based on well-thought-out, mechanistic rules for how cognition fundamentally works, potentially building toward a unified theory of cognition and behaviour.
My uninformed impression was that it had essentially been superseded by deep learning, SGD, etc., so most of the parts relevant to modern AI are already contained in the standard deep learning curriculum. What’s the most important PDP idea that you think is neglected by this curriculum?
My answer is a bit vague, but I would say that the current DL curriculum tells you how these things work, but it doesn't go into the reasoning about cognition that allowed these ideas to exist in the first place.
Haven't read it, but my default guess based on approximately-zero exposure would be that it's one of those "theories" which basically says "hey let's stick lots of simple units together and run simulations on them", while answering basically-zero questions which I actually care about. (For instance: does it tell me how internal structures or activation-patterns in a mental system will map to specific structures in the external environment? Or how to detect the formation of general-purpose search?) Or, a lower bar: does it make any universal quantitative predictions about neural systems at all? (No, "bigger systems do better" does not count as quantitative. If it successfully predicted e.g. the parameters of the scaling curves of model nets 20+ years before those nets were built, then I'd definitely pay attention.) I'd be surprised if there's any real model of cognition there at all, as opposed to somebody just vibing that neural-network-style stuff is vaguely good somehow.
So to answer your question: because I have those expectations, I haven't looked into the topic. If my expectations are wrong, then maybe that's a mistake on my part.
I don't have an adequate answer for this, since these models are incomplete. But the way I see it, these people (Hinton, Rumelhart, McClelland, Smolensky) had a certain way of mathematically reasoning about cognition, and that reasoning created most of the breakthroughs we see today in AI (backprop, multi-layered models, etc.). It seems that trying to use that model of cognition could give rise to new insights about the questions you're asking, attack the problem from a different angle, or help create a grounded paradigm for alignment research to build on.
For those who haven’t heard of PDP, what in your opinion was its most impressive advance prediction that was not predicted by other theories?
You could say it "predicted" everything post-AlexNet, but it's more that it created the fundamental understanding for everything post-AlexNet to exist in the first place. It's the mathematical models of cognition that all of modern AI is built on. This is how we got backpropagation, "hidden" layers, etc.
Why haven't more people who work on alignment read Parallel Distributed Processing, and why do so few seem at all familiar with Rumelhart's work? This is the fundamental model of cognition and behaviour that all of modern AI is built on, the work that Hinton used for most of his insights. It is the model that has constantly been validated over the years by neural network performance and capabilities, yet most people seem incredibly unfamiliar with it or uninterested in it. Am I missing something? Is there some grounded reason why they think PDP and connectionism will fail at a certain point? It seems this should be required reading for anyone wanting to get into alignment research.