The original sparse coding paper[1] from 1997 was a major early advance in learned features for vision and also in neuroscience, with significant downstream influence on later DL.
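For the curious, here is a minimal sketch of the sparse coding idea in its modern L1 form: reconstruct each input from an overcomplete dictionary while penalizing active coefficients. The ISTA inference step and all names here are my own; Olshausen & Field used a different sparsity prior and optimizer.

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_steps=100):
    """Infer sparse codes A for data X under dictionary D via ISTA.

    Objective per column x:  min_a  0.5 * ||x - D a||^2 + lam * ||a||_1
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_steps):
        G = D.T @ (D @ A - X)              # gradient of the reconstruction term
        A = A - G / L                      # gradient step
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft threshold
    return A

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))          # overcomplete: 64 atoms for 16-dim data
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
X = rng.standard_normal((16, 5))
A = sparse_codes(X, D)
print("nonzero coefficients per sample:", (np.abs(A) > 1e-8).sum(axis=0))
```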
Also, I can see from the Google Scholar page for Juergen Schmidhuber that you are missing some of his lab's papers that fit your criteria, such as "Gradient flow in recurrent nets". If he were here, he would hate that. Schmidhuber claims that many of the key ideas in DL were discovered at his lab in 1990-1991. Even if that seems like a stretch, I do think his lab explored a wide range of foundational ideas early on that only became more important over time: vanishing gradients, distillation and compression, memory/attention, metalearning, artificial curiosity, and more.
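As a toy illustration of the vanishing gradients problem that paper analyzes (my own minimal example, not the paper's setup): backpropagating through a recurrent net multiplies the gradient by the recurrent Jacobian once per timestep, so when its norm is below 1 the gradient shrinks geometrically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 20, 100
W = rng.standard_normal((n, n))
W *= 0.9 / np.linalg.norm(W, 2)   # rescale so the largest singular value is 0.9

# With tanh units the backpropagated gradient is roughly
# grad_t = W.T @ diag(1 - h_t^2) @ grad_{t+1}; we track only the linear
# part W.T here, which already shows the geometric decay.
g = np.ones(n)
for t in range(T):
    g = W.T @ g
    if t % 20 == 0:
        print(f"step {t:3d}: ||grad|| = {np.linalg.norm(g):.3e}")
```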
Olshausen, Bruno A., and David J. Field. "Sparse coding with an overcomplete basis set: A strategy employed by V1?" Vision Research 37.23 (1997): 3311-3325. ↩︎
One of the most important deployed applications of machine learning in that period was web search, so papers related to it (PageRank, etc.) would probably score highly.
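For reference, the core of PageRank is just a power iteration on a damped, column-stochastic link matrix; here's a toy sketch (the graph and names are made up; the damping factor 0.85 follows the original paper).

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    """Power iteration for PageRank on a column-stochastic link matrix."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=0)
    out_deg[out_deg == 0] = 1                  # dangling nodes: avoid divide-by-zero
    M = adj / out_deg                          # column j spreads rank over its out-links
    r = np.full(n, 1.0 / n)
    while True:
        r_next = (1 - d) / n + d * M @ r       # teleport term plus damped link-following
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next

# tiny 4-page web: adj[i, j] = 1 means page j links to page i
adj = np.array([[0, 0, 1, 0],
                [1, 0, 0, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], float)
print(pagerank(adj))
```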
I'd expect some papers in spam filtering (which was a pretty important and interesting machine learning topic at the time) to meet the threshold.
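The workhorse in that literature was typically a naive Bayes classifier over word counts (e.g. Sahami et al. 1998); a minimal sketch on made-up data:

```python
import math
from collections import Counter

spam = ["buy cheap meds now", "cheap meds cheap"]
ham = ["meeting notes attached", "lunch tomorrow"]

def train(docs):
    counts = Counter(w for d in docs for w in d.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam)
ham_counts, ham_total = train(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_prob(msg, counts, total, prior):
    # Laplace-smoothed unigram likelihood times the class prior
    lp = math.log(prior)
    for w in msg.split():
        lp += math.log((counts[w] + 1) / (total + len(vocab)))
    return lp

msg = "cheap meds tomorrow"
s = log_prob(msg, spam_counts, spam_total, 0.5)
h = log_prob(msg, ham_counts, ham_total, 0.5)
print("spam" if s > h else "ham")
```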
TD-Gammon would probably qualify in the world of RL: https://en.wikipedia.org/wiki/TD-Gammon
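For context, TD-Gammon's core learning rule is Sutton's TD(λ) applied to a neural-network value function; below is a minimal sketch of that update with a linear value function instead (all the backgammon specifics omitted, names mine):

```python
import numpy as np

def td_lambda_episode(features, rewards, w, alpha=0.01, gamma=1.0, lam=0.7):
    """One episode of TD(lambda) with a linear value function V(s) = w @ phi(s)."""
    e = np.zeros_like(w)                          # eligibility trace
    for t in range(len(features) - 1):
        v, v_next = w @ features[t], w @ features[t + 1]
        delta = rewards[t] + gamma * v_next - v   # TD error
        e = gamma * lam * e + features[t]         # decay trace, add current gradient
        w = w + alpha * delta * e                 # credit earlier states via the trace
    return w

rng = np.random.default_rng(0)
phis = [rng.standard_normal(8) for _ in range(9)] + [np.zeros(8)]  # terminal V = 0
rews = [0.0] * 8 + [1.0]                          # reward only at episode end
w = np.zeros(8)
for _ in range(200):
    w = td_lambda_episode(phis, rews, w)
print("learned value of start state:", w @ phis[0])
```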
DistBelief just barely predates that, and since it's basically directly in the lineage of modern deep learning, I think it might qualify: https://en.wikipedia.org/wiki/TensorFlow#DistBelief
Leo Gao's summary post on the 2010s has some citations that directly qualify: https://bmk.sh/2019/12/31/The-Decade-of-Deep-Learning/
Lastly for now, I think Kevin Murphy's excellent 2012 machine learning textbook, "Machine Learning: A Probabilistic Perspective", has a ton of sidebars and sections on applied machine learning systems and would probably be worth going through.
Edit to add: this source might also be useful: https://mlstory.org/
Eurisko from Douglas Lenat in 1976 (before he started Cyc).
Are you asking exclusively about "Machine Learning" systems, or also GOFAI? E.g., I notice that you didn't include ELIZA in your database, but that was a hard-coded program, so maybe it doesn't match your criteria.
There has to be a learning component to it.
Bayesian networks learned from data, SVMs, and n-grams all count, but hard-coded programs like ELIZA do not.
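To make the distinction concrete, here is what "learned from data" means in the n-gram case: the model's parameters are just counts estimated from a corpus, in contrast to ELIZA's hand-written pattern rules. (A minimal, made-up example.)

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# The "learning" is just counting: the parameters come from data,
# not from hand-written rules.
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def p_next(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1)."""
    total = sum(bigrams[w1].values())
    return bigrams[w1][w2] / total if total else 0.0

print(p_next("the", "cat"))   # 2/3, estimated from the corpus
```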
I am the main coordinator of a group studying trends in Machine Learning.
Together, we are building a public dataset annotating milestone ML systems between 1950 and today. You can read more about our work here.
Our curation of papers is quite sparse before 2012, so we are keen on more suggestions of papers from before then!
This will be very useful, especially for understanding how big a deal the advent of Deep Learning was.
The papers we are interested in are those that fall into at least one of these categories:
Any suggestions?