PhilGoetz comments on Connectionism: Modeling the mind with neural networks - Less Wrong
Comments (20)
In humans, reward arrives far too late for this to work. Instead, the brain uses temporal difference learning. I no longer remember which classic paper first demonstrated temporal difference error signals in the brain; it may have been "A Neural Substrate of Prediction and Reward" (1997). Google ("temporal difference learning", brain). "Temporal Difference Models and Reward-Related Learning in the Human Brain", Neuron, 2003, will be one of the hits.
I agree that the brain uses temporal difference learning. I thought temporal difference learning meant that reward propagates back to the earliest reliable stimulus, based on the difference between expected and observed reward, and then reinforces it. How is that different from the quoted text, except that the quoted text is simpler and doesn't use that language?
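To make the "reward propagates back" description concrete, here is a minimal sketch of tabular TD(0) on a single trial structure: a cue at the first time step and a reward at the last. Everything in it is an illustrative assumption rather than anything from the papers mentioned above: the function name `run_td`, the five-step trial, the learning rate `alpha`, the discount `gamma`, and the choice of one value per time step.

```python
def run_td(n_steps=5, n_trials=200, alpha=0.1, gamma=1.0, reward=1.0):
    """Tabular TD(0) over repeated identical trials (hypothetical setup).

    Time step 0 is the cue; the reward arrives on the last step.
    Returns the learned values and the per-trial list of TD errors.
    """
    # One value per time step, plus a terminal state whose value stays 0.
    values = [0.0] * (n_steps + 1)
    history = []
    for _ in range(n_trials):
        errors = []
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0
            # TD error: observed (reward + next prediction) minus expected.
            delta = r + gamma * values[t + 1] - values[t]
            values[t] += alpha * delta
            errors.append(delta)
        history.append(errors)
    return values, history


if __name__ == "__main__":
    values, history = run_td()
    print("TD errors, first trial:", history[0])
    print("TD errors, last trial: ", [round(d, 4) for d in history[-1]])
    print("Learned values:        ", [round(v, 4) for v in values[:-1]])
```

On the first trial the only nonzero error is at the time of reward. Over trials the prediction is handed back one step at a time, so the value at the cue step rises toward the reward and the error at reward time shrinks toward zero. This sketch does not model the period before the cue; if one assumes the cue arrives unpredictably from a baseline whose value is 0, the remaining error would sit at cue onset.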