For simplicity I'm assuming the activation functions are the Heaviside step function h(x) = [x > 0].
For 'backpropagation', pretend the derivative of this step function is some positive constant A (a surrogate gradient); A = 1 is the most obvious choice.
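A minimal sketch of what I mean, assuming a one-hidden-layer network with squared-error loss; the layer sizes and learning rate are arbitrary, and the only non-standard part is substituting the constant A wherever h'(z) would appear:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x):
    # Heaviside step activation h(x) = [x > 0]
    return (x > 0).astype(float)

A = 1.0  # assumed surrogate derivative of the step function

# illustrative sizes: 3 inputs, 4 hidden units, 2 outputs
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(2, 4))

x = rng.normal(size=3)
target = np.array([1.0, 0.0])

# forward pass
z1 = W1 @ x
h1 = step(z1)
z2 = W2 @ h1
y = step(z2)

# backward pass, with A standing in for h'(z) at every layer
err = y - target                # dL/dy for squared error
delta2 = err * A                # pretend h'(z2) = A
delta1 = (W2.T @ delta2) * A    # pretend h'(z1) = A

lr = 0.1
W2 -= lr * np.outer(delta2, h1)
W1 -= lr * np.outer(delta1, x)
```

With A = 1 this reduces to the straight-through trick: the step is treated as the identity for gradient purposes only.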
I would also try reverse Hebbian learning, i.e. give the model random input and apply the rule in reverse (with the sign of the update flipped).
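To make that concrete, here is a sketch under my assumptions: the base rule is the plain Hebbian outer-product update ΔW = η · post ⊗ pre, and "in reverse" means subtracting that update while driving the network with random (non-data) input:

```python
import numpy as np

rng = np.random.default_rng(1)

def step(x):
    return (x > 0).astype(float)

W = rng.normal(scale=0.5, size=(4, 3))
eta = 0.05

for _ in range(100):
    noise = rng.normal(size=3)        # random input, not a data sample
    post = step(W @ noise)
    # plain Hebbian would be W += eta * outer(post, noise);
    # the reversed rule flips the sign
    W -= eta * np.outer(post, noise)
```

The intuition would be something like contrastive unlearning: weaken whatever correlations the network expresses on noise, so that data-driven Hebbian updates stand out.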
“expanding an architecture that works well with one hidden layer and a given learning rule to an architecture with many hidden layers but the same rule universally decreased performance”: personally, I don't find this surprising.
N.B. for h only the relative scale of the weights matters, e.g. h(5 - x + y) = h(0.5 - (x - y)/10), since dividing the argument by 10 doesn't change its sign. So weights drifting to extreme values effectively decrease the temperature, and L1 and L2 penalties may have odd effects.
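A quick check of both halves of that claim: the step output is invariant under positive rescaling of its argument, whereas for a sigmoid the same rescaling acts like lowering the temperature (scaling all weights by c behaves like temperature 1/c), sharpening the unit toward a step:

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# only the sign of the argument matters to h:
# h(5 - x + y) == h(0.5 - (x - y)/10), since the second
# argument is the first divided by 10
x, y = 3.0, -1.0
same = step(np.array([5 - x + y])) == step(np.array([0.5 - (x - y) / 10]))

# for a sigmoid, scaling the argument by 10 sharpens it toward a step
z = np.array([-1.0, -0.1, 0.1, 1.0])
soft = sigmoid(z)
sharp = sigmoid(10 * z)
```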
I have read several papers on this and covered the standard material on STDP, BCM, and other forms of Hebbian learning, and recently a paper about how local rules are equivalent to minimizing the error between vectors encoded in neural populations. I have tried to implement these in my own code, with some success but not as much as I would like. My specific questions are:
Thank you for any information on these. Information on spiking backpropagation is interesting and I would like to hear it, but learning about that is not my primary goal. Any advice on preparing for applications to computational neuroscience programs is also welcome. I feel extremely ignorant about the field, but also like I could make some contributions, given that my background is in physics and my current job is in systems neuroscience.