"Only in AI would people design algorithms that are literally stupider than a bag of bricks, boost the results back towards maximum entropy, and then argue for the healing power of noise."
I do not have the time to go through it now (which probably means I never will remember to do it) but I can offer a small observation.
When training neural networks, there is a very good reason why adding a random element improves the performance: it avoids getting stuck in suboptimal local minima. Training a network can be seen as minimizing errors on a surface in weight-space. This surface usually is littered with local minima of various sizes, so a deterministic training rule gets stuck while a stochastic one can get kicked out of them. Of course, one has to be careful not to add too much of a random element; this is usually done by using small steps in the training.
I do not know if this adds anything as once the training is complete, the net constitutes an algorithm that is deterministic. The point however is that optimization methods that (necessarily) rely on local information usually performs better with an element of noise.
"Only in AI would people design algorithms that are literally stupider than a bag of bricks, boost the results back towards maximum entropy, and then argue for the healing power of noise."
I do not have the time to go through it now (which probably means I never will remember to do it) but I can offer a small observation.
When training neural networks, there is a very good reason why adding a random element improves the performance: it avoids getting stuck in suboptimal local minima. Training a network can be seen as minimizing errors on a surface in weight-space. This surface usually is littered with local minima of various sizes, so a deterministic training rule gets stuck while a stochastic one can get kicked out of them. Of course, one has to be careful not to add too much of a random element; this is usually done by using small steps in the training.
I do not know if this adds anything as once the training is complete, the net constitutes an algorithm that is deterministic. The point however is that optimization methods that (necessarily) rely on local information usually performs better with an element of noise.