Lumifer comments on Approximating Solomonoff Induction - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (45)
In that sense, stochastic gradient descent will also find the global optimum, since the randomness will eventually push it to every possible point. It will just take the lifetime of the universe -- but so will exhaustive search.
It's also trivial to make any local algorithm global by occasionally moving around randomly. In practice this is also effective at finding better local optima.
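To illustrate the point about random moves, here is a minimal sketch (my own toy example, not from the thread): a greedy local search on a double-well function gets stuck in the worse basin, while the same search with occasional random jumps finds the better one.

```python
import random

def f(x):
    # Toy double-well objective: a local minimum near x = +0.96
    # and the global minimum near x = -1.03.
    return (x**2 - 1)**2 + 0.3 * x

def local_search(x, steps=2000, step=0.01, jump_prob=0.0):
    """Greedy descent; with jump_prob > 0 it occasionally tries a random point."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    best_x, best_f = x, f(x)
    for _ in range(steps):
        if random.random() < jump_prob:
            cand = random.uniform(-2, 2)          # occasional random move
        else:
            cand = best_x + random.uniform(-step, step)  # small local move
        if f(cand) < best_f:
            best_x, best_f = cand, f(cand)
    return best_x

print(local_search(1.0))                 # stays in the local basin, x > 0
print(local_search(1.0, jump_prob=0.05)) # escapes to the global basin, x < 0
```

The purely local run converges to the nearby (worse) minimum; adding a 5% chance of a random jump is enough to land in the global basin and refine from there.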
There is a view that everything that works must be an approximation of the ideal Bayesian method. This is argued by Yudkowsky in Beautiful Probability and Searching for Bayes-Structure.
I used maximum likelihood as an example: you take the single most probable hypothesis (the parameters of a statistical model) instead of weighting many hypotheses the Bayesian way. Given enough data, the most probable hypothesis should converge to the correct one.
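A quick sketch of that convergence (my own numbers, assuming the standard coin-flip model with a uniform Beta(1,1) prior): with little data the MLE and the Bayesian posterior mean disagree noticeably; with lots of data they give essentially the same answer.

```python
def mle(heads, flips):
    # Maximum likelihood: the single most probable bias parameter
    return heads / flips

def posterior_mean(heads, flips, a=1, b=1):
    # Bayesian answer under a Beta(a, b) prior:
    # the posterior over the bias is Beta(a + heads, b + tails)
    return (a + heads) / (a + b + flips)

# 3 heads in 4 flips: the two estimates differ noticeably
print(mle(3, 4))             # 0.75
print(posterior_mean(3, 4))  # ~0.667

# 600 heads in 800 flips: essentially the same answer
print(mle(600, 800))             # 0.75
print(posterior_mean(600, 800))  # ~0.749
```

With enough data the likelihood swamps the prior, which is the convergence claimed above.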
You can reformulate many problems in the Bayesian framework. This does not mean that everything is an approximation of Bayesianism -- just like the ability to translate a novel into French does not mean that each novel is an approximation of a French roman.
It's deeper than that. Bayesian probability theory is a mathematical law. Any method that works must be computing an approximation of it -- just as Newtonian mechanics is a very close approximation of relativity, yet the two are not equivalent.
That is not true. The Bayes equation is mathematically correct, but the theory is much wider -- for example, Bayesians interpret probability as a degree of belief: is that also a mathematical law? And you need a prior to start -- what does the "mathematical law" say about priors?
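The point about priors can be made concrete (my own toy numbers, the usual diagnostic-test setup): Bayes' rule is just arithmetic, but the answer it gives depends entirely on the prior you feed it, which the equation itself does not supply.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule: P(H | data) = P(data | H) P(H) / P(data)."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / evidence

# Same observation (a test with P(+|H) = 0.9, P(+|not H) = 0.1),
# two different priors -- two very different conclusions:
print(posterior(0.5, 0.9, 0.1))   # 0.9
print(posterior(0.01, 0.9, 0.1))  # ~0.083
```

Both runs apply the identical "mathematical law"; everything that changed was the prior.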