jacob_cannell comments on Open Thread, Jun. 29 - Jul. 5, 2015 - Less Wrong

Post author: Gondolinian 29 June 2015 12:14AM




Comment author: MrMind 30 June 2015 09:07:19AM 0 points

I'm very tempted to argue that it is!
But what I wanted to convey is that it feels like I'm supposed to learn something that is manifestly inferior, in its logical foundations, to what is already known and available.

And maybe, under the constraint of computational cost, the Bayesian and frequentist approaches end up in the same place, but where's the proof? Where does someone say: "This is Bayesian machine learning, but it's computationally too costly, so by making such-and-such simplifying assumptions we end up with frequentist machine learning"?

Instead, what I read are things like: "In practice, Bayesian optimization has been shown to obtain better results in fewer experiments than grid search and random search" (from here).
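One small, concrete place where the two approaches meet, offered as a minimal illustration rather than the general proof asked for above: assume a coin-flip model with a uniform Beta(1, 1) prior. If the posterior is collapsed to its mode, the Bayesian MAP estimate is exactly the frequentist maximum-likelihood estimate, and the posterior mean converges to it as data accumulate. The function names below are hypothetical helpers written for this sketch.

```python
from fractions import Fraction


def mle(heads, flips):
    """Frequentist maximum-likelihood estimate of a coin's bias."""
    return Fraction(heads, flips)


def beta_posterior(heads, flips, a=1, b=1):
    """Conjugate update: a Beta(a, b) prior becomes Beta(a + heads, b + tails)."""
    return a + heads, b + flips - heads


def posterior_mode(alpha, beta):
    """MAP estimate (mode of Beta(alpha, beta)), valid for alpha, beta > 1."""
    return Fraction(alpha - 1, alpha + beta - 2)


def posterior_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return Fraction(alpha, alpha + beta)


for heads, flips in [(7, 10), (70, 100), (700, 1000)]:
    alpha, beta = beta_posterior(heads, flips)  # uniform Beta(1, 1) prior
    print(flips, mle(heads, flips), posterior_mode(alpha, beta),
          float(posterior_mean(alpha, beta)))
```

Exact fractions are used so the equality between the MAP estimate and the MLE is exact rather than approximate. With an informative prior (a, b > 1) the mode is pulled toward the prior, so the coincidence depends on the flat-prior assumption.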

Comment author: jacob_cannell 02 July 2015 12:28:10AM 2 points

There is the probabilistic programming community, which uses clean tools (programming languages) to hand-construct models with many unknown parameters. They use approximate Bayesian methods for inference, and they are slowly improving the efficiency and scalability of those techniques.
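To make "approximate Bayesian inference" concrete, here is a hand-rolled sketch of one of the simplest such methods, random-walk Metropolis, applied to a hand-written coin-bias model with a uniform prior. Probabilistic programming systems automate this step for user-written models and typically use more efficient samplers or variational methods; the model, step size, and sample counts here are assumptions chosen for illustration. The uniform prior makes the exact posterior Beta(heads + 1, tails + 1), so the approximation can be checked.

```python
import math
import random


def log_posterior(theta, heads, flips):
    """Unnormalised log posterior of a coin bias theta under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads, flips, n_samples=20000, burn_in=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler over theta (an approximate inference method)."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, flips)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, p(proposal) / p(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:  # discard early samples taken before the chain settles
            samples.append(theta)
    return samples


heads, flips = 7, 10
samples = metropolis(heads, flips)
approx_mean = sum(samples) / len(samples)
exact_mean = (heads + 1) / (flips + 2)  # mean of the exact Beta(8, 4) posterior
print(f"approximate {approx_mean:.4f}  exact {exact_mean:.4f}")
```

The sampler only ever needs the unnormalised posterior, which is why this family of methods works for arbitrary hand-built models; the price is many likelihood evaluations per usable sample, which is the efficiency and scalability problem mentioned above.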

Then there is the neural net & optimization community, which uses general automated models. It is more 'frequentist' (or perhaps just ad hoc), but there are now some Bayesian inroads there too. That community has the most efficient and scalable learning methods, but it isn't always clear what tradeoffs they are making.

And even in the ANN world, you sometimes see Bayesian statistics brought in to justify regularizers or to derive things, as in variational methods. But then for the actual learning they take gradients and use SGD, with the understanding that SGD is somehow approximating the Bayesian inference step, or at least doing something close enough.
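The "Bayesian justification for a regularizer" can be made concrete in the simplest case: an L2 penalty is the negative log of a zero-mean Gaussian prior, so minimizing squared error plus the penalty yields the MAP estimate. The sketch below assumes a toy 1-D linear regression; the data, noise level, prior width, and step-size schedule are all illustrative assumptions, and the helper names are hypothetical. It runs plain SGD on that objective and compares the result with the closed-form MAP. Note that SGD here recovers only the posterior mode, a point estimate, not the full posterior distribution.

```python
import random


def make_data(n=50, true_w=2.0, noise_sd=0.5, seed=0):
    """Hypothetical toy data: y = true_w * x + Gaussian noise, x in 0.1..5.0."""
    rng = random.Random(seed)
    xs = [(i + 1) / 10 for i in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def ridge_closed_form(xs, ys, lam):
    """Exact minimiser of 0.5 * sum((y - w*x)**2) + 0.5 * lam * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def sgd_ridge(xs, ys, lam, epochs=200, seed=1):
    """Plain SGD on the same objective, with the penalty split evenly over examples."""
    rng = random.Random(seed)
    n = len(xs)
    order = list(range(n))
    w, t = 0.0, 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            grad = -(y - w * x) * x + lam * w / n  # gradient of one example's share
            w -= grad / (100.0 + t)                # decaying step size 1 / (100 + t)
            t += 1
    return w


noise_sd, prior_sd = 0.5, 0.1           # assumed noise level and Gaussian prior width
lam = (noise_sd / prior_sd) ** 2        # L2 strength implied by the prior (about 25)
xs, ys = make_data(noise_sd=noise_sd)
w_mle = ridge_closed_form(xs, ys, 0.0)  # no penalty: ordinary least squares (MLE)
w_map = ridge_closed_form(xs, ys, lam)  # L2 penalty: MAP under prior N(0, prior_sd**2)
w_sgd = sgd_ridge(xs, ys, lam)
print(f"MLE {w_mle:.4f}  MAP {w_map:.4f}  SGD {w_sgd:.4f}")
```

The penalty strength is the ratio of noise variance to prior variance, which is the link between "choose a regularizer" and "choose a prior". Splitting the penalty as lam / n per example makes the per-example gradients sum to the full-objective gradient, so SGD targets the same minimum as the closed form.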