
Lumifer comments on Approximating Solomonoff Induction - Less Wrong Discussion

6 Post author: Houshalter 29 May 2015 12:23PM




Comment author: Lumifer 02 June 2015 02:48:35AM 0 points [-]

It's true that it's only possible to find local optima, but that's true with any algorithm.

Whaaaat? Exhaustive search is an algorithm, and it will find you the global optimum anywhere. For many structures of search space, finding the global optimum with an appropriate algorithm is not hard.
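A minimal sketch of the point, assuming a finite search space and a made-up toy objective (nothing here is from the thread itself):

```python
# Exhaustive search: evaluate every candidate, so the result is the
# global optimum by construction (toy objective, finite search space).
def exhaustive_search(candidates, objective):
    return min(candidates, key=objective)

# Minimize (x - 7)^2 over the integers 0..99; the global minimum is x = 7.
best = exhaustive_search(range(100), lambda x: (x - 7) ** 2)
```

The cost, of course, is that every candidate gets evaluated, which is exactly the objection raised below for infinite (or merely astronomical) search spaces.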

Everything approximates Bayesian inference; it's just a matter of how good the approximation is. If you have enough data, maximum likelihood approaches Bayesian inference.

Huh?

Comment author: ThisSpaceAvailable 02 June 2015 04:00:44AM 1 point [-]

You can't do an exhaustive search on an infinite set.

Comment author: Lumifer 02 June 2015 04:57:34AM 0 points [-]

You can't do an exhaustive search on an infinite set.

I haven't seen any infinite sets in reality.

Comment author: ThisSpaceAvailable 04 June 2015 10:30:42PM -1 points [-]

The set of possible Turing Machines is infinite. Whether you consider that to satisfy your personal definition of "seen" or "in reality" isn't really relevant.

Comment author: Houshalter 02 June 2015 07:33:43AM *  0 points [-]

In that sense stochastic gradient descent will also find the global optimum, since the randomness will eventually push it to every possible point. It will just take an eternity, but so will exhaustive search.

It's also trivial to make any local algorithm global by occasionally moving to a random point. In practice this is also effective at finding better local optima.
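A minimal sketch of the random-restart idea, using an illustrative toy objective whose local minima sit at every multiple of 7 (all names and numbers here are made up for the example):

```python
import random

def hill_climb(f, x, steps=200):
    """Greedy local search on the integers: step to a better neighbor while one exists."""
    for _ in range(steps):
        better = min((x - 1, x + 1), key=f)
        if f(better) >= f(x):
            return x  # stuck in a local optimum
        x = better
    return x

def with_restarts(f, restarts=200, lo=-50, hi=50, seed=0):
    """Make the local search 'global' by restarting it from random points."""
    rng = random.Random(seed)
    starts = (rng.randint(lo, hi) for _ in range(restarts))
    return min((hill_climb(f, s) for s in starts), key=f)

# Toy objective: local minima at every multiple of 7, global minimum at 0.
f = lambda x: abs(x) + 10 * (x % 7 != 0)

best = with_restarts(f)  # a plain hill climb from, say, 15 gets stuck at 14
```

A single hill climb gets trapped in whichever basin it starts in; enough random restarts make finding the global basin overwhelmingly likely, though still not guaranteed.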

Everything approximates Bayesian inference; it's just a matter of how good the approximation is. If you have enough data, maximum likelihood approaches Bayesian inference.

Huh?

There is a view that everything that works must be an approximation of the ideal Bayesian method. This is argued by Yudkowsky in Beautiful Probability and Searching for Bayes-Structure.

I used maximum likelihood as an example: you take the single most probable hypothesis (the parameters of a statistical model) instead of weighing many hypotheses the Bayesian way. If you have enough data, the most probable hypothesis should converge to the correct one.
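The convergence claim can be made concrete with a coin-flipping sketch (the Beta prior and the numbers are just for illustration): as the data grows, the Bayesian posterior mean and the maximum-likelihood estimate approach each other.

```python
# Estimating a coin's bias: maximum likelihood vs. a Bayesian posterior
# mean under a Beta(a, b) prior (conjugate to the binomial likelihood).
def ml_estimate(heads, flips):
    return heads / flips

def posterior_mean(heads, flips, a=1.0, b=1.0):
    # Posterior is Beta(a + heads, b + (flips - heads)); this is its mean.
    return (a + heads) / (a + b + flips)

# Same 30% heads rate at two sample sizes: the gap shrinks as data grows.
gap_small = abs(ml_estimate(3, 10) - posterior_mean(3, 10))          # ~0.033
gap_large = abs(ml_estimate(300, 1000) - posterior_mean(300, 1000))  # ~0.0004
```

With 10 flips the prior still visibly pulls the estimate; with 1000 flips the two answers are nearly indistinguishable.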

Comment author: Lumifer 02 June 2015 04:09:33PM 1 point [-]

There is a view that everything that works must be an approximation of the ideal Bayesian method.

You can reformulate many problems in the Bayesian framework. This does not mean that everything is an approximation of Bayesianism -- just like the ability to translate a novel into French does not mean that each novel is an approximation of a French roman.

Comment author: Houshalter 03 June 2015 04:28:14AM 0 points [-]

It's deeper than that. Bayesian probability theory is a mathematical law. Any method that works must be computing an approximation of it, just as Newtonian mechanics is a very close approximation of relativity; the two are not equivalent.

Comment author: Lumifer 03 June 2015 02:47:34PM 2 points [-]

Bayesian probability theory is a mathematical law.

That is not true. The Bayes equation is mathematically correct. A theory is much wider -- for example, Bayesians interpret probability as a degree of belief -- is that also a mathematical law? You need a prior to start -- what does the "mathematical law" say about priors?
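The two pieces can be pulled apart in a toy grid sketch (priors, data, and grid are all made up for illustration): the Bayes-rule arithmetic is mechanical, while the prior is a free input it says nothing about; with enough data, though, different priors yield nearly the same posterior.

```python
# The Bayes "equation" part is mechanical arithmetic; the prior is an
# extra input that the equation itself does not determine.
def posterior(prior, heads, tails, grid):
    unnorm = [pr * p ** heads * (1 - p) ** tails for pr, p in zip(prior, grid)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

grid = [i / 100 for i in range(1, 100)]       # candidate coin biases
uniform = [1 / len(grid)] * len(grid)
linear = [p / sum(grid) for p in grid]        # a prior that favors high bias

# Same data, different priors: after 100 flips the posteriors nearly agree.
post_u = posterior(uniform, 70, 30, grid)
post_l = posterior(linear, 70, 30, grid)
tv = 0.5 * sum(abs(a - b) for a, b in zip(post_u, post_l))  # total variation
```

The update step is identical for both agents; only the starting distribution differs, and the data eventually swamps that difference.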

Comment author: anon85 02 June 2015 09:07:11AM 0 points [-]

Tell me, did Eliezer even address PAC learning in his writing? If not, I would say that he's being over-confident and ignorant in stating that Bayesian probability is all there is and everything else is a mere approximation.

Comment author: Manfred 03 June 2015 02:39:03AM 0 points [-]

PAC-learning is definitely something we don't talk about enough around here, but I don't see what the conflict is with it being an approximation of Bayesian updating.

Here's how I see it: you're updating (approximately) over a limited space of hypotheses that might not contain the true hypothesis. The idea that the best model in your space can still be approximately correct is expressible on both Bayesian and frequentist grounds: the approximate update over models is equivalent to an approximate update over predictions when you expect the universe to be modelable, and the best model also has a good frequency of success over the long run if the real universe is drawn from a sufficiently nice distribution.
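A tiny sketch of that "best model in a limited class" picture (everything here is an illustrative toy, not anyone's proposed formalism): empirical risk minimization over threshold rules, where the true concept is an interval and so lies outside the class, leaving only an approximately correct hypothesis available.

```python
import random

# Empirical risk minimization over a limited hypothesis class (thresholds),
# where the true concept (an interval) is NOT in the class -- the learner
# can only be approximately correct, as in the agnostic PAC setting.
def erm_threshold(samples):
    """Pick the threshold t minimizing training error of the rule 'x >= t'."""
    def training_error(t):
        return sum((x >= t) != y for x, y in samples)
    return min((x for x, _ in samples), key=training_error)

rng = random.Random(0)
xs = [rng.random() for _ in range(500)]
# True labels come from the interval [0.4, 0.9], which no threshold matches.
data = [(x, 0.4 <= x <= 0.9) for x in xs]

t = erm_threshold(data)  # lands near 0.4, the best threshold in the class
```

The learner converges to the best threshold available, which still misclassifies everything above 0.9; that residual error is the price of the restricted hypothesis class.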

But I'm definitely a n00b at this stuff, so if you have other ideas (and reading recommendations) I'd be happy to hear them.

Comment author: anon85 03 June 2015 03:29:27AM *  1 point [-]

Here's how I see it: you're updating (approximately) over a limited space of hypotheses that might not contain the true hypothesis. The idea that the best model in your space can still be approximately correct is expressible on both Bayesian and frequentist grounds: the approximate update over models is equivalent to an approximate update over predictions when you expect the universe to be modelable, and the best model also has a good frequency of success over the long run if the real universe is drawn from a sufficiently nice distribution.

The "update" doesn't use Bayes's rule; there's no prior; there's no concept of belief. Why should we still consider it Bayesian? I mean, if you consider any learning to be an approximation of Bayesian updating, then sure, PAC-learning qualifies. But that begs the question, doesn't it?