
RichardKennaway comments on Approximating Solomonoff Induction - Less Wrong Discussion

6 Post author: Houshalter 29 May 2015 12:23PM


Comments (45)


Comment author: anon85 01 June 2015 05:52:24PM 1 point [-]

People here seem to really like Solomonoff induction, but I don't think it's all that relevant to learning in practice due to computational complexity.

Solomonoff induction is not computable. "Approximating" it, in the sense of computably producing hypotheses whose predictions come close to its own, is probably not computable either.

If you replace Solomonoff induction with induction over programs that halt quickly or induction over circuits, it becomes computable, but it is still NP-hard. Again, approximating this is probably also NP-hard, depending on your definition of approximation.
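A toy illustration of what time-bounded induction over a restricted program class looks like (everything here, including the periodic-pattern "programs" and the function name, is a hypothetical sketch, not anyone's actual proposal): enumerate short programs, keep those consistent with the data, and weight by description length. The enumeration is exponential in program length, which is exactly where the intractability shows up.

```python
from itertools import product

def predict_next(bits, max_len=12):
    """Time-bounded, Solomonoff-flavored induction over a toy program
    class: each 'program' is a bit pattern that generates a periodic
    sequence.  Weight each program by 2**-length, keep those consistent
    with the observed bits, and predict the next bit by weighted vote.
    The enumeration below is exponential in max_len."""
    weight = {0: 0.0, 1: 0.0}
    for length in range(1, max_len + 1):
        for pattern in product((0, 1), repeat=length):
            # "run" the program: emit the pattern periodically
            out = [pattern[i % length] for i in range(len(bits) + 1)]
            if out[:len(bits)] == list(bits):      # consistent with data
                weight[out[len(bits)]] += 2.0 ** -length
    return max(weight, key=weight.get)

print(predict_next([0, 1, 0, 1, 0]))  # 1: shortest consistent pattern is (0, 1)
```

The shortest consistent programs dominate the vote, which is the length-prior intuition behind Solomonoff induction; making the program class richer than periodic patterns is where the NP-hardness enters.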

Next, if you replace boolean circuits with neural nets, it is still hard to find the best neural net to fit the data. MCMC and gradient descent only find local optima. I mean, the fact that neural nets didn't give us strong AI back in the 70s demonstrates that they are not doing anything close to Solomonoff induction.
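The local-optima point can be seen even in one dimension. A hypothetical illustration (the loss function is made up for the example): gradient descent on a double-well loss converges to whichever basin it starts in, not necessarily the global minimum.

```python
def grad_descent(x, lr=0.01, steps=2000):
    """Gradient descent on f(x) = (x**2 - 1)**2 + 0.3*x, a double-well
    loss whose global minimum is near x = -1.04 and whose strictly worse
    local minimum is near x = +0.96.  f'(x) = 4*x*(x**2 - 1) + 0.3."""
    for _ in range(steps):
        x -= lr * (4 * x * (x * x - 1) + 0.3)
    return x

print(round(grad_descent(-2.0), 2))  # -1.04: reaches the global minimum
print(round(grad_descent(+2.0), 2))  # 0.96: stuck in the local minimum
```

Both runs converge, but only one finds the global optimum; nothing in the update rule distinguishes the two.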

It's not even clear that a learning program must approximate Bayesian inference. There are things like PAC learning that don't do that at all.
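For context, the standard PAC guarantee for a finite hypothesis class H is distribution-free: roughly m >= (1/eps) * (ln|H| + ln(1/delta)) samples suffice for every data distribution, with no prior over hypotheses. A sketch (the function name and example numbers are illustrative):

```python
from math import ceil, log

def pac_sample_bound(h_size, eps, delta):
    """Samples sufficient for a consistent learner over a finite
    hypothesis class of size h_size to be 'probably approximately
    correct': with probability at least 1 - delta, the learned
    hypothesis has error at most eps, for EVERY data distribution."""
    return ceil((log(h_size) + log(1 / delta)) / eps)

# e.g. |H| = 2**20 hypotheses, 5% error, 99% confidence:
print(pac_sample_bound(2**20, 0.05, 0.01))  # 370
```

The bound grows only logarithmically in |H|, which is what makes the worst-case guarantee useful despite assuming nothing about the distribution.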

Comment author: RichardKennaway 02 June 2015 10:43:45AM 0 points [-]

It's not even clear that a learning program must approximate Bayesian inference. There are things like PAC learning that don't do that at all.

Can you expand on the relationship between the two? As far as I can see, Bayesian learning starts from a prior distribution, while PAC learning (for PAC-learnable concepts) works independently of the distribution. That is, they're solving average-case vs. worst-case (or adversarial) problems, which is a subject Eliezer has discussed here. Further discussion, mostly not by Eliezer, here. Those discussions don't mention PAC, though.

PAC-Bayesian hybrids appear to also be a thing, but all I know about that is the Google search I just did.

Comment author: anon85 02 June 2015 09:37:31PM 0 points [-]

I suppose that in some sense, the Bayesian-vs-PAC comparison is the learning analogue of the average-case vs. worst-case comparison (where Bayesian=average case, PAC=worst case). It's not a perfect analogy, but in both cases a big question is what if you don't know the distribution/prior?

In the learning version, I think the problem is even more severe: I don't know whether an analogue of derandomization still works, and I don't think the "P = BPP" conjecture has a learning-theoretic analogue.

The learning version has some additional complications, in that Bayesian inference is computationally intractable in many cases.

Comment author: RichardKennaway 04 June 2015 03:52:20PM *  0 points [-]

what if you don't know the distribution/prior?

I suppose the Bayesian answer to that is that a probability distribution is a description of one's knowledge, and that in principle, every state of knowledge, including total ignorance, can be represented as a prior distribution. In practice, one may not know how to do that. Fundamentalist Bayesians say that that is a weakness in our knowledge, while everyone else, from weak Bayesians to Sunday Bayesians, crypto-frequentists, and ardent frequentists, say it's a weakness of Bayesian reasoning. Not being a statistician, I don't need to take a view, although I incline against arguments deducing impossibility from ignorance.
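One textbook instance of "ignorance as a prior" (an illustrative sketch, not anything from the thread): a uniform Beta(1,1) prior on a coin's bias, updated by Bayes' rule on observed flips.

```python
def posterior_mean(heads, tails, a=1.0, b=1.0):
    """Beta-Bernoulli update: a uniform Beta(1, 1) prior (one way to
    encode 'total ignorance' about a coin's bias) updated on observed
    flips.  The posterior is Beta(a + heads, b + tails); its mean is
    Laplace's rule of succession."""
    return (a + heads) / (a + heads + b + tails)

print(posterior_mean(0, 0))  # 0.5: the ignorance prior alone
print(posterior_mean(7, 3))  # 0.666...: the data pulls the estimate
```

Whether Beta(1,1), rather than some other "uninformative" prior, really captures total ignorance is exactly the point the fundamentalist and the skeptic dispute.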

The learning version

What are you contrasting with learning? I don't see a demarcation of principle between learning and any other sort of statistical prediction.

Comment author: anon85 04 June 2015 04:42:35PM 0 points [-]

I suppose the Bayesian answer to that is that a probability distribution is a description of one's knowledge, and that in principle, every state of knowledge, including total ignorance, can be represented as a prior distribution. In practice, one may not know how to do that. Fundamentalist Bayesians say that that is a weakness in our knowledge, while everyone else, from weak Bayesians to Sunday Bayesians, crypto-frequentists, and ardent frequentists, say it's a weakness of Bayesian reasoning. Not being a statistician, I don't need to take a view, although I incline against arguments deducing impossibility from ignorance.

I don't have any strong disagreements there. But consider: if we can learn well even without assuming any distribution or prior, isn't that worth exploring? The fact that there is an alternative to Bayesianism - one that we can prove works (in some well-defined settings), and isn't just naive frequentism - is pretty fascinating, isn't it?

What are you contrasting with learning?

I'm contrasting randomized vs. deterministic algorithms, which Eliezer discussed in your linked article, with Bayesian vs. PAC learning models. The randomized vs. deterministic question shouldn't really be considered learning, unless you want to call things like primality testing "learning".