eli_sennesh comments on The Truth About Mathematical Ability - LessWrong

61 Post author: JonahSinick 12 February 2015 01:29AM




Comment author: [deleted] 14 February 2015 03:06:11PM 6 points

There is a lot of rich math to be discovered outside the areas that pure mathematicians have historically focused on, math that people might find equally fascinating. In particular, I believe this to be true within the broad domain of machine learning.

That's largely because machine learning is in its infancy: it is still a field defined by three very limited approaches:

  • Structural Risk Minimization (support-vector machines and other approaches that use regularization to cope with high-dimensional data) -- still ultimately a kind of PAC learning, and still largely making very unstructured predictions from very unstructured data
  • PAC learning -- even when we allow ourselves inefficient (i.e., super-polynomial-time) PAC learning, we're still stuck relying on prior knowledge to choose a hypothesis class with a known, finite VC dimension. I've sometimes idly pondered trying to leverage algorithmic information theory to do something like what Hutter did, and prove a fully general counter-theorem to No Free Lunch saying that when the learner has "more information" and "more algorithmic information" (more compute power) than the environment, the learner can win. (Then again, I tend to idly ponder a lot about AIT, since it seems to be a very underappreciated field of theoretical CS -- underappreciated, I suspect, because of just how much mathematical background it requires!)
  • Stochastic Gradient Descent, especially as used in neural networks: useful in properly general environments, but it doesn't tell the learner's programmer much of anything that makes sense to a human. It also often overfits or gets stuck in non-global minima.
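To make the reliance on a known, finite hypothesis class concrete: in the realizable case, the classic PAC bound for a finite class H says that m ≥ (1/ε)(ln|H| + ln(1/δ)) samples suffice for any consistent learner to reach error ≤ ε with probability ≥ 1 − δ. A quick calculation (a sketch of the textbook bound, not tied to any particular learner; the example numbers are mine):

```python
import math

def pac_sample_complexity(hypothesis_count, epsilon, delta):
    """Realizable-case PAC bound for a finite hypothesis class:
    m >= (1/epsilon) * (ln|H| + ln(1/delta)) samples suffice for any
    consistent learner to get error <= epsilon with prob >= 1 - delta."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# e.g. |H| = 2**20 boolean hypotheses, 5% error, 1% failure probability
m = pac_sample_complexity(2 ** 20, epsilon=0.05, delta=0.01)  # -> 370
```

The ln|H| term is exactly where the trouble starts: for an infinite class the bound is vacuous, and you need the VC dimension (supplied by prior knowledge) to replace it.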

To these we are rapidly adding a fourth approach, one that I think has the potential to supplant many of the others:

  • Probabilistic programming: fully general, more capable of giving "sensible" outputs, capable of expressing arbitrary statistical models... but really slow and, modulo an Occam's razor assumption, subject to the same sort of losses in adversarial environments as any other Bayesian method. Still, it's a lot better than what was there before.
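Stripped of any particular probabilistic programming system, the core idea is "write a generative program, then condition on its output." A toy sketch in plain Python using rejection sampling, the most naive inference strategy (the coin-flip model and numbers are my own illustration; the slowness of keeping only ~1 in 11 runs is also why real systems need cleverer inference):

```python
import random

random.seed(0)

def model():
    """Generative program: draw a coin bias, then simulate 10 flips."""
    bias = random.random()                  # uniform prior on [0, 1]
    flips = [random.random() < bias for _ in range(10)]
    return bias, flips

# Condition on having observed 9 heads out of 10 by rejection sampling:
# run the program many times, keep only executions matching the observation.
posterior = [bias for bias, flips in (model() for _ in range(200_000))
             if sum(flips) == 9]

mean = sum(posterior) / len(posterior)      # ~0.83, matching Beta(10, 2)
```

Any model you can write as a program can be conditioned this way, which is the "fully general, arbitrary statistical models" part; the acceptance rate collapsing as observations grow is the "really slow" part.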